Skip to content

RashmitTopG/durable-execution-engine

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Durable Execution Engine (Java)

A native durable execution engine that allows developers to write normal code (using loops, conditionals, and standard concurrency) that becomes durable when wrapped in the engine. The engine ensures that workflows can be interrupted at any point and resume from the exact point of failure without re-executing previously completed steps.

Architecture

Core Components

  1. Step Primitive: Any function with side effects must be wrapped in a step call, which provides durability guarantees through checkpointing.

  2. Memoization: The engine checks a persistent SQLite database before executing a step. If the step has been executed before for that specific workflow instance, it returns the cached result.

  3. Sequence Management: The engine tracks a logical sequence ID for each step execution, allowing it to handle loops and conditional logic correctly, even when steps share the same name.

  4. Concurrency Support: Steps can run in parallel using Java's CompletableFuture, with thread-safe database operations.

Features

  • Durability: Workflows survive crashes and resume from the last completed step
  • Type Safety: Generic-based API supports any return type
  • Concurrency: Parallel step execution with thread-safe database operations
  • Zombie Step Protection: Handles crashes between execution and commit
  • No DSL Required: Write workflows in standard, idiomatic Java code
  • Automatic Sequence IDs: Internal atomic tracking supports loops without manual ID management

Non-Functional Requirements & Bonus Features

🚀 Performance Optimizations

  • SQLite WAL Mode: Enabled Write-Ahead Logging to allow concurrent read and write operations without blocking.
  • Busy Timeout: Configured a 5000ms busy timeout to handle lock contention gracefully during high concurrency.
  • Minimal Locking: Database locks are only held during atomic state transitions (checking result, marking in-progress, or storing final result).

🛡️ Security & Integrity

  • SQL Injection Protection: All database interactions use PreparedStatement with parameterized queries to ensure data security.
  • Type Safety: Utilizes Java Generics and TypeTokens to ensure compile-time type checking and safe deserialization of complex objects.
  • Atomic State Transitions: Uses a state machine (pending -> in_progress -> completed) to ensure that partially executed steps are never mistakenly treated as finished.

🧪 Resilience (The "Zombie Step" Problem)

The engine handles the case where a process crashes after a side effect has started but before its result is saved. By checkpointing the in_progress state, the engine detects interrupted steps upon restart and triggers a safe re-execution (Replay Pattern).

Prerequisites

  • Java 17 or higher
  • Maven 3.6 or higher

Installation

mvn clean install

Usage

Running a Workflow

mvn exec:java -Dexec.mainClass="Main" -Dexec.args="-workflow-id onboarding-001 -name 'Jane Smith' -email 'jane@example.com'"

Or compile and run directly:

mvn package
java -cp target/durable-execution-engine-1.0.0-jar-with-dependencies.jar Main -workflow-id onboarding-001 -name "Jane Smith" -email "jane@example.com"

Simulating a Crash

To test durability, you can interrupt the workflow at any time using Ctrl+C. The workflow will stop, and when you run the same command again, it will resume from where it left off.

# Start workflow
java -cp target/durable-execution-engine-1.0.0-jar-with-dependencies.jar Main -workflow-id onboarding-001

# Press Ctrl+C to simulate crash

# Resume workflow (same command)
java -cp target/durable-execution-engine-1.0.0-jar-with-dependencies.jar Main -workflow-id onboarding-001

Resetting a Workflow

To start fresh, use the -reset flag:

java -cp target/durable-execution-engine-1.0.0-jar-with-dependencies.jar Main -workflow-id onboarding-001 -reset

Example: Employee Onboarding Workflow

The example workflow demonstrates:

  1. Step 1: Create Record (Sequential)
  2. Step 2 & 3: Provision Laptop & Access (Parallel)
  3. Step 4: Send Welcome Email (Sequential)

Code Example

// Step 1: Create Record
Employee employee = dc.step("create_record", () -> {
    Employee emp = new Employee();
    emp.setId("EMP-001");
    emp.setName("John");
    return emp;
}, Employee.class);

// Step 2 & 3: Parallel execution
Map<String, DurableContext.StepFunction<String>> parallelSteps = new HashMap<>();
parallelSteps.put("provision_laptop", () -> "LAPTOP-001");
parallelSteps.put("provision_access", () -> "ACCESS-001");

Map<String, String> results = Concurrency.runParallel(dc, parallelSteps, String.class);

// Step 4: Send Email
dc.step("send_welcome_email", () -> {
    // Send email
    return true;
}, Boolean.class);

Sequence Tracking

The engine uses an atomic sequence counter to track step execution order. Each call to step increments the sequence number, creating a unique step_key by combining the step ID with the sequence number (e.g., "create_record:1", "create_record:2").

This approach handles:

  • Loops: Each iteration gets a unique sequence number
  • Conditionals: Steps in different branches get different sequence numbers
  • Parallel Steps: Each parallel step gets its own sequence number

Example with Loops

for (int i = 0; i < 3; i++) {
    final int index = i;
    String result = dc.step("process_item", () -> {
        // This will create step keys: "process_item:1", "process_item:2", "process_item:3"
        return "item-" + index;
    }, String.class);
}

Thread Safety

The engine ensures thread safety during parallel execution through multiple mechanisms:

1. Database-Level Concurrency

  • WAL Mode: SQLite is configured with Write-Ahead Logging (WAL) mode, which allows concurrent reads and writes
  • Busy Timeout: A 5-second busy timeout is set to handle SQLITE_BUSY errors gracefully
  • ReentrantLock Protection: Critical sections (database writes) are protected with ReentrantLock

2. Step Execution Flow

  1. Check Phase: Thread-safe read to check if step exists (no lock needed for reads in WAL mode)
  2. Mark In Progress: Atomic insert with lock protection
  3. Execute: Function execution (no database access)
  4. Store Result: Atomic update with lock protection

3. Zombie Step Handling

The engine protects against "zombie steps" (crashes between execution and commit) by:

  1. Marking steps as "in_progress" before execution
  2. On resume, detecting steps stuck in "in_progress" state
  3. Automatically marking them as failed and re-executing

Database Schema

CREATE TABLE steps (
    workflow_id TEXT NOT NULL,
    step_key TEXT NOT NULL,
    status TEXT NOT NULL,
    result BLOB,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (workflow_id, step_key)
);
  • workflow_id: Unique identifier for the workflow instance
  • step_key: Combination of step ID and sequence number (e.g., "create_record:1")
  • status: One of "in_progress", "completed", or "failed"
  • result: JSON-serialized result of the step execution

API Reference

step<T>(String id, StepFunction<T> fn, Class<T> clazz) throws Exception

Executes a function with durability guarantees. Returns cached result if step was previously executed.

Parameters:

  • id: Step identifier (can be reused in loops)
  • fn: Functional interface that returns type T
  • clazz: Class object for type T (for JSON deserialization)

Returns:

  • Result of type T if successful
  • Throws Exception if execution fails

runParallel<T>(DurableContext dc, Map<String, StepFunction<T>> steps, Class<T> clazz) throws Exception

Executes multiple steps in parallel using Java's CompletableFuture.

Parameters:

  • dc: Durable context for the workflow
  • steps: Map of step IDs to functions
  • clazz: Class object for type T

Returns:

  • Map of step IDs to results
  • Throws Exception if any step fails

DurableContext(String workflowID, Connection db) throws SQLException

Creates a new durable execution context.

Parameters:

  • workflowID: Unique workflow identifier
  • db: SQLite database connection

Testing Durability

  1. Start a workflow:

    java -cp target/durable-execution-engine-1.0.0-jar-with-dependencies.jar Main -workflow-id test-001
  2. Interrupt it during execution (Ctrl+C)

  3. Resume the same workflow:

    java -cp target/durable-execution-engine-1.0.0-jar-with-dependencies.jar Main -workflow-id test-001
  4. Observe that completed steps are skipped and the workflow resumes from the last incomplete step.

Project Structure

durable-execution-engine/
├── src/main/java/
│   ├── engine/
│   │   ├── DurableContext.java  # Core Step primitive and DurableContext
│   │   ├── Database.java         # Database initialization and schema
│   │   └── Concurrency.java     # Parallel execution support
│   ├── examples/
│   │   └── onboarding/
│   │       ├── Employee.java        # Employee data class
│   │       └── OnboardingWorkflow.java # Example employee onboarding workflow
│   └── Main.java                # CLI tool
├── pom.xml                      # Maven project definition
└── README.md                    # This file

Dependencies

  • org.xerial:sqlite-jdbc:3.44.1.0: SQLite JDBC driver for Java
  • com.google.code.gson:gson:2.10.1: JSON serialization library

Building

# Compile the project
mvn clean compile

# Package as JAR with dependencies
mvn clean package

# Run Unit Tests
mvn -Dtest=DurableEngineTest test

# Run Stress Test (100 concurrent workflows)
mvn -Dtest=StressTest test

# Run a Single Test Case
mvn -Dtest=DurableEngineTest#testDurableLoop test

License

This project is created for educational purposes as part of an assignment.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published