cici/temporal-error-handling


Temporal Error Handling — Simplified Propagation Pattern

This project demonstrates a clean error handling pattern for Temporal workflows in Java. It solves a common problem: how to propagate structured business error details from an Activity through Child Workflows to a Parent Workflow (and ultimately to an external client) without excessive try/catch nesting, casting, and instanceof chains.


Table of Contents

  1. The Problem
  2. The Solution
  3. Project Structure
  4. Design Pattern Walkthrough
  5. Why Not Use an Interceptor?
  6. Key Classes
  7. Error Flow Diagrams
  8. Prerequisites
  9. How to Build and Run
  10. Test Scenarios

The Problem

In a typical Temporal application with nested workflows, error handling can become deeply tangled. Consider this architecture:

External Client → Parent Workflow → Child Workflow → Activity

When an Activity fails with a business error (e.g., "invalid customer ID" or "connection timeout"), that error gets wrapped at every level of the chain:

ChildWorkflowFailure
  └─ ActivityFailure
       └─ ApplicationFailure (your actual error is buried here)

A naive approach to extracting the original error in the Parent Workflow leads to code like this:

catch (ChildWorkflowFailure e) {
    if (e.getCause() instanceof ActivityFailure cause) {
        if (cause.getCause() instanceof ApplicationFailure appFailure) {
            if ("WORKFLOW_USER_VALIDATE".equals(appFailure.getType())) {
                Type listType = new TypeReference<List<ActivityError>>() {}.getType();
                response = buildResponse(appFailure.getDetails().get(List.class, listType), true);
            } else {
                response = buildResponse(List.of(appFailure.getDetails().get(ActivityError.class)), false);
            }
        }
        // No else: if the cause is anything else, response is never set
    } else if (e.getCause() instanceof ActivityFailure cause
               && cause.getCause() instanceof ApplicationFailure appFailure) {
        // Unreachable: the first branch already matched every ActivityFailure
        response = buildResponse(List.of(appFailure.getDetails().get(ActivityError.class)), false);
    } else {
        response = buildResponse(List.of(), false);
    }
    throw ApplicationFailure.newFailure("User management workflow failure", ...);
}

This is brittle, hard to read, and error-prone. Every new nesting level or failure type requires more instanceof checks and casting.

Additionally, if the Activity throws a custom business exception (not an ApplicationFailure), Temporal will convert it automatically — but during that conversion, the exception's class name becomes the type, and any structured details (like an ActivityError POJO) are lost. As Temporal's own team puts it: custom business exceptions happen "outside of Temporal's context," and the details get lost or require excessive wrapping to preserve.
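To make that loss concrete, here is a plain-Java model of the conversion. It does not use the Temporal SDK: `ConvertedFailure`, `CustomerValidationException`, and `convert` are hypothetical stand-ins, and the model only mirrors the behavior described above (the class name becomes the type, the message survives, the exception's structured fields do not).

```java
import java.util.List;

// Hypothetical plain-Java model of the automatic conversion described above.
// None of these types are Temporal SDK classes.
final class ConversionModel {

    // Stand-in for the structured error POJO (step, errorCode, context).
    record ActivityError(String step, String errorCode, String context) {}

    // Hypothetical custom business exception carrying a structured payload.
    static final class CustomerValidationException extends RuntimeException {
        final ActivityError error;

        CustomerValidationException(String message, ActivityError error) {
            super(message);
            this.error = error;
        }
    }

    // Stand-in for a converted failure: a type string, a message, and details.
    record ConvertedFailure(String type, String message, List<Object> details) {}

    // Models the conversion: the class name becomes the type, the message is kept,
    // and nothing from the exception's own fields is copied into the details.
    static ConvertedFailure convert(Throwable t) {
        return new ConvertedFailure(t.getClass().getName(), t.getMessage(), List.of());
    }

    private ConversionModel() {}
}
```

Converting a `CustomerValidationException` yields a type such as `ConversionModel$CustomerValidationException` and an empty details list; the `ActivityError` held in the exception's `error` field never makes it across.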


The Solution

This project applies three design decisions that eliminate the complexity:

  1. Throw ApplicationFailure directly from the Activity — not a custom exception. This keeps everything within Temporal's failure context from the very start. Your structured ActivityError POJO is serialized into the failure's details and propagates cleanly through the entire chain.

  2. Let Child Workflows propagate failures naturally — no try/catch, no re-wrapping. Temporal automatically nests the failure into ChildWorkflowFailure → ActivityFailure → ApplicationFailure, and your details ride along untouched.

  3. Use a small utility (FailureUtils) to walk the cause chain — one method call replaces all the nested instanceof / getCause() / casting logic.

The result: the Parent Workflow's error handling goes from nearly 20 lines of nested conditionals down to this:

catch (ChildWorkflowFailure e) {
    Optional<ActivityError> error = FailureUtils.extractActivityError(e);
    boolean isNonRetryable = FailureUtils.isNonRetryableActivityError(e);
    // ... log and re-throw cleanly
}

Project Structure

temporal-error-handling/
├── build.gradle
├── settings.gradle
├── src/main/resources/
│   └── logback.xml
└── src/main/java/com/example/temporal/
    ├── model/
    │   ├── ActivityError.java          # Structured error detail POJO
    │   └── ActivityErrors.java         # Helper for throwing ApplicationFailure
    ├── activity/
    │   ├── UserProvisioningActivities.java      # Activity interface
    │   └── UserProvisioningActivitiesImpl.java  # Throws ApplicationFailure directly
    ├── util/
    │   └── FailureUtils.java           # Walks cause chain, extracts details
    ├── workflow/
    │   ├── ChildUserWorkflow.java               # Child interface
    │   ├── ChildUserWorkflowImpl.java           # No error handling — propagates naturally
    │   ├── ParentUserManagementWorkflow.java    # Parent interface
    │   └── ParentUserManagementWorkflowImpl.java # Clean extraction with FailureUtils
    ├── worker/
    │   └── TemporalWorker.java         # Starts the worker (no interceptor needed)
    └── starter/
        └── WorkflowStarter.java        # Client-side entry point for testing

Design Pattern Walkthrough

Layer 1: Activity — Throw ApplicationFailure Directly

The Activity is where errors originate. Instead of throwing a custom business exception (which Temporal would convert automatically, discarding its structured details), we throw ApplicationFailure directly using the ActivityErrors helper:

// In UserProvisioningActivitiesImpl.java
catch (Exception e) {
    ActivityError error = new ActivityError("CREATE_USER", "CONNECTION_TIMEOUT", "customerId=" + id);

    // Retryable: Temporal will retry per your RetryOptions
    throw ActivityErrors.retryable("Failed to create user", error, e);

    // Or, instead of the line above, non-retryable: Temporal fails immediately, no retry
    // throw ActivityErrors.nonRetryable("Failed to create user", error, e);
}

The ActivityErrors helper is a thin wrapper that creates ApplicationFailure with a consistent type tag ("ActivityErrorFailure") and your ActivityError POJO serialized in the details:

public static final String ACTIVITY_ERROR_TYPE = "ActivityErrorFailure";

public static ApplicationFailure retryable(String message, ActivityError details, Throwable cause) {
    return ApplicationFailure.newFailureWithCause(message, ACTIVITY_ERROR_TYPE, cause, details);
}

The constant type tag is critical — it's how FailureUtils identifies your failures later in the cause chain without brittle casting.
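The snippet above shows only `retryable()`. As a self-contained model of the whole helper, the sketch below uses a hypothetical `TaggedFailure` class in place of `ApplicationFailure`, so it runs without the Temporal SDK. In the real helper, the non-retryable variant would presumably call `ApplicationFailure.newNonRetryableFailureWithCause(message, ACTIVITY_ERROR_TYPE, cause, details)`; that is an assumption about the project's code, not something this README shows.

```java
import java.util.Objects;

// Model of the ActivityErrors helper. TaggedFailure is a hypothetical stand-in
// for Temporal's ApplicationFailure so the sketch compiles without the SDK.
final class ActivityErrorsModel {

    // Single source of truth for the type tag that FailureUtils matches on.
    static final String ACTIVITY_ERROR_TYPE = "ActivityErrorFailure";

    record ActivityError(String step, String errorCode, String context) {}

    // Stand-in failure: message, type tag, retryability flag, structured details, cause.
    static final class TaggedFailure extends RuntimeException {
        final String type;
        final boolean nonRetryable;
        final ActivityError details;

        TaggedFailure(String message, String type, boolean nonRetryable,
                      ActivityError details, Throwable cause) {
            super(message, cause);
            this.type = Objects.requireNonNull(type);
            this.nonRetryable = nonRetryable;
            this.details = details;
        }
    }

    // Retryable: the retry policy decides whether another attempt happens.
    static TaggedFailure retryable(String message, ActivityError details, Throwable cause) {
        return new TaggedFailure(message, ACTIVITY_ERROR_TYPE, false, details, cause);
    }

    // Non-retryable: no further attempts, regardless of the retry policy.
    static TaggedFailure nonRetryable(String message, ActivityError details, Throwable cause) {
        return new TaggedFailure(message, ACTIVITY_ERROR_TYPE, true, details, cause);
    }

    private ActivityErrorsModel() {}
}
```

Both factories share the same tag, so downstream code can recognize either kind of failure with one string comparison and read retryability from a separate flag.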

Layer 2: Child Workflow — Do Nothing

The Child Workflow has no error handling code at all. It simply calls activities and lets any failures propagate upward naturally:

// In ChildUserWorkflowImpl.java
@Override
public String provisionUser(String customerId) {
    activities.validateUser(customerId);
    String resourceId = activities.createUser(customerId);
    return resourceId;
}

When createUser fails, Temporal automatically wraps the ApplicationFailure into an ActivityFailure, and then when the child workflow fails, it wraps that into a ChildWorkflowFailure. Your ActivityError POJO survives this entire chain intact inside the details.

Layer 3: Parent Workflow — Extract with FailureUtils

The Parent Workflow catches ChildWorkflowFailure and uses FailureUtils to extract the original error in a single call:

// In ParentUserManagementWorkflowImpl.java
catch (ChildWorkflowFailure e) {
    Optional<ActivityError> activityError = FailureUtils.extractActivityError(e);
    boolean isNonRetryable = FailureUtils.isNonRetryableActivityError(e);

    // Log with full structured detail
    activityError.ifPresentOrElse(
        ae -> log.error("Failed: step={}, code={}, context={}, nonRetryable={}",
            ae.getStep(), ae.getErrorCode(), ae.getContext(), isNonRetryable),
        () -> log.error("Failed with unknown error", e)
    );

    // Re-throw for external clients with the error in details
    throw ApplicationFailure.newFailure(
        "User management workflow failure",
        "WORKFLOW_USER_MANAGEMENT",
        activityError.orElse(new ActivityError("UNKNOWN", "UNKNOWN", customerId))
    );
}

FailureUtils.extractActivityError() walks the entire cause chain — regardless of depth — looking for an ApplicationFailure with the type tag "ActivityErrorFailure", then deserializes the ActivityError POJO from its details. This replaces all instanceof checks, getCause() chains, and casting.
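A minimal sketch of that walk, assuming only that each wrapper exposes its inner failure through `getCause()` (Temporal's failure classes are `Throwable` subclasses, so they do). Stand-in classes replace `ActivityFailure`, `ChildWorkflowFailure`, and `ApplicationFailure` so it runs without the SDK. The method names match the README, but the bodies are illustrative rather than the project's actual `FailureUtils`; against the real SDK the match would presumably use `ApplicationFailure.getType()`, `isNonRetryable()`, and `getDetails().get(ActivityError.class)`.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Optional;
import java.util.Set;

// Illustrative model of FailureUtils' cause-chain walk. The nested failure classes
// are hypothetical stand-ins for Temporal's ActivityFailure, ChildWorkflowFailure
// and ApplicationFailure.
final class FailureUtilsModel {

    static final String ACTIVITY_ERROR_TYPE = "ActivityErrorFailure";

    record ActivityError(String step, String errorCode, String context) {}

    // Stand-in for ApplicationFailure: a type tag, a retryability flag, details.
    static final class AppFailure extends RuntimeException {
        final String type;
        final boolean nonRetryable;
        final Object details;

        AppFailure(String message, String type, boolean nonRetryable, Object details) {
            super(message);
            this.type = type;
            this.nonRetryable = nonRetryable;
            this.details = details;
        }
    }

    // Stand-ins for the wrapper layers; they only carry a cause.
    static final class ActivityWrapper extends RuntimeException {
        ActivityWrapper(Throwable cause) { super("activity failed", cause); }
    }

    static final class ChildWrapper extends RuntimeException {
        ChildWrapper(Throwable cause) { super("child workflow failed", cause); }
    }

    // Walks from the outermost failure inward, at any depth, and returns the first
    // AppFailure carrying the expected tag. The identity set guards against cycles.
    static Optional<AppFailure> findTagged(Throwable root, String type) {
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Throwable t = root; t != null && seen.add(t); t = t.getCause()) {
            if (t instanceof AppFailure af && type.equals(af.type)) {
                return Optional.of(af);
            }
        }
        return Optional.empty();
    }

    // Returns the ActivityError from the tagged failure, if one is present.
    static Optional<ActivityError> extractActivityError(Throwable root) {
        return findTagged(root, ACTIVITY_ERROR_TYPE)
                .map(af -> af.details)
                .filter(ActivityError.class::isInstance)
                .map(ActivityError.class::cast);
    }

    // False when no tagged failure is found, or when it is retryable.
    static boolean isNonRetryableActivityError(Throwable root) {
        return findTagged(root, ACTIVITY_ERROR_TYPE).map(af -> af.nonRetryable).orElse(false);
    }

    private FailureUtilsModel() {}
}
```

The same walk works whether the activity was called from a child workflow (three levels) or directly from the parent (two levels), which is the case the nested-`instanceof` version had to handle with separate branches.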

Layer 4: External Client — Same Pattern

When an external client (like a management API) calls getResult() on the workflow, it catches WorkflowFailedException. The Parent Workflow re-packages the error under a different type ("WORKFLOW_USER_MANAGEMENT"), so the client uses extractActivityErrorFromWorkflow():

// In WorkflowStarter.java or your management client
catch (WorkflowFailedException e) {
    Optional<ActivityError> error = FailureUtils.extractActivityErrorFromWorkflow(e);
    error.ifPresent(ae -> {
        // Use ae.getStep(), ae.getErrorCode(), ae.getContext()
    });
}
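Layers 3 and 4 differ only in which type tag they search for. The sketch below models that hand-off with hypothetical stand-ins (`Failure` for `ApplicationFailure`, `ClientException` for `WorkflowFailedException`). `repackage` mirrors the parent's catch block from Layer 3, and `extractFromWorkflow` is an assumption about what `extractActivityErrorFromWorkflow()` does, since its body is not shown in this README.

```java
import java.util.Optional;

// Model of the parent-to-client hand-off. Failure stands in for ApplicationFailure
// and ClientException for WorkflowFailedException; both are hypothetical.
final class RepackageModel {

    static final String ACTIVITY_ERROR_TYPE = "ActivityErrorFailure";
    static final String WORKFLOW_TYPE = "WORKFLOW_USER_MANAGEMENT";

    record ActivityError(String step, String errorCode, String context) {}

    static final class Failure extends RuntimeException {
        final String type;
        final ActivityError details;

        Failure(String message, String type, ActivityError details, Throwable cause) {
            super(message, cause);
            this.type = type;
            this.details = details;
        }
    }

    static final class ClientException extends RuntimeException {
        ClientException(Throwable cause) { super("workflow failed", cause); }
    }

    // Finds the first Failure with the given tag anywhere in the cause chain.
    static Optional<ActivityError> extract(Throwable root, String type) {
        for (Throwable t = root; t != null; t = t.getCause()) {
            if (t instanceof Failure f && type.equals(f.type)) {
                return Optional.ofNullable(f.details);
            }
        }
        return Optional.empty();
    }

    // Parent side: pull the activity's error out of the chain and re-throw it under
    // the workflow-level tag, falling back to UNKNOWN as the parent above does.
    static Failure repackage(Throwable childFailure, String customerId) {
        ActivityError error = extract(childFailure, ACTIVITY_ERROR_TYPE)
                .orElse(new ActivityError("UNKNOWN", "UNKNOWN", customerId));
        return new Failure("User management workflow failure", WORKFLOW_TYPE, error, null);
    }

    // Client side: same walk, different tag.
    static Optional<ActivityError> extractFromWorkflow(Throwable clientException) {
        return extract(clientException, WORKFLOW_TYPE);
    }

    private RepackageModel() {}
}
```

Because the parent creates a fresh failure rather than chaining the original one, searching for the `ActivityErrorFailure` tag from the client side finds nothing, which is why the client needs the second extraction method.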

Why Not Use an Interceptor?

An earlier version of this pattern used an Activity Interceptor to convert a custom UpwException into ApplicationFailure. The interceptor's only job was that conversion — catching the custom exception and re-throwing it as ApplicationFailure with the ActivityError in the details.

By throwing ApplicationFailure directly from the Activity, that conversion happens at the source, and the interceptor becomes unnecessary. This is the approach Temporal recommends as best practice.

When an interceptor still makes sense:

  • You have a large existing codebase where many activities already throw custom exceptions, and refactoring them all is impractical.
  • You want centralized cross-cutting error enrichment (e.g., adding request IDs or metric tags to every failure).
  • You need to apply consistent retry/non-retry classification logic across many activities.

For new code or a targeted refactor, throwing ApplicationFailure directly is simpler, has fewer moving parts, and keeps the error handling logic visible where the error actually occurs.
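For the legacy case in the first bullet, the interceptor's job reduces to one conversion applied around every activity call. This sketch models that conversion with a plain `Supplier` wrapper rather than Temporal's interceptor API: `LegacyBusinessException`, `TaggedFailure`, and `callConverting` are hypothetical, and a real implementation would live in an activity inbound interceptor registered on the worker.

```java
import java.util.function.Supplier;

// Model of the conversion an activity interceptor would centralize. All types are
// hypothetical stand-ins; this is not Temporal's interceptor API.
final class InterceptorModel {

    static final String ACTIVITY_ERROR_TYPE = "ActivityErrorFailure";

    record ActivityError(String step, String errorCode, String context) {}

    // Legacy custom exception, in the spirit of the UpwException mentioned above.
    static final class LegacyBusinessException extends RuntimeException {
        final ActivityError error;
        final boolean retryable;

        LegacyBusinessException(String message, ActivityError error, boolean retryable) {
            super(message);
            this.error = error;
            this.retryable = retryable;
        }
    }

    // Stand-in for ApplicationFailure.
    static final class TaggedFailure extends RuntimeException {
        final String type;
        final boolean nonRetryable;
        final ActivityError details;

        TaggedFailure(String message, String type, boolean nonRetryable,
                      ActivityError details, Throwable cause) {
            super(message, cause);
            this.type = type;
            this.nonRetryable = nonRetryable;
            this.details = details;
        }
    }

    // Runs one activity call; a legacy exception is translated into a tagged failure
    // that keeps its structured error. Anything else passes through unchanged.
    static <T> T callConverting(Supplier<T> activityCall) {
        try {
            return activityCall.get();
        } catch (LegacyBusinessException e) {
            throw new TaggedFailure(e.getMessage(), ACTIVITY_ERROR_TYPE, !e.retryable, e.error, e);
        }
    }

    private InterceptorModel() {}
}
```

Exceptions other than `LegacyBusinessException` are rethrown untouched, so they still go through the default conversion; throwing the tagged failure at the source makes this extra layer unnecessary.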


Key Classes

| Class | Purpose |
|---|---|
| ActivityError | POJO carrying structured error details (step, errorCode, context). Serialized into ApplicationFailure details. |
| ActivityErrors | Static helper methods for creating ApplicationFailure with consistent type tags. Provides retryable() and nonRetryable() variants. |
| FailureUtils | Walks Temporal's nested failure cause chain and extracts ActivityError in a single call. Replaces all manual instanceof/casting logic. |

Error Flow Diagrams

Retryable Error (e.g., connection timeout)

Activity throws ApplicationFailure (retryable, type="ActivityErrorFailure")
  │
  ├── Temporal retries per RetryOptions (up to maximumAttempts total attempts)
  │
  ▼ (all retries exhausted)
ActivityFailure wraps the ApplicationFailure
  │
  ▼
ChildWorkflowFailure wraps the ActivityFailure
  │
  ▼
Parent catches ChildWorkflowFailure
  → FailureUtils.extractActivityError(e) walks the chain
  → Returns Optional<ActivityError> with step, code, context
  → Parent re-throws as ApplicationFailure("WORKFLOW_USER_MANAGEMENT")
  │
  ▼
External client catches WorkflowFailedException
  → FailureUtils.extractActivityErrorFromWorkflow(e)
  → Gets the structured error details

Non-Retryable Error (e.g., invalid input)

Activity throws ApplicationFailure (non-retryable, type="ActivityErrorFailure")
  │
  ├── Temporal does NOT retry — fails immediately
  │
  ▼
ActivityFailure wraps the ApplicationFailure
  │
  ▼
ChildWorkflowFailure wraps the ActivityFailure
  │
  ▼
Parent catches ChildWorkflowFailure
  → FailureUtils.isNonRetryableActivityError(e) returns true
  → Same clean extraction as above
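Both diagrams come down to one rule: a retryable failure is attempted again until the retry policy runs out, while a non-retryable one ends the activity on its first failure. This is a simplified, synchronous model of that rule that ignores backoff intervals and timeouts. `runWithRetries`, `maxAttempts`, and `Attempt` are hypothetical names; `maxAttempts` counts total attempts including the first, which matches how the maximum-attempts setting in Temporal's `RetryOptions` is defined.

```java
import java.util.function.IntFunction;

// Simplified model of activity retry semantics (no backoff, no timeouts).
final class RetryModel {

    // Outcome of one attempt: success, retryable failure, or non-retryable failure.
    enum Attempt { SUCCESS, RETRYABLE_FAILURE, NON_RETRYABLE_FAILURE }

    // Final result: whether it succeeded and how many attempts were made.
    record Result(boolean succeeded, int attempts) {}

    // maxAttempts counts total attempts, including the first one (must be >= 1 here).
    static Result runWithRetries(int maxAttempts, IntFunction<Attempt> attempt) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1 in this model");
        }
        for (int n = 1; ; n++) {
            Attempt outcome = attempt.apply(n);
            if (outcome == Attempt.SUCCESS) {
                return new Result(true, n);
            }
            // Non-retryable: stop immediately, whatever the policy allows.
            // Retryable: stop only once the attempt budget is used up.
            if (outcome == Attempt.NON_RETRYABLE_FAILURE || n >= maxAttempts) {
                return new Result(false, n);
            }
        }
    }

    private RetryModel() {}
}
```

With `maxAttempts = 3`, an always-failing retryable activity is tried three times in total (the first attempt plus two retries) before the failure reaches the child workflow; an always-failing non-retryable activity stops after one.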

Prerequisites

  • Java 17+
  • Gradle (the build and run commands below invoke gradle directly)
  • Temporal CLI — for the local development server
    # macOS
    brew install temporal
    
    # Or see https://docs.temporal.io/cli

How to Build and Run

1. Start Temporal locally

temporal server start-dev

This starts the Temporal server on localhost:7233 and the Web UI on http://localhost:8233.

2. Build the project

cd temporal-error-handling
gradle build

3. Start the Worker (Terminal 1)

gradle run

The worker registers both workflows and the activity on the user-provisioning task queue, then polls for tasks.

4. Start a Workflow (Terminal 2)

# Happy path — completes successfully
gradle startWorkflow

# Retryable error — retries 3 times, then fails
gradle startWorkflow --args="FAIL_RETRYABLE"

# Non-retryable error — fails immediately, no retry
gradle startWorkflow --args="FAIL_NON_RETRYABLE"

5. Inspect in Temporal Web UI

Open http://localhost:8233 and navigate to the workflow execution to see the full event history, failure details, and retry attempts.


Test Scenarios

Happy Path (customer-123)

Activity: createUser succeeds → returns "resource-customer-123"
Child Workflow: returns resource ID
Parent Workflow: returns resource ID
Client: logs "Workflow succeeded! resourceId=resource-customer-123"

Retryable Failure (FAIL_RETRYABLE)

Activity: throws retryable ApplicationFailure (CONNECTION_TIMEOUT)
Temporal: retries 3 times (per RetryOptions), all fail
Child Workflow: fails with ChildWorkflowFailure
Parent Workflow: extracts ActivityError → step=CREATE_USER, code=CONNECTION_TIMEOUT
Client: logs step, code, and context from the structured error

Non-Retryable Failure (FAIL_NON_RETRYABLE)

Activity: throws non-retryable ApplicationFailure (INVALID_CUSTOMER_ID)
Temporal: does NOT retry — fails immediately
Child Workflow: fails with ChildWorkflowFailure
Parent Workflow: extracts ActivityError → step=CREATE_USER, code=INVALID_CUSTOMER_ID
Client: logs step, code, and context from the structured error
