A production-ready, cloud-native distributed scheduling system built with Spring Boot 3, Quartz, and PostgreSQL.
This project implements a Distributed, Multi-Node, Database-Backed Scheduler that solves the core problem of running scheduled tasks reliably across multiple instances in a distributed system.
Core Features
- Exactly-Once Execution: Guarantees only one node executes a job at a time across the cluster using database-backed locking
- Database-Backed Persistence: All job definitions and execution history persisted to PostgreSQL
- Dynamic Job Management: Create, update, pause, resume, and delete jobs via GraphQL API at runtime
- Multiple Schedule Types: Support for CRON, FIXED_RATE, and FIXED_DELAY scheduling
- Execution Logging: Complete audit trail of all job executions with status and error tracking
- Cluster-Safe: Built on Quartz in cluster mode with automatic failover and recovery
- Smart Reconciliation: Automatically repairs scheduler state on node startup
- GraphQL API: Type-safe, fully-featured API for job management and observability
- Load Balancing: NGINX load balancer distributes traffic across multiple scheduler instances
- Dynamic Instance Scaling: Auto-scaling support with Eureka service discovery for instance registration/deregistration
- Horizontally Scalable: Add or remove scheduler instances without code changes or downtime
The project includes a fully interactive web UI dashboard built using:
- React + Vite
- Apollo GraphQL
- React Query
- TailwindCSS + ShadCN UI
- NGINX static hosting
- Docker Compose deployment
The UI provides complete visibility and control over the distributed scheduler cluster.
The jobs dashboard shows all scheduled jobs with real-time status, schedule type, timestamps, and actions (Run Now, Pause, Resume, Delete, View Logs).
Screenshot:
The job creation form supports all schedule types:
- CRON
- FIXED_RATE
- FIXED_DELAY
It includes dynamic form fields, a JSON payload editor, and validation.
Screenshot:
The execution logs view displays execution history for a job, including:
- Fire time
- Status (SUCCESS / FAILED)
- Error message (if any)
Screenshot:
Traditional in-memory schedulers fail in distributed environments:
| Problem | @Scheduled | Distributed Scheduler |
|---|---|---|
| Execution Count | Runs on every node (duplicates) | Exactly-once execution |
| Job Persistence | Lost on restart | Persisted to database |
| Dynamic Management | Can't add/remove jobs at runtime | Full GraphQL API |
| Execution History | No audit trail | Complete execution logs |
| Cluster Awareness | No consensus mechanism | DB-backed distributed locking |
| Scalability | Not designed for clusters | Horizontally scalable |
| Observability | Limited visibility | Query job status & history |
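For contrast, the left column of the table corresponds to a plain Spring `@Scheduled` method like the minimal sketch below (illustrative only, not part of this project): every node in a cluster runs it independently, nothing is persisted, and there is no execution history.

```java
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

// Minimal illustration of the "traditional" approach from the table above
// (requires @EnableScheduling on a configuration class). Every node running
// this code fires the task independently: a 3-node cluster produces three
// executions per schedule, and nothing survives a restart.
@Component
public class NaiveReportScheduler {

    @Scheduled(cron = "0 0 2 * * *") // 02:00 every day, on every node
    public void generateDailyReport() {
        // business logic runs here on each instance, with no audit trail
    }
}
```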
+------------------------+
| GraphQL API Layer |
+-----------+------------+
|
v
+---------------------------------------+
| Application Services (Use Cases) |
| - CreateJobService |
| - PauseJobService |
| - ResumeJobService |
| - DeleteJobService |
| - RunJobNowService |
+--------------------+-------------------+
|
+----------------------------+-----------------------------+
| |
v v
+---------------------------+ +----------------------+
| Persistence Layer | | Quartz Integration |
| - JobRepository | | - QuartzScheduler |
| - ExecutionLogRepository | | - ReconciliationSvc |
| - Domain Models | | |
| - JPA Entities | | Database Locking: |
+---------------------------+ | - Cluster leadership |
| | - Failover handling |
v | - Misfire policies |
+---------------------------+ +----------------------+
| PostgreSQL Database | |
| - job_definition | |
| - job_execution_log | |
| - QRTZ_* (Quartz tables) |<-------------------+
+---------------------------+
- Hexagonal Architecture: Clear separation between API, application, domain, and infrastructure layers
- Domain-Driven Design: Rich domain models (JobDefinition, JobExecutionLog) with business logic
- Repository Pattern: Data access abstraction via repository interfaces
- Adapter Pattern: Quartz scheduler integrated via dedicated QuartzSchedulingAdapter
- Service Layer: Application services orchestrate between domain and infrastructure
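To make the layering concrete, here is a minimal sketch of what the domain ports might look like. The interface and method names below are assumptions for illustration; the real ports live under `domain/port` and may differ.

```java
import java.util.List;
import java.util.Optional;
import java.util.UUID;

// Minimal stand-in for the rich domain model under domain/model.
class JobDefinition { /* id, name, scheduleType, cronExpression, payload, status, ... */ }

// Illustrative outbound port for persistence (hexagonal architecture):
// the application layer depends on this interface, not on JPA.
interface JobRepository {
    JobDefinition save(JobDefinition job);
    Optional<JobDefinition> findById(UUID id);
    List<JobDefinition> findAllActive();
}

// Illustrative outbound port implemented by the Quartz adapter in the
// infrastructure layer, so services never touch Quartz types directly.
interface SchedulingPort {
    void schedule(JobDefinition job);
    void pause(UUID jobId);
    void resume(UUID jobId);
    void unschedule(UUID jobId);
    void triggerNow(UUID jobId);
}
```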
scheduler-instance/
├── src/main/java/com/example/scheduler/
│ ├── DistributedSchedulerApplication.java # Spring Boot entry point
│ ├── api/
│ │ └── graphql/ # GraphQL controllers
│ ├── application/
│ │ ├── dto/ # DTOs for API
│ │ ├── mapper/ # Entity ↔ DTO mapping
│ │ └── service/ # Application services
│ ├── domain/
│ │ ├── model/ # JobDefinition, JobExecutionLog
│ │ └── port/ # Repository interfaces
│ ├── infrastructure/
│ │ ├── persistence/ # JPA entities, repositories
│ │ └── quartz/ # Quartz scheduler, adapter
│ └── config/ # Spring configuration
├── src/main/resources/
│ ├── application.yml # Spring Boot config
│ ├── graphql/jobs.graphqls # GraphQL schema
│ └── db/migration/ # Flyway migrations
└── pom.xml # Maven dependencies
Stores the definition of each scheduled job:
CREATE TABLE job_definition (
id UUID PRIMARY KEY,
name VARCHAR(255) NOT NULL,
schedule_type VARCHAR(50) NOT NULL, -- CRON, FIXED_RATE, FIXED_DELAY
cron_expression VARCHAR(255), -- for CRON jobs
interval_seconds BIGINT, -- for FIXED_RATE/FIXED_DELAY
initial_delay_seconds BIGINT, -- for FIXED_DELAY
payload JSONB, -- custom job data
status VARCHAR(32) NOT NULL, -- ACTIVE, PAUSED, DELETED
version INTEGER DEFAULT 0, -- for optimistic locking
created_at TIMESTAMP DEFAULT now(),
updated_at TIMESTAMP DEFAULT now()
);

Tracks every execution of every job:
CREATE TABLE job_execution_log (
id UUID PRIMARY KEY,
job_id UUID NOT NULL,
fire_time TIMESTAMP NOT NULL,
status VARCHAR(32) NOT NULL, -- SUCCESS, FAILED
error_message TEXT,
created_at TIMESTAMP DEFAULT now()
);

Standard Quartz cluster schema (managed by Flyway migration V1):
- QRTZ_JOB_DETAILS - Job details
- QRTZ_TRIGGERS - Trigger definitions
- QRTZ_CALENDARS - Calendar exclusions
- QRTZ_LOCKS - Distributed locking for cluster coordination
- And 7+ more tables for complete cluster support
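As an illustration of how the persistence layer maps onto this schema, a JPA entity for `job_definition` might look roughly like the sketch below. Class and field names are assumptions; the actual entities live under `infrastructure/persistence`.

```java
import java.time.Instant;
import java.util.UUID;
import jakarta.persistence.*;

// Illustrative JPA mapping of the job_definition table; the real entity may
// differ in naming and mapping details.
@Entity
@Table(name = "job_definition")
public class JobDefinitionEntity {

    public enum ScheduleType { CRON, FIXED_RATE, FIXED_DELAY }
    public enum Status { ACTIVE, PAUSED, DELETED }

    @Id
    private UUID id;

    @Column(nullable = false)
    private String name;

    @Enumerated(EnumType.STRING)
    @Column(name = "schedule_type", nullable = false)
    private ScheduleType scheduleType;

    @Column(name = "cron_expression")
    private String cronExpression;

    @Column(name = "interval_seconds")
    private Long intervalSeconds;

    @Column(name = "initial_delay_seconds")
    private Long initialDelaySeconds;

    @Column(columnDefinition = "jsonb")
    private String payload;            // raw JSON payload

    @Enumerated(EnumType.STRING)
    @Column(nullable = false)
    private Status status;

    @Version
    private Integer version;           // optimistic locking via the version column

    @Column(name = "created_at")
    private Instant createdAt;

    @Column(name = "updated_at")
    private Instant updatedAt;
}
```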
All endpoints are at POST /scheduler-instance/graphql
type Query {
"""List all jobs"""
jobs: [Job!]!
"""Get a single job by ID"""
job(id: ID!): Job
"""Get execution logs for a job"""
jobLogs(jobId: ID!): [JobExecutionLog!]!
}

type Mutation {
"""Create and schedule a job"""
createJob(input: CreateJobInput!): Job!
"""Pause a job (stops execution, keeps definition)"""
pauseJob(id: ID!): Job!
"""Resume a paused job"""
resumeJob(id: ID!): Job!
"""Delete a job and unschedule it"""
deleteJob(id: ID!): Boolean!
"""Fire a job immediately on the cluster"""
runJobNow(id: ID!): Boolean!
}

enum JobScheduleType {
CRON
FIXED_RATE
FIXED_DELAY
}
enum JobStatus {
ACTIVE
PAUSED
DELETED
}
type Job {
id: ID!
name: String!
scheduleType: JobScheduleType!
cronExpression: String
intervalSeconds: Int
initialDelaySeconds: Int
payload: String
status: JobStatus!
version: Int!
createdAt: String
updatedAt: String
}
type JobExecutionLog {
id: ID!
jobId: ID!
fireTime: String!
status: String!
errorMessage: String
createdAt: String!
}
input CreateJobInput {
name: String!
scheduleType: JobScheduleType!
cronExpression: String
intervalSeconds: Int
initialDelaySeconds: Int
payload: JSON
}

Create a CRON Job
mutation CreateJob($input: CreateJobInput!) {
createJob(input: $input) {
id
name
scheduleType
cronExpression
status
}
}

Variables:
{
"input": {
"name": "Daily Report Generation",
"scheduleType": "CRON",
"cronExpression": "0 2 * * *",
"payload": {
"reportType": "daily",
"recipients": ["admin@example.com"]
}
}
}

List All Jobs
query {
jobs {
id
name
scheduleType
status
}
}

Get Job Execution History
query GetLogs($jobId: ID!) {
jobLogs(jobId: $jobId) {
id
fireTime
status
errorMessage
}
}

Prerequisites:
- Java 17+
- Maven 3.8+
- Docker & Docker Compose
- PostgreSQL 14+ (or use Docker)
Run everything with Docker Compose:

1. Clone and navigate to the project:

   cd distributed-scheduler

2. Start all services:

   docker-compose up --build

   This starts:
   - PostgreSQL (port 5432)
   - Eureka Service Registry (port 8761)
   - Multiple Scheduler Instances (port 8081+)
   - NGINX Load Balancer (port 80)

3. Access the GraphQL playground: http://localhost:8081/scheduler-instance/graphql

4. Create a test job:

   curl -X POST http://localhost:8081/scheduler-instance/graphql \
     -H "Content-Type: application/json" \
     -d '{ "query": "mutation { createJob(input: {name: \"Test Job\", scheduleType: CRON, cronExpression: \"0/10 * * * * ?\", payload: {type: \"test\"}}) { id name status } }" }'
Run locally without Docker:

1. Start PostgreSQL:

   # If using Homebrew on macOS
   brew services start postgresql

   # Create the database
   createdb -U postgres scheduler

2. Configure the PostgreSQL connection:
   - Edit scheduler-instance/src/main/resources/application.yml
   - Update the datasource URL, username, and password

3. Build the project:

   mvn clean package

4. Run the scheduler:

   java -jar scheduler-instance/target/distributed-scheduler-1.0-SNAPSHOT.jar \
     --spring.profiles.active=local

5. Access GraphQL at http://localhost:8081/scheduler-instance/graphql
Docker Profile (docker):
- Uses PostgreSQL at postgres:5432
- Registers with Eureka at eureka:8761
- Enables clustering

Local Profile (local):
- Uses local PostgreSQL at localhost:5432
- Disables Eureka registration (single instance)
- Useful for development
server.port: 8081 # Server port
spring.datasource.url: jdbc:postgresql://... # Database URL
spring.jpa.hibernate.ddl-auto: validate # Don't modify schema
spring.flyway.locations: classpath:db/migration # Migration scripts
org.quartz.scheduler.instanceId: AUTO # Cluster instance ID
org.quartz.jobStore.isClustered: true # Enable clustering
org.quartz.jobStore.clusterCheckinInterval: 10000 # Heartbeat interval
org.quartz.threadPool.threadCount: 10 # Job execution threads

- Multi-instance Quartz with PostgreSQL job store
- Cluster locking and failover
- Heartbeat-based node detection
- Verified exactly-once execution
- Clean domain entities (JobDefinition, JobExecutionLog)
- JPA repository implementations
- Flyway-managed database schema
- Reconciliation on startup
- Full job lifecycle via mutations (create, pause, resume, delete)
- Job queries and execution log retrieval
- Type-safe schema with scalars (JSON)
- Error handling and validation
- NGINX load balancer with dynamic upstream configuration
- Eureka service discovery for instance registration
- Eureka2NGINX reconciler for automatic load balancer updates
- Docker Compose orchestration with all services
- Postman collection for API testing
- Health checks and auto-recovery
- Real-time job monitoring
- Visual job creation/editing forms
- Execution timeline
- Node status dashboard (planned)
1. User sends GraphQL mutation: createJob(...)
↓
2. GraphQL controller validates input
↓
3. CreateJobService orchestrates:
a. Create JobDefinition domain object
b. Save to job_definition table
c. Call QuartzSchedulingAdapter.schedule()
↓
4. QuartzSchedulingAdapter:
a. Converts JobDefinition to Quartz Trigger
b. Submits trigger to Quartz Scheduler
↓
5. Quartz (in cluster mode):
a. Stores trigger in QRTZ_TRIGGERS
b. Acquires cluster lock
c. Propagates to all nodes
↓
6. Response returned to client with job ID
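A rough sketch of step 3, reusing the illustrative `JobRepository` and `SchedulingPort` names from the architecture sketch above. The real `CreateJobService` may differ in structure and method names.

```java
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

// Illustrative application service for the flow above: persist the definition
// first, then hand it to the scheduling port (backed by QuartzSchedulingAdapter).
@Service
public class CreateJobService {

    private final JobRepository jobRepository;   // domain port (see architecture sketch)
    private final SchedulingPort schedulingPort; // implemented by the Quartz adapter

    public CreateJobService(JobRepository jobRepository, SchedulingPort schedulingPort) {
        this.jobRepository = jobRepository;
        this.schedulingPort = schedulingPort;
    }

    @Transactional
    public JobDefinition create(JobDefinition definition) {
        JobDefinition saved = jobRepository.save(definition); // steps 3a-3b: job_definition row
        schedulingPort.schedule(saved);                       // step 3c: Quartz trigger submitted
        return saved;
    }
}
```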
1. Quartz detects trigger fire time reached
↓
2. Acquires QRTZ_LOCKS for exactly-once guarantee
↓
3. Custom Job handler executes business logic
↓
4. ExecutionLogRepository logs result:
- job_id, fire_time, status, error_message
↓
5. Other cluster nodes skip (no lock acquired)
↓
6. Result queryable via jobLogs(jobId) GraphQL query
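A rough sketch of steps 3 and 4: the Quartz `Job` implementation that runs the business logic and records the outcome. The log-repository interface and its methods are assumptions; only one node reaches `execute()` per fire time because clustered Quartz has already acquired the row lock.

```java
import java.time.Instant;
import org.quartz.DisallowConcurrentExecution;
import org.quartz.Job;
import org.quartz.JobExecutionContext;

// Illustrative execution-log port; method names are assumptions.
interface ExecutionLogRepository {
    void logSuccess(String jobId, Instant fireTime);
    void logFailure(String jobId, Instant fireTime, String errorMessage);
}

// Illustrative Quartz job handler. By the time execute() runs, clustered Quartz
// has already acquired the lock in QRTZ_LOCKS, so only one node processes this
// fire time; every outcome is written to job_execution_log.
@DisallowConcurrentExecution
public class ScheduledJobRunner implements Job {

    private final ExecutionLogRepository executionLog;

    public ScheduledJobRunner(ExecutionLogRepository executionLog) { // injected via a Spring-aware JobFactory
        this.executionLog = executionLog;
    }

    @Override
    public void execute(JobExecutionContext context) {
        String jobId = context.getJobDetail().getJobDataMap().getString("jobId");
        Instant fireTime = context.getFireTime().toInstant();
        try {
            // business logic driven by the job's payload goes here
            executionLog.logSuccess(jobId, fireTime);
        } catch (Exception e) {
            executionLog.logFailure(jobId, fireTime, e.getMessage());
        }
    }
}
```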
1. DistributedSchedulerApplication starts
↓
2. Flyway applies pending migrations
↓
3. Quartz initializes with clustered job store
↓
4. ReconciliationService runs:
a. Queries all ACTIVE jobs from database
b. For each job, checks if Quartz trigger exists
c. If missing, reschedules the trigger
d. If orphaned trigger, removes it
↓
5. Cluster now has consistent state
↓
6. Application ready to process requests
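A rough sketch of step 4, reusing the illustrative port names from earlier. `Scheduler.checkExists` is a standard Quartz call; the class name, event hook, and trigger-key convention are assumptions, and the real ReconciliationService may differ.

```java
import org.quartz.Scheduler;
import org.quartz.SchedulerException;
import org.quartz.TriggerKey;
import org.springframework.boot.context.event.ApplicationReadyEvent;
import org.springframework.context.event.EventListener;
import org.springframework.stereotype.Service;

// Illustrative startup reconciliation: every ACTIVE job in the database must
// have a matching Quartz trigger; missing triggers are rescheduled.
@Service
public class StartupReconciliation {

    private final JobRepository jobRepository;   // illustrative port from earlier
    private final SchedulingPort schedulingPort; // illustrative port from earlier
    private final Scheduler scheduler;           // Quartz scheduler bean

    public StartupReconciliation(JobRepository jobRepository,
                                 SchedulingPort schedulingPort,
                                 Scheduler scheduler) {
        this.jobRepository = jobRepository;
        this.schedulingPort = schedulingPort;
        this.scheduler = scheduler;
    }

    @EventListener(ApplicationReadyEvent.class)
    public void reconcile() throws SchedulerException {
        for (JobDefinition job : jobRepository.findAllActive()) {
            // placeholder key; the real code derives the trigger key from the job ID
            TriggerKey key = TriggerKey.triggerKey(String.valueOf(job));
            if (!scheduler.checkExists(key)) {
                schedulingPort.schedule(job); // 4c: reschedule missing trigger from the DB definition
            }
        }
        // 4d: orphaned Quartz triggers with no ACTIVE job would be removed here as well.
    }
}
```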
- Open postman/collections/New Collection.postman_collection.json
- Set environment variable: baseURL=http://localhost:8081/scheduler-instance
- Run requests in sequence:
  - Create Job (creates a new job)
  - List Jobs (saves job IDs to environment)
  - Get Job By ID (retrieves specific job)
  - Pause Job (pauses execution)
  - Resume Job (resumes execution)
  - Run Job Now (triggers immediate execution)
  - Get Job Logs (views execution history)
  - Delete Job (removes job)
- Start multiple scheduler instances via Docker Compose
- Create a job with frequent execution (e.g., 0/5 * * * * ? - every 5 seconds)
- Query execution logs: SELECT * FROM job_execution_log ORDER BY fire_time DESC
- Verify only one log entry appears per fire time (exactly-once guarantee)
-- Latest 10 executions of a job
SELECT * FROM job_execution_log
WHERE job_id = 'YOUR_JOB_ID'
ORDER BY fire_time DESC
LIMIT 10;
-- Failed executions
SELECT * FROM job_execution_log
WHERE status = 'FAILED'
ORDER BY fire_time DESC;
-- Job execution count by day
SELECT DATE(fire_time), COUNT(*) as executions
FROM job_execution_log
GROUP BY DATE(fire_time)
ORDER BY DATE(fire_time) DESC;

-- Active cluster nodes
SELECT * FROM QRTZ_SCHEDULER_STATE;
-- Scheduled triggers
SELECT * FROM QRTZ_TRIGGERS;
-- Current locks
SELECT * FROM QRTZ_LOCKS;

By default, logs are written to stdout. Key log messages:
INFO Quartz Scheduler started
INFO ReconciliationService: Reconciling active jobs
INFO Job 'Daily Report' scheduled successfully
INFO Job execution started: jobId=...
INFO Job execution completed: status=SUCCESS, duration=2500ms
To add a new schedule type (see the adapter sketch below):
- Add it to the JobScheduleType enum in the domain model
- Update the GraphQL schema (jobs.graphqls)
- Implement the conversion in QuartzSchedulingAdapter
- Add test cases
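For orientation, the conversion step might look roughly like the sketch below. Class and method names are assumptions, and note that Quartz has no native fixed-delay trigger, so FIXED_DELAY needs extra handling (e.g., rescheduling after each run).

```java
import org.quartz.CronScheduleBuilder;
import org.quartz.DateBuilder;
import org.quartz.SimpleScheduleBuilder;
import org.quartz.Trigger;
import org.quartz.TriggerBuilder;

// Illustrative sketch of the schedule-type-to-trigger conversion; the real
// QuartzSchedulingAdapter may be structured differently.
final class TriggerConversionSketch {

    static Trigger buildTrigger(String name, String scheduleType,
                                String cronExpression, int intervalSeconds) {
        TriggerBuilder<Trigger> builder = TriggerBuilder.newTrigger().withIdentity(name + "-trigger");

        switch (scheduleType) {
            case "CRON":
                // Quartz cron expressions include a seconds field, e.g. "0 0 2 * * ?"
                return builder.withSchedule(CronScheduleBuilder.cronSchedule(cronExpression)).build();
            case "FIXED_RATE":
                return builder.withSchedule(SimpleScheduleBuilder.simpleSchedule()
                                .withIntervalInSeconds(intervalSeconds)
                                .repeatForever())
                        .build();
            case "FIXED_DELAY":
                // No native fixed-delay trigger in Quartz: schedule only the next fire
                // time and reschedule after each execution completes.
                return builder.startAt(DateBuilder.futureDate(intervalSeconds,
                                DateBuilder.IntervalUnit.SECOND))
                        .build();
            default:
                throw new IllegalArgumentException("Unsupported schedule type: " + scheduleType);
        }
    }
}
```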
To add a new API operation:
- Add it to the GraphQL schema (jobs.graphqls)
- Create an application service (e.g., UpdateJobScheduleService)
- Implement the GraphQL controller method
- Add integration tests
To add a database migration:
- Create a new file: src/main/resources/db/migration/VN__description.sql
- Flyway auto-applies it on the next startup
- Use standard DDL (CREATE, ALTER, etc.)
| Layer | Technology | Version |
|---|---|---|
| Framework | Spring Boot | 3.2.0+ |
| Scheduler | Quartz | 2.3.x |
| Database | PostgreSQL | 14+ |
| Migrations | Flyway | 10.x |
| ORM | JPA/Hibernate | 6.x |
| API | GraphQL (Spring GraphQL) | 1.x |
| Build | Maven | 3.8+ |
| Java | OpenJDK/Oracle | 17+ |
Job not executing:
- Check job status: query { job(id: "...") { status } }
- Verify the schedule: SELECT * FROM QRTZ_TRIGGERS WHERE JOB_KEY = '...'
- Check for errors: SELECT * FROM job_execution_log WHERE status = 'FAILED'

Duplicate executions across nodes:
- Ensure org.quartz.jobStore.isClustered: true
- Verify the QRTZ_LOCKS table has entries
- Check the cluster checkin interval: it should be shorter than the job frequency

Database connection failures:
- Verify PostgreSQL is running: psql -U scheduler -d scheduler
- Check the connection string in application.yml
- Ensure network connectivity to the database host

Flyway migration failures:
- Check the flyway_schema_history table
- Ensure migration files are in resources/db/migration/
- Check file naming: VN__description.sql
Internal project / Proprietary
