Skip to content

Latest commit

 

History

History
423 lines (309 loc) · 10.4 KB

File metadata and controls

423 lines (309 loc) · 10.4 KB

Event Partitioning Documentation

Overview

The workflow_events table uses PostgreSQL native table partitioning to handle the large volume of event data generated by workflow executions. This document describes the partitioning strategy, maintenance procedures, and best practices.

Partitioning Strategy

Partition Type

  • Method: Range partitioning by timestamp (ts column)
  • Granularity: Monthly partitions
  • Naming Convention: workflow_events_y{YYYY}m{MM} (e.g., workflow_events_y2025m01)

Why Partitioning?

The workflow_events table is the primary event sourcing log for MeshHook. It stores:

  • Every state change in workflow execution
  • All node execution events
  • Debugging and replay information

This table will grow rapidly in production environments, making partitioning essential for:

  1. Query Performance: Partition pruning allows PostgreSQL to scan only relevant partitions
  2. Maintenance: Easier to archive or drop old data by partition
  3. Index Efficiency: Smaller indexes per partition improve performance
  4. Backup/Restore: Partition-level operations are faster
  5. Storage Management: Better control over data lifecycle

Architecture

Table Structure

CREATE TABLE workflow_events (
  id bigserial,
  run_id uuid NOT NULL,
  ts timestamptz NOT NULL DEFAULT now(),
  type text NOT NULL,
  payload jsonb NOT NULL,
  created_at timestamptz NOT NULL DEFAULT now(),
  PRIMARY KEY (id, ts)
) PARTITION BY RANGE (ts);

Key Points:

  • Primary key includes both id and ts (required for partitioned tables)
  • Foreign key to workflow_runs(id) with cascade delete
  • Partitioned by ts (timestamp) column

Partition Boundaries

Each partition covers one calendar month:

  • Start: First day of month at 00:00:00
  • End: First day of next month at 00:00:00 (exclusive)

Example:

workflow_events_y2025m01: [2025-01-01, 2025-02-01)
workflow_events_y2025m02: [2025-02-01, 2025-03-01)

Automatic Partition Management

Initial Setup

The migration creates partitions for:

  • 2 months in the past
  • Current month
  • 3 months in the future

This ensures smooth operation during initial deployment.

Maintenance Function

The maintain_workflow_events_partitions() function ensures future partitions exist:

SELECT maintain_workflow_events_partitions();

What it does:

  • Checks for existing future partitions
  • Creates partitions for the next 3 months if needed
  • Automatically creates indexes on new partitions

Scheduling:

  • If pg_cron extension is available, runs daily at 2 AM
  • Otherwise, must be scheduled externally (cron, systemd timer, etc.)

Manual Partition Creation

To create a partition for a specific month:

SELECT create_workflow_events_partition('2025-06-01'::date);

This creates the partition for June 2025 with all necessary indexes.

Indexes

Each partition automatically receives these indexes:

  1. {partition}_run_id_idx: Index on run_id for filtering by workflow run
  2. {partition}_run_ts_idx: Composite index on (run_id, ts) for time-range queries
  3. {partition}_type_idx: Index on type for filtering by event type

These indexes are created automatically when partitions are created.

Querying Partitioned Data

Standard Queries

Queries work exactly as before - PostgreSQL handles partition routing automatically:

-- Get all events for a run
SELECT * FROM workflow_events 
WHERE run_id = 'some-uuid'
ORDER BY ts;

-- Get events in a time range
SELECT * FROM workflow_events
WHERE ts >= '2025-01-01' AND ts < '2025-02-01';

Optimized Query Function

Use the provided helper function for better performance:

-- Get events for a run with optional time filtering
SELECT * FROM get_workflow_run_events(
  'run-uuid'::uuid,
  '2025-01-01'::timestamptz,  -- optional start
  '2025-01-31'::timestamptz   -- optional end
);

Partition Pruning

PostgreSQL automatically prunes (skips) irrelevant partitions when:

  • WHERE clause includes the partition key (ts)
  • Time range is specified

Example:

-- Only scans January 2025 partition
SELECT * FROM workflow_events
WHERE ts >= '2025-01-15' AND ts < '2025-01-20';

Monitoring

View Partition Information

-- Simple view of all partitions
SELECT * FROM workflow_events_partition_info;

Returns:

  • Partition name
  • Size (human-readable)
  • Row count estimate

Detailed Statistics

-- Detailed partition statistics
SELECT * FROM get_workflow_events_partition_stats();

Returns:

  • Partition name
  • Date range
  • Exact row count
  • Size in bytes and human-readable format

Check Partition Health

-- List all partitions with their boundaries
SELECT 
  tablename,
  pg_get_expr(c.relpartbound, c.oid) as bounds
FROM pg_tables t
JOIN pg_class c ON c.relname = t.tablename
WHERE schemaname = 'public'
  AND tablename LIKE 'workflow_events_y%'
ORDER BY tablename;

Data Retention

Dropping Old Partitions

To remove old data, drop entire partitions:

-- Drop partitions older than 12 months (default)
SELECT drop_old_workflow_events_partitions();

-- Drop partitions older than 6 months
SELECT drop_old_workflow_events_partitions(6);

Warning: This permanently deletes data. Ensure backups exist before dropping partitions.

Manual Partition Drop

-- Drop a specific partition
DROP TABLE workflow_events_y2024m01;

Archiving Strategy

Before dropping old partitions, consider archiving:

  1. Export to file:
COPY workflow_events_y2024m01 TO '/path/to/archive/2024-01.csv' CSV HEADER;
  1. Move to archive table:
CREATE TABLE workflow_events_archive (LIKE workflow_events INCLUDING ALL);
INSERT INTO workflow_events_archive SELECT * FROM workflow_events_y2024m01;
DROP TABLE workflow_events_y2024m01;

Maintenance Procedures

Daily Maintenance

Run automatically via pg_cron or external scheduler:

SELECT maintain_workflow_events_partitions();

Monthly Maintenance

  1. Verify new partition was created
  2. Check partition sizes
  3. Consider archiving old partitions
-- Check partition sizes
SELECT * FROM workflow_events_partition_info
ORDER BY partition_name DESC
LIMIT 6;

Quarterly Maintenance

  1. Review retention policy
  2. Archive or drop old partitions
  3. Analyze partition performance
-- Analyze all partitions
DO $$
DECLARE
  partition_name text;
BEGIN
  FOR partition_name IN 
    SELECT tablename FROM pg_tables 
    WHERE schemaname = 'public' 
    AND tablename LIKE 'workflow_events_y%'
  LOOP
    EXECUTE 'ANALYZE ' || partition_name;
  END LOOP;
END $$;

Performance Considerations

Query Optimization

  1. Always include time range when possible:
-- Good: Uses partition pruning
SELECT * FROM workflow_events
WHERE run_id = 'uuid' AND ts >= '2025-01-01';

-- Less optimal: Scans all partitions
SELECT * FROM workflow_events
WHERE run_id = 'uuid';
  1. Use appropriate indexes:
  • Single run queries: run_id index
  • Time-range queries: run_ts composite index
  • Event type filtering: type index
  1. Avoid full table scans:
-- Bad: Scans all partitions
SELECT COUNT(*) FROM workflow_events;

-- Better: Query specific partition
SELECT COUNT(*) FROM workflow_events_y2025m01;

Partition Size Guidelines

  • Optimal size: 10-100 GB per partition
  • Maximum recommended: 200 GB per partition
  • If partitions grow too large: Consider daily or weekly partitioning

Monitor partition growth:

SELECT 
  partition_name,
  size_pretty,
  row_count
FROM get_workflow_events_partition_stats()
ORDER BY partition_name DESC
LIMIT 3;

Troubleshooting

Partition Not Found Error

Error: no partition of relation "workflow_events" found for row

Cause: Trying to insert data outside existing partition ranges

Solution:

-- Create partition for the required date
SELECT create_workflow_events_partition('2025-12-01'::date);

Slow Queries

Symptoms: Queries taking longer than expected

Diagnosis:

-- Check if partition pruning is working
EXPLAIN (ANALYZE, BUFFERS) 
SELECT * FROM workflow_events 
WHERE ts >= '2025-01-01' AND ts < '2025-02-01';

Look for "Partitions removed" in the output.

Solutions:

  1. Ensure WHERE clause includes ts column
  2. Verify indexes exist on partitions
  3. Run ANALYZE on partitions

Missing Future Partitions

Symptoms: Inserts failing for current/future dates

Solution:

-- Manually create future partitions
SELECT maintain_workflow_events_partitions();

-- Or create specific partition
SELECT create_workflow_events_partition(CURRENT_DATE + interval '1 month');

Migration Notes

From Non-Partitioned Table

The migration automatically:

  1. Creates new partitioned table structure
  2. Migrates existing data (if any)
  3. Drops old table
  4. Renames partitioned table

Downtime: Minimal for small datasets, plan maintenance window for large datasets

Rollback Procedure

If rollback is needed:

  1. Export data from partitioned table
  2. Create non-partitioned table
  3. Import data
  4. Update application

Note: Rollback is complex. Test thoroughly before production deployment.

Best Practices

  1. Monitor partition creation: Ensure maintain_workflow_events_partitions() runs regularly
  2. Set retention policy: Define how long to keep event data
  3. Archive before dropping: Always backup data before deletion
  4. Include time filters: Always filter by ts when possible for better performance
  5. Regular ANALYZE: Run ANALYZE on partitions after bulk inserts
  6. Monitor partition sizes: Alert when partitions exceed size thresholds
  7. Test queries: Verify partition pruning with EXPLAIN ANALYZE

Verification

Run the verification script to ensure partitioning is working correctly:

node scripts/verify-event-partitioning.js

This script checks:

  • Table is partitioned
  • Partitions exist
  • Functions are available
  • Event insertion works
  • Queries execute correctly

References