Database Partitioning Documentation

This document provides a detailed explanation of the PostgreSQL database partitioning simulation project.


Table of Contents

  1. Overview
  2. What is Database Partitioning?
  3. Project Architecture
  4. Docker Compose Setup
  5. SQL Schema Explanation
  6. Go Application Walkthrough
  7. Running the Project

Overview

This project demonstrates three PostgreSQL partitioning strategies:

| Strategy | Table | Partition Key | Use Case |
| --- | --- | --- | --- |
| Range | `orders` | `order_date` | Time-series data, logs, historical records |
| Hash | `users` | `id` | Even distribution across partitions |
| List | `products` | `category` | Categorical data with known values |

What is Database Partitioning?

Partitioning splits a large table into smaller, more manageable pieces called partitions. Each partition stores a subset of the data based on a partition key.

Benefits

  • Improved Query Performance: Queries only scan relevant partitions (partition pruning)
  • Easier Maintenance: Archive or drop old partitions without affecting other data
  • Parallel Processing: Operations can run on multiple partitions simultaneously
  • Better Index Performance: Smaller indexes per partition

Partitioning Types in PostgreSQL

| Type | Description | Example |
| --- | --- | --- |
| Range | Data divided by value ranges | Orders by date quarters |
| Hash | Data distributed using a hash function | Users by ID modulus |
| List | Data grouped by explicit value lists | Products by category |

Project Architecture

simulate_database_partitioning/
├── docker-compose.yml          # PostgreSQL container configuration
├── init/
│   └── 01_create_partitioned_table.sql  # Database schema with partitions
├── main.go                     # Go application demonstrating partitioning
├── go.mod                      # Go module dependencies
├── README.md                   # Quick start guide
└── DOCUMENTATION.md            # This file

Docker Compose Setup

File: docker-compose.yml

services:
  postgres:
    image: postgres:18-bookworm
    container_name: partitioned_postgres
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: partitioned_db
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql
      - ./init:/docker-entrypoint-initdb.d
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5

volumes:
  postgres_data:

Line-by-Line Explanation

| Line | Code | Explanation |
| --- | --- | --- |
| 1 | `services:` | Defines Docker services (no `version` key needed in modern Compose) |
| 3 | `image: postgres:18-bookworm` | Uses PostgreSQL 18 on Debian Bookworm |
| 4 | `container_name: partitioned_postgres` | Names the container for easy reference |
| 5-8 | `environment:` | Sets PostgreSQL credentials and database name |
| 9-10 | `ports: "5432:5432"` | Maps container port 5432 to host port 5432 |
| 11-13 | `volumes:` | Persists data and mounts init scripts |
| 12 | `postgres_data:/var/lib/postgresql` | Named volume at the parent dir (PostgreSQL 18+ requirement) |
| 13 | `./init:/docker-entrypoint-initdb.d` | Auto-executes SQL files on first startup |
| 14-18 | `healthcheck:` | Verifies PostgreSQL is ready to accept connections |

Note: The postgres:18 image stores data in a version-specific subdirectory under /var/lib/postgresql rather than directly in /var/lib/postgresql/data. The mount point must therefore be /var/lib/postgresql (not /var/lib/postgresql/data) so that a future pg_upgrade can see both the old and new version directories.


SQL Schema Explanation

File: init/01_create_partitioned_table.sql

This file creates three partitioned tables demonstrating different strategies.


1. Range Partitioning (Orders Table)

CREATE TABLE orders (
    id SERIAL,
    order_date DATE NOT NULL,
    customer_id INTEGER NOT NULL,
    amount DECIMAL(10, 2) NOT NULL,
    status VARCHAR(50) DEFAULT 'pending',
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (id, order_date)
) PARTITION BY RANGE (order_date);

Line-by-Line Explanation

| Line | Code | Explanation |
| --- | --- | --- |
| 1 | `CREATE TABLE orders` | Creates the parent (partitioned) table |
| 2 | `id SERIAL` | Auto-incrementing integer ID |
| 3 | `order_date DATE NOT NULL` | Partition key: determines which partition stores the row |
| 4-6 | `customer_id`, `amount`, `status` | Regular columns for order data |
| 7 | `created_at TIMESTAMP` | Automatic timestamp on insert |
| 8 | `PRIMARY KEY (id, order_date)` | Important: the partition key must be part of the primary key |
| 9 | `PARTITION BY RANGE (order_date)` | Declares range partitioning on `order_date` |

Creating Range Partitions

CREATE TABLE orders_2024_q1 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');
| Part | Explanation |
| --- | --- |
| `orders_2024_q1` | Partition table name (stores Q1 2024 orders) |
| `PARTITION OF orders` | Links this partition to the parent table |
| `FOR VALUES FROM ... TO` | Defines the date range (inclusive start, exclusive end) |

Range boundaries:

  • FROM ('2024-01-01') - Includes January 1, 2024
  • TO ('2024-04-01') - Excludes April 1, 2024 (goes to next partition)
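A row whose order_date falls outside every defined range is rejected with an error. If that is undesirable, a DEFAULT partition can catch out-of-range rows — a minimal sketch, not part of this project's schema:

```sql
-- Optional catch-all for dates outside all defined ranges
CREATE TABLE orders_default PARTITION OF orders DEFAULT;
```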

2. Hash Partitioning (Users Table)

CREATE TABLE users (
    id INTEGER NOT NULL,
    username VARCHAR(100) NOT NULL,
    email VARCHAR(255) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (id)
) PARTITION BY HASH (id);

Line-by-Line Explanation

| Line | Code | Explanation |
| --- | --- | --- |
| 1 | `CREATE TABLE users` | Creates the parent table |
| 2 | `id INTEGER NOT NULL` | Partition key: the hash is computed from this |
| 3-4 | `username`, `email` | User profile data |
| 5 | `PRIMARY KEY (id)` | Primary key (partition key included) |
| 6 | `PARTITION BY HASH (id)` | Declares hash partitioning on `id` |

Creating Hash Partitions

CREATE TABLE users_p0 PARTITION OF users FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE users_p1 PARTITION OF users FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE users_p2 PARTITION OF users FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE users_p3 PARTITION OF users FOR VALUES WITH (MODULUS 4, REMAINDER 3);
| Parameter | Explanation |
| --- | --- |
| `MODULUS 4` | Total number of partitions (divides the hash space into 4) |
| `REMAINDER 0-3` | Which partition gets rows where `hash(id) % 4 = remainder` |

How it works:

  1. PostgreSQL computes hash(user_id)
  2. Calculates hash(user_id) % 4
  3. Routes row to partition matching the remainder
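The routing predicate can be checked from SQL: PostgreSQL exposes it as satisfies_hash_partition. A sketch (the id value 42 is an arbitrary example):

```sql
-- Would a row with id = 42 land in the partition with MODULUS 4, REMAINDER 2?
SELECT satisfies_hash_partition('users'::regclass, 4, 2, 42);
```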

3. List Partitioning (Products Table)

CREATE TABLE products (
    id SERIAL,
    name VARCHAR(255) NOT NULL,
    category VARCHAR(50) NOT NULL,
    price DECIMAL(10, 2) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (id, category)
) PARTITION BY LIST (category);

Line-by-Line Explanation

| Line | Code | Explanation |
| --- | --- | --- |
| 1 | `CREATE TABLE products` | Creates the parent table |
| 2 | `id SERIAL` | Auto-incrementing product ID |
| 3 | `category VARCHAR(50) NOT NULL` | Partition key: determines the partition by exact value |
| 4-5 | `name`, `price` | Product details |
| 6 | `PRIMARY KEY (id, category)` | The partition key must be in the primary key |
| 7 | `PARTITION BY LIST (category)` | Declares list partitioning on `category` |

Creating List Partitions

CREATE TABLE products_electronics PARTITION OF products
    FOR VALUES IN ('electronics', 'computers', 'phones');

CREATE TABLE products_clothing PARTITION OF products
    FOR VALUES IN ('clothing', 'shoes', 'accessories');

CREATE TABLE products_home PARTITION OF products
    FOR VALUES IN ('furniture', 'kitchen', 'garden');

CREATE TABLE products_other PARTITION OF products DEFAULT;
| Partition | Categories | Explanation |
| --- | --- | --- |
| `products_electronics` | electronics, computers, phones | Tech products |
| `products_clothing` | clothing, shoes, accessories | Fashion items |
| `products_home` | furniture, kitchen, garden | Home goods |
| `products_other` | `DEFAULT` | Catches any unlisted category |

Important: The DEFAULT partition catches rows that don't match any defined list.
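One consequence: a new list partition cannot be attached for a value whose rows already sit in the DEFAULT partition; those rows must be moved out first. A sketch of that migration, assuming a hypothetical products_books partition that is not part of this project:

```sql
-- Move 'books' rows out of the DEFAULT partition, then attach a dedicated partition
BEGIN;
CREATE TABLE products_books (LIKE products INCLUDING ALL);
INSERT INTO products_books SELECT * FROM products_other WHERE category = 'books';
DELETE FROM products_other WHERE category = 'books';
ALTER TABLE products ATTACH PARTITION products_books FOR VALUES IN ('books');
COMMIT;
```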


Go Application Walkthrough

File: main.go


Imports and Constants

import (
    "context"
    "fmt"
    "log"
    "math/rand"
    "time"

    "github.com/jackc/pgx/v5"
    "github.com/jackc/pgx/v5/pgxpool"
)

const (
    dbURL = "postgres://postgres:postgres@localhost:5432/partitioned_db"
)
| Import/Constant | Purpose |
| --- | --- |
| `context` | Manages request lifecycle and cancellation |
| `fmt`, `log` | Output and error logging |
| `math/rand`, `time` | Random data generation |
| `pgx/v5` | High-performance PostgreSQL driver for Go |
| `pgxpool` | Connection pooling for concurrent database access |
| `dbURL` | PostgreSQL connection string |

Main Function

func main() {
    ctx := context.Background()

    pool, err := pgxpool.New(ctx, dbURL)
    if err != nil {
        log.Fatalf("Unable to connect to database: %v", err)
    }
    defer pool.Close()
    // ... demo functions
}
| Line | Code | Explanation |
| --- | --- | --- |
| 1 | `ctx := context.Background()` | Creates the root context for database operations |
| 2 | `pgxpool.New(ctx, dbURL)` | Creates a connection pool to PostgreSQL |
| 3-4 | Error handling | Exits if the connection fails |
| 5 | `defer pool.Close()` | Ensures connections are released on exit |

demonstrateRangePartitioning Function

func demonstrateRangePartitioning(ctx context.Context, pool *pgxpool.Pool) {
    orders := []struct {
        orderDate  string
        customerID int
        amount     float64
        status     string
    }{
        {"2024-01-15", 1, 150.00, "completed"},
        // ... more orders
    }

    for _, o := range orders {
        _, err := pool.Exec(ctx,
            "INSERT INTO orders (order_date, customer_id, amount, status) VALUES ($1, $2, $3, $4) ON CONFLICT DO NOTHING",
            o.orderDate, o.customerID, o.amount, o.status)
        // ...
    }
}
| Part | Explanation |
| --- | --- |
| `orders := []struct{...}` | Anonymous struct slice holding sample order data |
| `{"2024-01-15", 1, 150.00, "completed"}` | Order from Q1 2024 → goes to the `orders_2024_q1` partition |
| `pool.Exec(ctx, ...)` | Executes the INSERT statement |
| `$1, $2, $3, $4` | Parameterized query (prevents SQL injection) |
| `ON CONFLICT DO NOTHING` | Ignores duplicate inserts (idempotent) |

Partition routing: PostgreSQL automatically routes each order to the correct partition based on order_date.
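The routing can be made visible from SQL: every row carries a hidden tableoid column naming the partition it was stored in. A quick check after the inserts:

```sql
-- Show the destination partition for each order
SELECT tableoid::regclass AS partition, id, order_date
FROM orders
ORDER BY order_date;
```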


demonstrateHashPartitioning Function

func demonstrateHashPartitioning(ctx context.Context, pool *pgxpool.Pool) {
    users := []struct {
        id       int
        username string
        email    string
    }{
        {1, "alice", "alice@example.com"},
        // ... more users
    }

    // Insert users
    for _, u := range users {
        _, err := pool.Exec(ctx,
            "INSERT INTO users (id, username, email) VALUES ($1, $2, $3) ON CONFLICT DO NOTHING",
            u.id, u.username, u.email)
    }

    // Show distribution
    partitions := []string{"users_p0", "users_p1", "users_p2", "users_p3"}
    for _, p := range partitions {
        var count int
        err := pool.QueryRow(ctx, fmt.Sprintf("SELECT COUNT(*) FROM %s", p)).Scan(&count)
        fmt.Printf("   %s: %d users\n", p, count)
    }
}
| Part | Explanation |
| --- | --- |
| `users := []struct{...}` | Sample user data with explicit IDs |
| `pool.QueryRow(...).Scan(&count)` | Queries each partition directly to show the distribution |
| `SELECT COUNT(*) FROM users_p0` | Counts rows in a specific partition |

Hash distribution: Users are distributed based on hash(id) % 4. Distribution may not be perfectly even with small datasets.
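The per-partition loop can also be collapsed into a single query by grouping on tableoid, which avoids interpolating partition names into SQL strings:

```sql
-- Row counts per hash partition in one query
SELECT tableoid::regclass AS partition, COUNT(*) AS users
FROM users
GROUP BY 1
ORDER BY 1;
```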


demonstrateListPartitioning Function

func demonstrateListPartitioning(ctx context.Context, pool *pgxpool.Pool) {
    products := []struct {
        name     string
        category string
        price    float64
    }{
        {"iPhone 15", "phones", 999.00},      // → products_electronics
        {"Nike Shoes", "shoes", 150.00},       // → products_clothing
        {"Dining Table", "furniture", 599.00}, // → products_home
        {"Book: Go Programming", "books", 49.99}, // → products_other (DEFAULT)
    }
    // ...
}
| Product | Category | Partition |
| --- | --- | --- |
| iPhone 15 | phones | `products_electronics` |
| Nike Shoes | shoes | `products_clothing` |
| Dining Table | furniture | `products_home` |
| Book: Go Programming | books | `products_other` (DEFAULT) |

showPartitionStats Function

func showPartitionStats(ctx context.Context, pool *pgxpool.Pool) {
    query := `
        SELECT
            parent.relname AS parent_table,
            child.relname AS partition_name,
            pg_size_pretty(pg_relation_size(child.oid)) AS partition_size
        FROM pg_inherits
        JOIN pg_class parent ON pg_inherits.inhparent = parent.oid
        JOIN pg_class child ON pg_inherits.inhrelid = child.oid
        WHERE parent.relname IN ('orders', 'users', 'products')
        ORDER BY parent.relname, child.relname
    `
    // ...
}
| SQL Part | Explanation |
| --- | --- |
| `pg_inherits` | System catalog storing partition inheritance relationships |
| `pg_class` | System catalog with table metadata |
| `parent.relname` | Parent (partitioned) table name |
| `child.relname` | Child (partition) table name |
| `pg_size_pretty(pg_relation_size(...))` | Human-readable partition size (e.g., "8192 bytes") |

demonstratePartitionPruning Function

func demonstratePartitionPruning(ctx context.Context, pool *pgxpool.Pool) {
    // Batch insert 1000 random orders
    batch := &pgx.Batch{}
    for i := 0; i < 1000; i++ {
        days := rand.Intn(820)
        orderDate := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC).AddDate(0, 0, days)
        // ...
        batch.Queue("INSERT INTO orders ...", orderDate, customerID, amount, status)
    }
    br := pool.SendBatch(ctx, batch)
    br.Exec()
    br.Close()

    // Query with partition pruning
    rows, _ := pool.Query(ctx,
        "EXPLAIN SELECT * FROM orders WHERE order_date >= '2024-01-01' AND order_date < '2024-04-01'")
}
| Part | Explanation |
| --- | --- |
| `pgx.Batch{}` | Batches multiple queries for efficient execution |
| `batch.Queue(...)` | Adds a query to the batch |
| `pool.SendBatch(ctx, batch)` | Sends all queries in a single round-trip |
| `EXPLAIN SELECT ...` | Shows the query execution plan |

Partition Pruning Output Example:

Seq Scan on orders_2024_q1 orders  (cost=0.00..15.00 rows=100 width=52)
  Filter: ((order_date >= '2024-01-01') AND (order_date < '2024-04-01'))

Notice: Only orders_2024_q1 is scanned, not all partitions!
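To confirm that pruning (and not just a small table) is responsible, planner pruning can be switched off for comparison; the plan then shows a scan over every partition:

```sql
SET enable_partition_pruning = off;
EXPLAIN SELECT * FROM orders
WHERE order_date >= '2024-01-01' AND order_date < '2024-04-01';
RESET enable_partition_pruning;
```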


Running the Project

Step 1: Start PostgreSQL

docker compose up -d

This:

  1. Pulls postgres:18-bookworm image
  2. Creates container partitioned_postgres
  3. Executes init/01_create_partitioned_table.sql automatically
  4. Creates all partitioned tables

Step 2: Verify Database is Ready

docker compose logs -f postgres

Wait for: database system is ready to accept connections

Step 3: Install Go Dependencies

go mod tidy

Downloads pgx/v5 driver and dependencies.

Step 4: Run the Demo

go run main.go

Expected Output

Connected to PostgreSQL!
=

📊 RANGE PARTITIONING DEMO (Orders by Date)
-
✅ Inserted orders across multiple quarters (2024-2026)

📋 Orders from Q1 2024 (partition: orders_2024_q1):
   ID: 1, Date: 2024-01-15, Customer: 1, Amount: $150.00, Status: completed
   ID: 2, Date: 2024-02-20, Customer: 2, Amount: $250.50, Status: completed

🔢 HASH PARTITIONING DEMO (Users by ID)
-
✅ Inserted users (distributed across 4 hash partitions)

📋 User distribution across hash partitions:
   users_p0: 1 users
   users_p1: 3 users
   users_p2: 1 users
   users_p3: 3 users

📝 LIST PARTITIONING DEMO (Products by Category)
-
✅ Inserted products across category partitions

📋 Product distribution across list partitions:
   Electronics (electronics, computers, phones): 3 products
   Clothing (clothing, shoes, accessories): 2 products
   Home (furniture, kitchen, garden): 3 products
   Other (default partition): 1 products

📈 PARTITION STATISTICS
-
📊 Partition Information:

   Table: orders
      └── orders_2024_q1 (size: 8192 bytes)
      └── orders_2024_q2 (size: 8192 bytes)
      └── orders_2024_q3 (size: 8192 bytes)
      └── orders_2024_q4 (size: 8192 bytes)
      └── orders_2025_q1 (size: 8192 bytes)
      └── orders_2025_q2 (size: 8192 bytes)
      └── orders_2025_q3 (size: 8192 bytes)
      └── orders_2025_q4 (size: 0 bytes)
      └── orders_2026_q1 (size: 8192 bytes)
      └── orders_2026_q2 (size: 0 bytes)

   Table: products
      └── products_clothing (size: 8192 bytes)
      └── products_electronics (size: 8192 bytes)
      └── products_home (size: 8192 bytes)
      └── products_other (size: 8192 bytes)

   Table: users
      └── users_p0 (size: 8192 bytes)
      └── users_p1 (size: 8192 bytes)
      └── users_p2 (size: 8192 bytes)
      └── users_p3 (size: 8192 bytes)

⚡ QUERY PERFORMANCE WITH PARTITION PRUNING
-
📥 Inserting 1000 random orders for performance testing...
✅ Inserted 1000 orders

🔍 Query with partition pruning (Q1 2024 only):
   Found 109 orders in Q1 2024 (took 351.864µs)

📋 EXPLAIN output showing partition pruning:
   Seq Scan on orders_2024_q1 orders  (cost=0.00..16.60 rows=2 width=154)
     Filter: ((order_date >= '2024-01-01'::date) AND (order_date < '2024-04-01'::date))

Step 5: Cleanup

docker compose down -v

Removes container and deletes data volume.


Key Takeaways

  1. Partition key must be in PRIMARY KEY - PostgreSQL requires this for uniqueness enforcement
  2. Partition pruning is automatic - PostgreSQL optimizer skips irrelevant partitions
  3. DEFAULT partitions catch unmatched values - Prevents insert failures for list partitioning
  4. Hash partitioning distributes roughly evenly - Good for load balancing, though small datasets can show visible skew
  5. Range partitioning is ideal for time-series - Easy to archive old partitions
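Takeaway 5 in practice: an old range partition can be detached (keeping its data as a standalone table) or dropped outright, and neither operation touches the other partitions. A sketch:

```sql
-- Detach keeps the data around as an ordinary table for archiving...
ALTER TABLE orders DETACH PARTITION orders_2024_q1;
-- ...or drop it once the data has been archived elsewhere
-- DROP TABLE orders_2024_q1;
```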