diff --git a/docs/databases/database-engineering/acid.md b/docs/databases/database-engineering/acid.md index dd4e4c78..6fff4a6c 100644 --- a/docs/databases/database-engineering/acid.md +++ b/docs/databases/database-engineering/acid.md @@ -452,7 +452,7 @@ Rather than blocking operations to guarantee consistency, BASE systems accept te - **Application-Level Resolution**: Let the application decide how to merge conflicting versions - **CRDTs (Conflict-free Replicated Data Types)**: Data structures that guarantee convergence without coordination -**Example: Adding an item to a set in a distributed system** +#### Example: Adding an item to a set in a distributed system - Node A adds item X - Node B adds item Y diff --git a/docs/databases/database-engineering/locks.md b/docs/databases/database-engineering/locks.md index 465530e0..bab4290b 100644 --- a/docs/databases/database-engineering/locks.md +++ b/docs/databases/database-engineering/locks.md @@ -2,253 +2,522 @@ sidebar_position: 5 --- -# Database Locking +# Locks -Database locking is a mechanism that ensures **data consistency** and **integrity** when multiple transactions access the database concurrently. Locks prevent conflicting operations from being executed simultaneously, ensuring that operations comply with ACID (Atomicity, Consistency, Isolation, Durability) principles. +:::tip[Status] -## What Do Shared and Exclusive Locks Mean? +This note is complete, reviewed, and considered stable. -### Shared Lock (S-Lock) +::: -- A shared lock allows **multiple transactions to read** the same data concurrently but prevents any transaction from **modifying** the data until all shared locks are released. -- Shared locks ensure **read consistency** but block write operations. +In multi-user database systems, multiple transactions execute concurrently to maximize throughput and resource utilization. Locks exist to **coordinate concurrent access to shared data** so that correctness is preserved. 
Without locks, concurrent reads and writes could corrupt data or expose inconsistent intermediate states. -**Example:** +Locks primarily protect against the following anomalies: -```sql -BEGIN; -SELECT * FROM employees WHERE id = 1 FOR SHARE; --- This transaction can read the row, but no other transaction can modify it. -COMMIT; -``` +- **Lost updates** – one transaction overwrites another’s changes +- **Dirty reads** – reading uncommitted data +- **Non-repeatable reads** – re-reading yields different results +- **Phantom reads** – new rows appear between reads -### Exclusive Lock (X-Lock) +Conceptually, a lock is a **contract** between a transaction and the database engine that grants controlled access to a data item. -- An exclusive lock allows **only one transaction to modify** the data. -- It blocks all other transactions, including both **read** and **write**, until the lock is released. -- Exclusive locks are critical for maintaining data integrity during update operations. +## Concurrency Control Overview -**Example:** +Concurrency control ensures that the outcome of concurrent execution is equivalent to some serial execution (serializability). -```sql -BEGIN; -UPDATE employees SET salary = salary + 500 WHERE id = 1; --- This transaction locks the row, preventing any other transaction from reading or writing to it. -COMMIT; -``` +Two dominant approaches: -## Lock Types in PostgreSQL +1. **Lock-based concurrency control** -PostgreSQL provides various locks to handle concurrent access effectively. Key lock types include: + - Transactions acquire locks before accessing data + - Conflicts are resolved by blocking or aborting transactions -### Row-Level Locks +2. **Optimistic / MVCC-based concurrency control** -- Lock individual rows to reduce contention and maximize concurrency. -- Used in operations like `SELECT ... FOR SHARE` or `SELECT ... FOR UPDATE`. -- `FOR SHARE` is a shared lock, so reads are alloWed. 
-- `FOR UPDATE` is exclusive lock, so reads and write are **NOT** alloWed. + - Readers do not block writers + - Conflicts are detected at commit time -**Example:** +Locks are still essential even in MVCC systems for: -```sql -BEGIN; -SELECT * FROM employees WHERE id = 1 FOR UPDATE; --- Locks only the row with id = 1, allowing other rows to be accessed concurrently. -COMMIT; -``` +- Writes +- Schema changes +- Certain isolation guarantees -### Table-Level Locks +## ACID Properties Relation -- Lock the entire table for operations affecting all rows or for schema modifications. -- Example: Acquired during `TRUNCATE` or `ALTER TABLE`. +Locks are primarily tied to **Isolation** and **Consistency**, but they indirectly support all ACID properties. -**Example:** +| ACID Property | Role of Locks | +| ------------- | -------------------------------------------- | +| Atomicity | Prevents partial visibility of changes | +| Consistency | Enforces integrity during concurrent updates | +| Isolation | Core purpose of locks | +| Durability | Locks ensure committed state is well-defined | -```sql -BEGIN; -LOCK TABLE employees IN ACCESS EXCLUSIVE MODE; --- Blocks all access to the table until the transaction is complete. -COMMIT; -``` +Isolation levels (Read Committed, Repeatable Read, Serializable) determine **how aggressively locks are used**. -### Advisory Locks +## Transactions and Lock Scope -- Custom, application-controlled locks that allow developers to implement business-specific locking logic. -- Advisory locks are not enforced by the database engine. +Locks are scoped to: -**Example: Complete Usage of Advisory Locks:** +- A **transaction** (released on commit/rollback) +- A **statement** (statement-level locks) -Imagine a scenario where multiple workers process tasks from a shared `tasks` table. Each worker should only process a task that is not being handled by another worker. 
+Scope dimensions: -```sql --- Worker 1 -BEGIN; +- Object scope: row, page, table, database +- Time scope: short-lived vs long-lived --- Try to acquire an advisory lock on the task ID (e.g., ID = 101) -SELECT pg_try_advisory_lock(101) AS lock_acquired; +Long-running transactions dramatically increase lock contention and risk of blocking. --- Check if the lock was acquired --- If lock_acquired is true, process the task -UPDATE tasks -SET status = 'in_progress' -WHERE id = 101 AND status = 'pending'; +## Lock Granularity --- Task processing logic here... +Lock granularity defines **how much data a lock protects**. --- Release the advisory lock after processing -SELECT pg_advisory_unlock(101); +### Database Level -COMMIT; +- Locks the entire database +- Rare, usually for maintenance operations +- Highest contention, lowest overhead + +Use cases: + +- Backup +- Restore +- Global configuration changes + +### Table Level + +- Locks the entire table +- Common for DDL and bulk operations + +Pros: + +- Simple +- Low lock manager overhead + +Cons: + +- Poor concurrency + +### Page Level + +- Locks a fixed-size block (page) of data +- Balance between concurrency and overhead + +Often used internally by storage engines where row-level locks are too expensive. + +### Row Level + +- Locks individual rows +- Highest concurrency +- Highest lock bookkeeping cost + +Used heavily in OLTP systems. + +#### Granularity Hierarchy + +
+ +```mermaid +graph TD + DB[Database Lock] + T[Table Lock] + P[Page Lock] + R[Row Lock] + + DB --> T + T --> P + P --> R ``` -**Explanation:** +
-- `pg_try_advisory_lock` tries to acquire a lock without blocking. If the lock is already held, it does not wait. -- Once the lock is acquired, the worker updates the task's status and begins processing. -- After completing the task, the worker releases the advisory lock with `pg_advisory_unlock`. +## Lock Types and Modes -This ensures that no two workers process the same task simultaneously. +### Shared Locks (S) -## Deadlocks +- Allows multiple readers +- Prevents writers -A **deadlock** occurs when two or more transactions block each other by holding locks and waiting for resources locked by the other transactions. PostgreSQL automatically detects deadlocks and resolves them by aborting one of the transactions. +Used for SELECT queries under stronger isolation levels. **Example:** ```sql -- Transaction 1 -BEGIN; -UPDATE employees SET salary = salary + 500 WHERE id = 1; +BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ; +SELECT * FROM users WHERE id = 1; -- Acquires S lock on row -- Transaction 2 -BEGIN; -UPDATE employees SET salary = salary + 500 WHERE id = 2; +BEGIN TRANSACTION; +SELECT * FROM users WHERE id = 1; -- Can acquire S lock (compatible) +UPDATE users SET name = 'Jane' WHERE id = 1; -- Blocked (X lock incompatible with S) +``` --- Transaction 1 tries to lock row 2 -UPDATE employees SET salary = salary + 500 WHERE id = 2; +### Exclusive Locks (X) --- Transaction 2 tries to lock row 1, causing a deadlock -UPDATE employees SET salary = salary + 500 WHERE id = 1; +- Allows a single writer +- Blocks all other access + +Required for UPDATE, DELETE, INSERT. 
+ +**Example:** + +```sql +-- Transaction 1 +BEGIN TRANSACTION; +UPDATE users SET name = 'John' WHERE id = 1; -- Acquires X lock on row + +-- Transaction 2 (blocked) +BEGIN TRANSACTION; +SELECT * FROM users WHERE id = 1; -- Blocked waiting for X lock release +UPDATE users SET age = 30 WHERE id = 1; -- Blocked (X locks are mutually exclusive) ``` -In this case, PostgreSQL will abort one of the transactions to resolve the deadlock. +### Intent Locks -## How Databases Handle Locked Rows +Intent locks are taken on a coarse object (such as a table) to signal that the transaction holds, or is about to request, locks at a **finer granularity** (such as rows) beneath it. -When a transaction attempts to access a row that is already locked by another transaction, databases use different strategies to handle the conflict: +Examples: -### Blocking Until the Lock Is Released +- IS (Intent Shared) – indicates that shared locks will be acquired on child objects +- IX (Intent Exclusive) – indicates that exclusive locks will be acquired on child objects -- By default, PostgreSQL waits for the lock to be released. -- The waiting transaction is blocked but remains in the queue to acquire the lock. -- We can also have a timeout to fail the transaction after a certain period. -- For timeout, We can use the `lock_timeout` to set timeout. +They enable efficient multi-granularity locking without requiring the database to check every child object. **Example:** ```sql -BEGIN; -SET lock_timeout = '5s'; -- Set the lock acquisition timeout to 5 seconds. +-- Transaction holding multiple row locks +BEGIN TRANSACTION; +-- No explicit statement is needed: the engine takes the table-level IX lock implicitly +UPDATE users SET status = 'active' WHERE id IN (1, 2, 3); -- Row-level X locks acquired +``` -SELECT * FROM employees WHERE id = 1 FOR UPDATE; --- Another transaction attempting the same lock will wait until this transaction is committed or rolled back. -COMMIT; +
+ +```mermaid +graph LR + T[Table] + R1[Row 1] + R2[Row 2] + + T -->|IX| R1 + T -->|IX| R2 ``` -### NOWAIT +
+ +### Update Locks (U) + +Used to avoid deadlocks during read-modify-write cycles. + +Behavior: + +- Compatible with Shared locks (readers proceed), but only one transaction can hold a U lock on a resource +- Converts to Exclusive when the update happens -- If the row is locked, the transaction **fails immediately** with an error instead of waiting for the lock to be released. -- Useful for applications where blocking is unacceptable. +Common in SQL Server-style engines. **Example:** ```sql -BEGIN; -SELECT * FROM employees WHERE id = 1 FOR UPDATE NOWAIT; --- If the row is already locked, this query fails with an error. +-- SQL Server: read-modify-write with U lock +BEGIN TRANSACTION; +SELECT * FROM accounts WITH (UPDLOCK) WHERE id = 1; -- Table hint follows the table name; acquires U lock +-- Other transactions can read but cannot acquire U or X locks +UPDATE accounts SET balance = balance - 100 WHERE id = 1; -- Upgrades U to X COMMIT; ``` -### SKIP LOCKED +### Schema Locks -- If a row is locked, the query **skips the locked rows** and processes only the unlocked rows. -- Useful for task queues where workers can skip locked tasks and process available ones. +Protect database metadata. -**Example:** +Types: -```sql -BEGIN; -SELECT * FROM tasks WHERE status = 'pending' FOR UPDATE SKIP LOCKED; --- Processes only unlocked rows, ignoring locked rows. -COMMIT; +- Schema Stability (allows queries) +- Schema Modification (blocks everything) + +DDL statements rely heavily on schema locks. + +## Lock Compatibility + +Two locks are compatible if they can be held simultaneously on the same data object by different transactions. 
+ +### Compatibility Matrix + +| Requested \ Held | S (Shared) | X (Exclusive) | IS (Intent Shared) | IX (Intent Exclusive) | +| --------------------- | ---------- | ------------- | ------------------ | --------------------- | +| S (Shared) | ✓ | ✗ | ✓ | ✗ | +| X (Exclusive) | ✗ | ✗ | ✗ | ✗ | +| IS (Intent Shared) | ✓ | ✗ | ✓ | ✓ | +| IX (Intent Exclusive) | ✗ | ✗ | ✓ | ✓ | + +### Lock Acquisition Rules + +- Stronger locks cannot be granted if weaker incompatible locks exist +- Upgrades must re-check compatibility against all existing locks + +Lock upgrades are a frequent source of deadlocks because they can create wait-for cycles. + +### Multi-Granularity Locking + +Transactions lock higher-level objects with intent locks before locking finer objects. + +
+ +```mermaid +sequenceDiagram + participant Tx as Transaction + participant T as Table + participant R as Row + + Tx->>T: Acquire IX + Tx->>R: Acquire X ``` -## Transaction Isolation Levels and Locking +
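The compatibility matrix and the acquisition order in the diagram above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular engine implements its lock manager: the names `LockTable`, `compatible`, and the `"users/row1"` resource naming are hypothetical, requests that conflict are simply refused rather than queued, and lock upgrades are not modeled.

```python
from collections import defaultdict

# Unordered pairs of modes that may be held together on one object
# (the same entries as the S / X / IS / IX compatibility matrix).
_COMPATIBLE_PAIRS = {
    frozenset(["S"]),         # S with S
    frozenset(["S", "IS"]),   # S with IS
    frozenset(["IS"]),        # IS with IS
    frozenset(["IS", "IX"]),  # IS with IX
    frozenset(["IX"]),        # IX with IX
}

# Intent mode required on every ancestor before locking a child in a given mode.
_INTENT_FOR = {"S": "IS", "IS": "IS", "X": "IX", "IX": "IX"}


def compatible(requested, held):
    """True if `requested` can coexist with `held` on the same object."""
    return frozenset([requested, held]) in _COMPATIBLE_PAIRS


class LockTable:
    """Hypothetical in-memory lock table: resource name -> {transaction: mode}."""

    def __init__(self):
        self._held = defaultdict(dict)

    def _can_grant(self, tx, resource, mode):
        # Only locks held by *other* transactions can conflict.
        return all(
            compatible(mode, held_mode)
            for other, held_mode in self._held[resource].items()
            if other != tx
        )

    def lock(self, tx, path, mode):
        """Lock path[-1] in `mode`, taking intent locks on its ancestors first.

        `path` lists resources from coarsest to finest, e.g. [table, row].
        Returns True if every lock was granted, False (nothing recorded) otherwise.
        """
        wanted = [(r, _INTENT_FOR[mode]) for r in path[:-1]] + [(path[-1], mode)]
        if not all(self._can_grant(tx, r, m) for r, m in wanted):
            return False
        for resource, m in wanted:
            self._held[resource][tx] = m  # simplification: no upgrade logic
        return True


table = LockTable()
print(table.lock("T1", ["users", "users/row1"], "X"))  # True
print(table.lock("T2", ["users", "users/row2"], "X"))  # True
print(table.lock("T3", ["users"], "S"))                # False
```

The last request shows the point of intent locks: the table-level S request is rejected by looking only at the IX entries on `users`, without inspecting any row.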
-PostgreSQL supports four standard isolation levels that define how transactions interact with locks: +## Acquisition Strategies -### Read Uncommitted +### Two-Phase Locking (2PL) -- No locking; allows dirty reads. -- Rarely used in PostgreSQL. +Phases: -### Read Committed +1. **Growing phase** – acquire locks, no releases allowed +2. **Shrinking phase** – release locks, no acquisitions allowed -- Ensures no dirty reads by acquiring shared or exclusive locks as needed. -- Default isolation level in PostgreSQL. +Guarantees conflict-serializable schedules when every transaction follows the protocol, although deadlocks remain possible. -**Example:** +**Strict 2PL** holds exclusive locks until the transaction commits or rolls back, so no other transaction can read or overwrite uncommitted changes. -```sql -BEGIN; -UPDATE employees SET salary = salary + 500 WHERE id = 1; --- Other transactions cannot read or modify the locked row until committed. -COMMIT; +### Three-Phase Locking (3PL) + +Adds an intermediate phase to avoid blocking anomalies. + +Rarely used in real systems due to complexity. + +### Lock Escalation + +Automatic promotion of many fine-grained locks into a coarser lock. + +Trade-off: + +- Reduced overhead +- Reduced concurrency + +### Deadlock Prevention + +Common strategies: + +- **Timeout-based abort** – abort if wait exceeds threshold +- **Wait-die** – an older requester waits for a younger lock holder; a younger requester aborts (dies) and retries with its original timestamp +- **Wound-wait** – an older requester aborts (wounds) a younger lock holder; a younger requester waits for an older holder + +Each strategy balances fairness, throughput, and restart overhead differently. + +## Lock Management + +### Acquisition and Release + +Locks are: + +- Acquired on demand +- Released at commit/rollback or earlier + +Incorrect release timing breaks isolation. + +### Lock Timeouts + +Transactions waiting beyond a configured threshold are automatically aborted. + +Prevents infinite waits but may abort valid long-running transactions under contention. + +### Lock Waiting Queues + +Blocked transactions wait in queues per lock object. + +
+ +```mermaid +graph TD + L[Lock] + T1[Tx1 Holding] + T2[Tx2 Waiting] + T3[Tx3 Waiting] + + T1 --> L + T2 --> L + T3 --> L ``` -### Repeatable Read +
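The holder/waiter arrangement in the diagram above can be illustrated with a small Python sketch. It assumes only S and X modes and a strict first-come-first-served policy; the class name `QueuedLock` and its methods are hypothetical, and real lock managers add timeouts, priorities, and deadlock checks on top of this.

```python
from collections import deque


class QueuedLock:
    """One lock object with its current holders and a FIFO queue of waiters."""

    def __init__(self):
        self.holders = {}       # transaction id -> mode ("S" or "X")
        self.waiters = deque()  # (transaction id, mode) in arrival order

    def _compatible(self, mode):
        # S coexists only with other S holders; X coexists with nothing.
        return all(mode == "S" and held == "S" for held in self.holders.values())

    def acquire(self, tx, mode):
        """Grant the lock if possible, otherwise enqueue. Returns True if granted."""
        # New requests queue behind existing waiters so a waiting writer
        # is not overtaken indefinitely by later readers.
        if not self.waiters and self._compatible(mode):
            self.holders[tx] = mode
            return True
        self.waiters.append((tx, mode))
        return False

    def release(self, tx):
        """Release tx's lock and wake waiters from the head of the queue.

        Returns the transactions that were granted the lock as a result.
        """
        self.holders.pop(tx, None)
        granted = []
        while self.waiters and self._compatible(self.waiters[0][1]):
            waiter, mode = self.waiters.popleft()
            self.holders[waiter] = mode
            granted.append(waiter)
        return granted


lock = QueuedLock()
lock.acquire("Tx1", "X")      # granted
lock.acquire("Tx2", "S")      # queued behind Tx1
lock.acquire("Tx3", "S")      # queued behind Tx2
print(lock.release("Tx1"))    # ['Tx2', 'Tx3']
```

Granting strictly from the head of the queue is one possible fairness policy; it trades some concurrency for protection against starvation.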
-- Prevents non-repeatable reads by locking all rows read during a transaction. -- Ensures consistent results for all queries within the transaction. +### Lock Monitoring Queries -**Example:** +Databases expose system views for us to inspect: -```sql -BEGIN ISOLATION LEVEL REPEATABLE READ; -SELECT * FROM employees WHERE department = 'Sales'; --- Ensures no other transaction can modify these rows until committed. -COMMIT; +- Current locks held +- Waiting transactions +- Blocking chains + +Essential for diagnosing and debugging production contention issues. + +## Problems and Solutions + +### Blocking Transactions + +Occurs when incompatible locks collide and one transaction must wait. + +Mitigation strategies: + +- Keep transactions short to minimize lock hold times +- Use proper indexing to reduce the number of rows accessed +- Optimize query execution plans + +### Deadlocks (Detection/Resolution) + +Circular wait condition. + +
+ +```mermaid +graph LR + T1 -->|waits| T2 + T2 -->|waits| T1 ``` -### Serializable +
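Detecting the circular wait shown above amounts to finding a cycle in a wait-for graph. The following Python sketch uses a depth-first search for that; the function names and the "abort the youngest transaction" victim policy are assumptions made for illustration, not the behavior of any specific database.

```python
def find_deadlock(waits_for):
    """Return the transactions forming a cycle in the wait-for graph, or None.

    `waits_for` maps a transaction to the transactions it is waiting on.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    state = {}
    path = []

    def visit(tx):
        state[tx] = GREY
        path.append(tx)
        for blocker in waits_for.get(tx, ()):
            blocker_state = state.get(blocker, WHITE)
            if blocker_state == GREY:
                # Reached a transaction already on the current path: a cycle.
                return path[path.index(blocker):]
            if blocker_state == WHITE:
                cycle = visit(blocker)
                if cycle:
                    return cycle
        path.pop()
        state[tx] = BLACK
        return None

    for tx in list(waits_for):
        if state.get(tx, WHITE) == WHITE:
            cycle = visit(tx)
            if cycle:
                return cycle
    return None


def pick_victim(cycle, start_time):
    """Hypothetical policy: abort the transaction that started most recently."""
    return max(cycle, key=lambda tx: start_time[tx])


cycle = find_deadlock({"T1": ["T2"], "T2": ["T1"]})
print(cycle)                                    # ['T1', 'T2']
print(pick_victim(cycle, {"T1": 10, "T2": 20}))  # T2
```

Aborting the youngest participant tends to waste the least completed work, but engines may choose victims by other criteria.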
-- The strictest isolation level, ensuring transactions appear to execute serially. -- May block or fail transactions to maintain serializability. +Resolved by aborting one participant. -## Best Practices +### Livelocks -### Keep Transactions Short +Transactions repeatedly abort and retry without progress, wasting resources. -- Minimize transaction duration to reduce lock contention. +Solved via exponential backoff or priority-based scheduling adjustments. -### Use Appropriate Isolation Levels +### Starvation -- Choose the least restrictive isolation level that satisfies Our requirements. +Low-priority transactions never acquire locks. -### Indexing +Solved via fairness policies. -- Proper indexing reduces the number of rows locked during queries, improving concurrency. +## PostgreSQL Implementation -### Handle Deadlocks Gracefully +### MVCC Integration -- Design transactions to access resources in a consistent order to avoid deadlocks. +PostgreSQL uses MVCC, so: -### Monitor Locks +- Reads do not block writes +- Writes still require locks -- Use PostgreSQL system views like `pg_locks` to monitor and troubleshoot locks. +Locks coordinate visibility and structural safety. -**Example:** +### Lock Modes (AccessShare to AccessExclusive) + +Ordered from weakest to strongest: + +- AccessShare (SELECT) +- RowShare +- RowExclusive +- ShareUpdateExclusive +- Share +- ShareRowExclusive +- Exclusive +- AccessExclusive (DDL) + +### pg_locks System View + +Provides visibility into: + +- Lock type +- Lock mode +- Granted vs waiting + +Critical for diagnosing contention. + +### Explicit Locking Commands ```sql -SELECT * FROM pg_locks; +LOCK TABLE users IN ACCESS EXCLUSIVE MODE; ``` + +Used sparingly for critical sections. + +## PostgreSQL Internals + +### LockManager Architecture + +Centralized lock manager per instance. + +### Hash Table Storage + +Locks stored in shared memory hash tables keyed by object ID. + +### Lightweight Locks + +Protect internal data structures. 
+ +- Very fast +- Not shown in `pg_locks` + +### Advisory Locks + +Application-defined locks. + +- Not tied to table rows +- Used for coordination + +### Wait Graph Analysis + +The deadlock detector is not a periodic background task: a backend runs it after waiting on a lock for longer than `deadlock_timeout` (1 second by default) and searches the wait-for graph for a cycle. + +### Backend Lock Handling + +Each backend process: + +- Requests locks +- Sleeps when blocked +- Wakes on release + +## Advanced Topics + +### Predicate vs Key-Range Locks + +Predicate locks protect logical conditions (e.g., "salary > 100000"). +Key-range locks protect physical index ranges and prevent phantom reads. + +Serializable isolation typically relies on one of these mechanisms. + +### Lock-Free Alternatives + +Optimistic concurrency control avoids locks by detecting conflicts at commit time. + +Most effective when conflicts are rare and transaction throughput is a priority. + +### Distributed Locks + +Needed when processes on different nodes, which do not share a single database transaction, must coordinate access to a resource. + +Challenges: + +- Clock skew +- Network partitions + +Examples: + +- ZooKeeper +- etcd +- Redis Redlock + +### Performance Tuning + +Best practices: + +- Keep transactions short +- Index properly +- Avoid unnecessary explicit locks +- Monitor lock contention continuously