66 changes: 33 additions & 33 deletions .cursor-plugin/plugin.json
@@ -22,38 +22,38 @@
"multi-region"
],
"skills": [
"./skills/application-development/benchmarking-transaction-patterns",
"./skills/application-development/designing-application-transactions",
"./skills/application-development/designing-multi-region-applications",
"./skills/observability-and-diagnostics/analyzing-range-distribution",
"./skills/observability-and-diagnostics/analyzing-schema-change-storage-risk",
"./skills/observability-and-diagnostics/auditing-table-statistics",
"./skills/observability-and-diagnostics/monitoring-background-jobs",
"./skills/observability-and-diagnostics/profiling-statement-fingerprints",
"./skills/observability-and-diagnostics/profiling-transaction-fingerprints",
"./skills/observability-and-diagnostics/triaging-live-sql-activity",
"./skills/onboarding-and-migrations/molt-fetch",
"./skills/onboarding-and-migrations/molt-replicator",
"./skills/onboarding-and-migrations/molt-verify",
"./skills/onboarding-and-migrations/setting-up-local-cluster",
"./skills/operations-and-lifecycle/managing-certificates-and-encryption",
"./skills/operations-and-lifecycle/managing-cluster-capacity",
"./skills/operations-and-lifecycle/managing-cluster-settings",
"./skills/operations-and-lifecycle/performing-cluster-maintenance",
"./skills/operations-and-lifecycle/provisioning-cluster-for-production",
"./skills/operations-and-lifecycle/reviewing-cluster-health",
"./skills/operations-and-lifecycle/upgrading-cluster-version",
"./skills/query-and-schema-design/cockroachdb-sql",
"./skills/security-and-governance/auditing-cloud-cluster-security",
"./skills/security-and-governance/configuring-audit-logging",
"./skills/security-and-governance/configuring-ip-allowlists",
"./skills/security-and-governance/configuring-log-export",
"./skills/security-and-governance/configuring-private-connectivity",
"./skills/security-and-governance/configuring-sso-and-scim",
"./skills/security-and-governance/enabling-cmek-encryption",
"./skills/security-and-governance/enforcing-password-policies",
"./skills/security-and-governance/hardening-user-privileges",
"./skills/security-and-governance/managing-tls-certificates",
"./skills/security-and-governance/preparing-compliance-documentation"
"./skills/cockroachdb-application-development/benchmarking-transaction-patterns",
"./skills/cockroachdb-application-development/designing-application-transactions",
"./skills/cockroachdb-application-development/designing-multi-region-applications",
"./skills/cockroachdb-observability-and-diagnostics/analyzing-range-distribution",
"./skills/cockroachdb-observability-and-diagnostics/analyzing-schema-change-storage-risk",
"./skills/cockroachdb-observability-and-diagnostics/auditing-table-statistics",
"./skills/cockroachdb-observability-and-diagnostics/monitoring-background-jobs",
"./skills/cockroachdb-observability-and-diagnostics/profiling-statement-fingerprints",
"./skills/cockroachdb-observability-and-diagnostics/profiling-transaction-fingerprints",
"./skills/cockroachdb-observability-and-diagnostics/triaging-live-sql-activity",
"./skills/cockroachdb-onboarding-and-migrations/molt-fetch",
"./skills/cockroachdb-onboarding-and-migrations/molt-replicator",
"./skills/cockroachdb-onboarding-and-migrations/molt-verify",
"./skills/cockroachdb-onboarding-and-migrations/setting-up-local-cluster",
"./skills/cockroachdb-operations-and-lifecycle/managing-certificates-and-encryption",
"./skills/cockroachdb-operations-and-lifecycle/managing-cluster-capacity",
"./skills/cockroachdb-operations-and-lifecycle/managing-cluster-settings",
"./skills/cockroachdb-operations-and-lifecycle/performing-cluster-maintenance",
"./skills/cockroachdb-operations-and-lifecycle/provisioning-cluster-for-production",
"./skills/cockroachdb-operations-and-lifecycle/reviewing-cluster-health",
"./skills/cockroachdb-operations-and-lifecycle/upgrading-cluster-version",
"./skills/cockroachdb-query-and-schema-design/cockroachdb-sql",
"./skills/cockroachdb-security-and-governance/auditing-cloud-cluster-security",
"./skills/cockroachdb-security-and-governance/configuring-audit-logging",
"./skills/cockroachdb-security-and-governance/configuring-ip-allowlists",
"./skills/cockroachdb-security-and-governance/configuring-log-export",
"./skills/cockroachdb-security-and-governance/configuring-private-connectivity",
"./skills/cockroachdb-security-and-governance/configuring-sso-and-scim",
"./skills/cockroachdb-security-and-governance/enabling-cmek-encryption",
"./skills/cockroachdb-security-and-governance/enforcing-password-policies",
"./skills/cockroachdb-security-and-governance/hardening-user-privileges",
"./skills/cockroachdb-security-and-governance/managing-tls-certificates",
"./skills/cockroachdb-security-and-governance/preparing-compliance-documentation"
]
}
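Since every entry moves under a `cockroachdb-` category, a quick check run from the repository root can confirm that each listed path still resolves. This is a minimal sketch, assuming each skill directory contains a `SKILL.md` (as the cross-links in the SKILL files suggest) and that every category directory should carry the `cockroachdb-` prefix; `check_plugin_skills` is a hypothetical helper, not part of any plugin tooling.

```python
import json
from pathlib import Path


def check_plugin_skills(repo_root, manifest=".cursor-plugin/plugin.json"):
    """Return human-readable problems found in the manifest's "skills" list.

    Assumed layout (hypothetical check): ./skills/<category>/<skill>/SKILL.md,
    with every <category> carrying the "cockroachdb-" prefix.
    """
    root = Path(repo_root)
    data = json.loads((root / manifest).read_text(encoding="utf-8"))
    problems = []
    for entry in data.get("skills", []):
        parts = Path(entry).parts  # "./skills/a/b" -> ("skills", "a", "b")
        if len(parts) != 3 or parts[0] != "skills":
            problems.append(f"unexpected path shape: {entry}")
            continue
        if not parts[1].startswith("cockroachdb-"):
            problems.append(f"category lacks cockroachdb- prefix: {entry}")
        if not (root / entry / "SKILL.md").is_file():
            problems.append(f"missing SKILL.md: {entry}")
    return problems
```

Calling `check_plugin_skills(".")` from the repository root returns an empty list when every entry resolves; a stale entry such as `./skills/onboarding-and-migrations/molt-verify` would be reported both for the missing prefix and the missing `SKILL.md`.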
@@ -11,7 +11,7 @@ metadata:

Guides users through benchmarking, explaining, and comparing two formulations of the same transactional business workflow in CockroachDB: explicit multi-statement transactions versus single-statement CTE transactions. Focuses on performance under contention, fair test methodology, and result interpretation.

**Complement to design skills:** For general transaction design principles, see [designing-application-transactions](../designing-application-transactions/SKILL.md). For SQL syntax and query patterns, see [cockroachdb-sql](../../query-and-schema-design/cockroachdb-sql/SKILL.md).
**Complement to design skills:** For general transaction design principles, see [designing-application-transactions](../designing-application-transactions/SKILL.md). For SQL syntax and query patterns, see [cockroachdb-sql](../../cockroachdb-query-and-schema-design/cockroachdb-sql/SKILL.md).
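The new `../../cockroachdb-query-and-schema-design/...` link depends on where this SKILL.md sits. Assuming it lives at `skills/cockroachdb-application-development/benchmarking-transaction-patterns/SKILL.md` (the path listed in the manifest), this small sketch resolves relative links the way a Markdown renderer would, using only `posixpath`; `resolve_link` is a hypothetical helper.

```python
import posixpath


def resolve_link(skill_md_path, link):
    """Resolve a relative Markdown link against the file that contains it."""
    return posixpath.normpath(posixpath.join(posixpath.dirname(skill_md_path), link))


src = "skills/cockroachdb-application-development/benchmarking-transaction-patterns/SKILL.md"
print(resolve_link(src, "../../cockroachdb-query-and-schema-design/cockroachdb-sql/SKILL.md"))
# -> skills/cockroachdb-query-and-schema-design/cockroachdb-sql/SKILL.md
print(resolve_link(src, "../designing-application-transactions/SKILL.md"))
# -> skills/cockroachdb-application-development/designing-application-transactions/SKILL.md
```

The old `../../query-and-schema-design/...` form resolves to `skills/query-and-schema-design/...`, a directory that no longer exists after the rename, which is why every cross-category link needed the same update.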

## Core Concept

@@ -11,7 +11,7 @@ metadata:

Guides application developers through the design principles and implementation patterns needed to build correct, performant, and resilient applications on CockroachDB. Covers the full spectrum from transaction scoping and retry logic to connection pooling and observability.

**Complement to SQL skills:** For SQL syntax, schema design, and query optimization, see [cockroachdb-sql](../../query-and-schema-design/cockroachdb-sql/SKILL.md). For benchmarking transaction formulations under contention, see [benchmarking-transaction-patterns](../benchmarking-transaction-patterns/SKILL.md).
**Complement to SQL skills:** For SQL syntax, schema design, and query optimization, see [cockroachdb-sql](../../cockroachdb-query-and-schema-design/cockroachdb-sql/SKILL.md). For benchmarking transaction formulations under contention, see [benchmarking-transaction-patterns](../benchmarking-transaction-patterns/SKILL.md).

## When to Use This Skill

@@ -292,16 +292,21 @@ WHERE u.id = incoming.id;
```sql
DELETE FROM sessions
WHERE expires_at < now()
ORDER BY expires_at
LIMIT 10000;
```

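`ORDER BY` makes each batch deterministic (the oldest expired rows go first); without it, CockroachDB may pick a different subset each iteration. Pair it with an index on `expires_at` so each batch is a bounded index scan rather than a full scan followed by a sort.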
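The bounded `DELETE` is meant to be re-run until nothing expired is left. A driver-agnostic sketch of that loop, assuming a caller-supplied `execute(sql)` callable that runs one statement as its own transaction and returns the affected row count; `purge_expired` and `execute` are hypothetical names.

```python
from typing import Callable

DELETE_BATCH = """
DELETE FROM sessions
WHERE expires_at < now()
ORDER BY expires_at
LIMIT {limit}
"""


def purge_expired(execute: Callable[[str], int], limit: int = 10000, max_batches: int = 1000) -> int:
    """Repeat the bounded DELETE until a batch removes fewer than `limit` rows.

    `execute` runs one statement (its own implicit transaction) and returns the
    number of rows affected. `max_batches` is a safety stop for the loop.
    """
    total = 0
    for _ in range(max_batches):
        deleted = execute(DELETE_BATCH.format(limit=int(limit)))
        total += deleted
        if deleted < limit:  # a short batch means no expired rows remained at that moment
            break
    return total
```

With a DB-API driver such as psycopg, `execute` could set `conn.autocommit = True`, call `cur.execute(sql)`, and return `cur.rowcount`, so each batch commits on its own and transactions stay small.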

**JDBC batching (Java):** Use `addBatch`/`executeBatch` instead of per-row `executeUpdate`. This sends all rows in a single network round trip rather than N individual round trips, eliminating idle time that can account for ~50% of transaction latency in chatty workloads.

**Declarative TTL:**

```sql
-- The TTL expression must evaluate to TIMESTAMPTZ. The explicit cast keeps
-- that true even when created_at is a plain TIMESTAMP column.
ALTER TABLE events
SET (ttl_expiration_expression = 'created_at + INTERVAL ''7 DAY''');
SET (ttl_expiration_expression = '(created_at + INTERVAL ''7 DAY'')::TIMESTAMPTZ');
```

### 8. Use Follower Reads for Non-Critical Queries
@@ -11,7 +11,7 @@ metadata:

Guides developers through selecting the right multi-region pattern for their CockroachDB application and implementing it with proper validation. Covers the decision model for choosing between regular regional tables, `REGIONAL BY ROW`, `GLOBAL` tables, and manual geo-partitioning, plus a hands-on demo framework for comparing approaches.

**Complement to other skills:** For transaction design patterns, see [designing-application-transactions](../designing-application-transactions/SKILL.md). For SQL syntax and schema design, see [cockroachdb-sql](../../query-and-schema-design/cockroachdb-sql/SKILL.md).
**Complement to other skills:** For transaction design patterns, see [designing-application-transactions](../designing-application-transactions/SKILL.md). For SQL syntax and schema design, see [cockroachdb-sql](../../cockroachdb-query-and-schema-design/cockroachdb-sql/SKILL.md).

## When to Use This Skill

@@ -29,7 +29,13 @@ Guides developers through selecting the right multi-region pattern for their Coc
## Prerequisites

- Understanding of CockroachDB range architecture and leaseholder concepts
- Multi-region cluster or `cockroach demo` with locality flags for testing
- A **multi-region cluster** with nodes started using `--locality=region=...,zone=...` matching the regions used in the examples below. Without matching localities the DDL errors with `region "..." does not exist` and constraints like `+region=...` match no nodes. Quickest path locally:
```bash
# 9-node demo: three regions with three AZs each (one locality per node).
# --no-example-database skips the bundled sample database, leaving only your schema.
cockroach demo --no-example-database --nodes=9 \
--demo-locality=region=NA-NE,az=1:region=NA-NE,az=2:region=NA-NE,az=3:region=NA-MW,az=1:region=NA-MW,az=2:region=NA-MW,az=3:region=EU-DE,az=1:region=EU-DE,az=2:region=EU-DE,az=3
```
For long-running clusters, see [setting-up-local-cluster](../../cockroachdb-onboarding-and-migrations/setting-up-local-cluster/SKILL.md) and add `--locality=region=...,zone=...` to each `cockroach start` invocation.
- Knowledge of application write patterns (single-region vs multi-region)
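The `--demo-locality` value is easy to mistype by hand. A sketch that generates it, assuming one `region=R,az=N` entry per node, joined with `:` in node order, and reusing the `region`/`az` tier names and region names from the demo command above; `demo_locality` is a hypothetical helper.

```python
def demo_locality(regions, azs_per_region=3):
    """Build a --demo-locality value: one 'region=R,az=N' entry per node, ':'-separated."""
    return ":".join(
        f"region={region},az={az}"
        for region in regions
        for az in range(1, azs_per_region + 1)
    )


regions = ["NA-NE", "NA-MW", "EU-DE"]
value = demo_locality(regions)
nodes = len(regions) * 3  # keep --nodes equal to the number of locality entries
print(f"cockroach demo --no-example-database --nodes={nodes} --demo-locality={value}")
```

For the three regions above this reproduces the nine-entry locality string used in the prerequisite command.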

## Pattern Selection
@@ -29,12 +29,12 @@ Analyzes CockroachDB range distribution, leaseholder placement, and zone configu

- SQL connection to CockroachDB cluster
- Admin role OR `ZONECONFIG` system privilege
- Understanding of CockroachDB range architecture (64MB default max size)
- Understanding of CockroachDB range architecture (default 512MB max size; verify with `SHOW ZONE CONFIGURATION FOR RANGE default`)
- Knowledge of cluster topology (node IDs, regions, availability zones)

**Check your privileges:**
```sql
SHOW GRANTS ON SYSTEM FOR current_user; -- Should show admin or ZONECONFIG
SHOW SYSTEM GRANTS FOR <username>; -- Should show admin or ZONECONFIG
```

See [permissions reference](references/permissions.md) for RBAC setup.
@@ -43,11 +43,11 @@ See [permissions reference](references/permissions.md) for RBAC setup.

### Ranges: Units of Data Distribution

**Range:** Contiguous key space segment (default 64MB max size, configurable via zone config `range_max_bytes`)
**Range:** Contiguous key space segment (default 512MB max size, configurable via zone config `range_max_bytes`)
**Raft group:** Each range replicated across nodes (default 3 replicas)
**Leaseholder:** Single replica handling reads and coordinating writes for a range

**Critical:** Ranges split automatically at 64MB by default, but can fragment further due to load-based splitting during high write traffic.
**Critical:** Ranges split automatically at `range_max_bytes` (default 512MB), but can fragment further due to load-based splitting during high write traffic.

### Leaseholders and Hotspots

@@ -61,7 +61,7 @@ See [permissions reference](references/permissions.md) for RBAC setup.
**Causes:** High write throughput, sequential inserts (timestamp-based primary keys), load-based splitting
**Symptoms:** High range count relative to data size, increased latency from Raft overhead

**Fragmentation metric:** Ranges per GB (healthy: 1-15, fragmented: 50+)
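**Fragmentation metric:** Ranges per GB, meaningful only for tables much larger than a single range. With the 512MB default `range_max_bytes`, a range splits roughly in half when it reaches the limit, so size-driven ranges usually hold about 256-512MB (roughly 2-4 ranges/GB, with 2 ranges/GB as the floor). Anything well above that (e.g., 10+ ranges/GB) suggests load-based splits or many small ranges; tune the threshold to your workload.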
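The arithmetic behind those thresholds, as a sketch: divide a table's range count by its size in GiB and compare against the floor implied by `range_max_bytes`. It treats 512MB as 512 MiB (the zone config value is expressed in bytes), takes the range count and total bytes as inputs you gather yourself, and uses the example 10 ranges/GB cutoff from the text as a default rather than a fixed rule. The helper names are hypothetical.

```python
GIB = 1024 ** 3
DEFAULT_RANGE_MAX_BYTES = 512 * 1024 ** 2  # 512 MiB


def ranges_per_gib(range_count, total_bytes):
    """Ranges per GiB of table data."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return range_count / (total_bytes / GIB)


def natural_floor(range_max_bytes=DEFAULT_RANGE_MAX_BYTES):
    """Fewest ranges per GiB possible when every range is at its max size."""
    return GIB / range_max_bytes


def looks_fragmented(range_count, total_bytes, threshold=10.0):
    """True when the density is at or above the (workload-specific) threshold."""
    return ranges_per_gib(range_count, total_bytes) >= threshold


print(natural_floor())                  # 2.0
print(ranges_per_gib(120, 40 * GIB))    # 3.0
print(looks_fragmented(900, 40 * GIB))  # True (22.5 ranges/GiB)
```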

### Zone Configurations

@@ -113,7 +113,7 @@ ORDER BY (span_stats->>'approximate_disk_bytes')::INT DESC
LIMIT 50;
```

**Interpretation:** Large ranges (>64MB) indicate split lag; many small ranges (<10MB) indicate fragmentation.
**Interpretation:** Ranges close to or above `range_max_bytes` (default 512MB) indicate split lag; many small ranges (<10MB) indicate fragmentation.
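Applied to a list of per-range sizes in bytes (for example `approximate_disk_bytes` values), the interpretation reduces to two cutoffs. In this sketch "close to" `range_max_bytes` is taken as 90% of it, which is an assumption; the 10MB small-range cutoff comes from the text. `classify_ranges` is a hypothetical helper.

```python
MIB = 1024 ** 2


def classify_ranges(sizes_bytes, range_max_bytes=512 * MIB, near_max_ratio=0.9, small_bytes=10 * MIB):
    """Bucket range sizes into split-lag candidates, small ranges, and the rest."""
    buckets = {"near_or_over_max": 0, "small": 0, "normal": 0}
    for size in sizes_bytes:
        if size >= near_max_ratio * range_max_bytes:
            buckets["near_or_over_max"] += 1  # possible split lag
        elif size < small_bytes:
            buckets["small"] += 1  # contributes to fragmentation
        else:
            buckets["normal"] += 1
    return buckets


print(classify_ranges([600 * MIB, 300 * MIB, 2 * MIB, 5 * MIB]))
# {'near_or_over_max': 1, 'small': 2, 'normal': 1}
```

A handful of oversized ranges points at split lag; a large share of small ranges points at fragmentation.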

**CRITICAL:** Always include `LIMIT` and target specific tables. Never run `SHOW RANGES WITH DETAILS` on entire database.

@@ -239,7 +239,7 @@ ALTER TABLE hot_table CONFIGURE ZONE USING lease_preferences = '[[+region=us-wes
**Steps:**
1. **Review intended configs:** Run Query 5 (SHOW ZONE CONFIGURATIONS)
2. **Check actual replica placement:** Run Query 4 on critical tables, inspect `replicas` array for node IDs
3. **Map node IDs to regions:** Cross-reference with `SHOW REGIONS` or `crdb_internal.gossip_nodes`
3. **Map node IDs to regions:** `SHOW REGIONS` lists the cluster's regions and zones but not node IDs; for the node ID to locality mapping, read the `locality` column of `cockroach node status`
4. **Identify mismatches:** Ranges not matching constraints indicate rebalancing in progress or misconfiguration

**Example:**
@@ -250,8 +250,10 @@ SHOW ZONE CONFIGURATION FOR TABLE multi_region_table;
-- Check replica placement
SELECT range_id, replicas FROM [SHOW RANGES FROM TABLE multi_region_table] LIMIT 20;

-- Map node IDs to regions
SELECT node_id, locality FROM crdb_internal.gossip_nodes;
-- List the cluster's regions and zones (no node IDs)
SHOW REGIONS;
-- To map node IDs to localities, use the CLI and read its locality column:
-- cockroach node status --certs-dir=<certs-dir> --host=<any-live-node>
```
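To do step 4 mechanically, join the `replicas` node IDs with the locality column of `cockroach node status --format=csv`. This is a simplified sketch: it assumes the CSV has `id` and `locality` header columns, that locality strings look like `region=...,az=...`, and that a constraint can be reduced to a set of allowed regions. `parse_node_localities` and `misplaced_replicas` are hypothetical helpers, and the abbreviated sample input is made up for illustration.

```python
import csv
import io


def parse_node_localities(csv_text):
    """Map node ID -> {tier: value} from `cockroach node status --format=csv` output."""
    localities = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        tiers = dict(
            tier.split("=", 1) for tier in row["locality"].split(",") if "=" in tier
        )
        localities[int(row["id"])] = tiers
    return localities


def misplaced_replicas(replicas, localities, allowed_regions):
    """Return replica node IDs whose region is not in allowed_regions."""
    return [
        node_id for node_id in replicas
        if localities.get(node_id, {}).get("region") not in allowed_regions
    ]


# Abbreviated, hypothetical sample: real output has more columns.
sample = """id,address,locality,is_live
1,n1:26257,"region=NA-NE,az=1",true
2,n2:26257,"region=NA-MW,az=1",true
3,n3:26257,"region=EU-DE,az=1",true
"""
nodes = parse_node_localities(sample)
print(misplaced_replicas([1, 2, 3], nodes, {"NA-NE", "NA-MW"}))  # [3]
```

A non-empty result only flags candidates: as step 4 notes, rebalancing may still be in progress after a zone config change.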

### Workflow 3: Fragmentation Diagnosis
@@ -264,7 +266,7 @@ SELECT node_id, locality FROM crdb_internal.gossip_nodes;
3. **Determine if expected:** Fragmentation may be intentional for load distribution
4. **Remediate if excessive:** Increase `range_max_bytes` (with caution - larger ranges = slower splits), or investigate reducing write hotspots

**CRITICAL:** Never increase `range_max_bytes` above 512MB without understanding impact on split/rebalance performance.
**CRITICAL:** `range_max_bytes` defaults to 512MB. Raising it further without understanding the impact on split/rebalance performance is risky.

## Safety Considerations

@@ -320,10 +322,10 @@ See [permissions reference](references/permissions.md) for granting minimal priv
- **DETAILS option:** Expensive operation - always use with LIMIT and targeted scope
- **Fragmentation is sometimes intentional:** Load-based splitting improves concurrency
- **Leaseholder concentration:** Check zone configs (lease_preferences) before assuming hotspot
- **Range size target:** Default 64MB max (not 512MB as in older versions)
- **Range size target:** Default `range_max_bytes` is 512MB (verify with `SHOW ZONE CONFIGURATION FOR RANGE default`)
- **Replication lag:** Range placement may not immediately reflect zone config changes (rebalancing takes time)
- **Cross-reference queries:** Combine range analysis with zone configs for complete picture
- **Node mapping:** Use `crdb_internal.gossip_nodes` to map node IDs to regions/zones
- **Node mapping:** Use `SHOW REGIONS` for cluster-level locality, or `cockroach node status` for per-node locality

## References

@@ -76,14 +76,12 @@ foreground writes on the affected store may already be unhealthy.
The minimum free space across stores is what bounds the schema change, not the
total cluster free space (replicas are spread across nodes).

```sql
SELECT
node_id,
store_id,
ROUND((capacity - used) / 1073741824.0, 2) AS free_gb,
ROUND((used::FLOAT / capacity) * 100, 2) AS used_pct
FROM crdb_internal.kv_store_status
ORDER BY free_gb ASC;
No production-safe SQL view exposes per-store capacity. Use the DB Console
**Overview** → **Storage** page (sorts per-store usage), or scrape the
per-node Prometheus endpoint and look at the smallest `capacity_available`:

```bash
# Match capacity, capacity_used and capacity_available, with or without {store="N"} labels
curl -ks https://<node>:8080/_status/vars | grep -E '^capacity(_used|_available)?[{ ]'
```
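To find the smallest `capacity_available` across all nodes without eyeballing grep output, a sketch that parses the Prometheus text format returned by `/_status/vars`. It assumes store metrics appear as `capacity_available{store="N"} <bytes>` (labels optional) and leaves fetching each node's text to the caller; `min_available_store` is a hypothetical helper.

```python
import re

SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def available_by_store(vars_text, node):
    """Yield (node, store, bytes_available) from one node's /_status/vars text."""
    for line in vars_text.splitlines():
        if line.startswith("#"):  # skip HELP/TYPE comments
            continue
        m = SAMPLE_RE.match(line)
        if not m or m.group("name") != "capacity_available":
            continue
        labels = dict(LABEL_RE.findall(m.group("labels") or ""))
        yield node, labels.get("store", "?"), float(m.group("value"))


def min_available_store(vars_by_node):
    """Return the (node, store, bytes) tuple with the least free space, or None."""
    rows = [row for node, text in vars_by_node.items() for row in available_by_store(text, node)]
    return min(rows, key=lambda row: row[2]) if rows else None
```

Feed it a dict of node name to the text saved from the `curl` call above; the returned tuple identifies the store that bounds the schema change.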

### Step 2 — Estimate the affected table/index size
@@ -134,13 +132,16 @@ indexes, expand storage) before issuing the DDL.
`InsufficientSpaceError`, free disk on the affected store and resume the
paused schema change job. Check with:
```sql
WITH j AS (SHOW JOBS)
SELECT job_id, status, error
FROM crdb_internal.jobs
FROM j
WHERE job_type IN ('SCHEMA CHANGE', 'NEW SCHEMA CHANGE') AND status = 'paused';
```
- **Drop unused indexes first.** Often the cheapest way to free headroom
before a large backfill is to drop indexes that
`crdb_internal.index_usage_statistics` shows are unused.
`crdb_internal.index_usage_statistics` shows are unused (this is one of the
`crdb_internal` views documented as supported for production use; see the
[docs](https://www.cockroachlabs.com/docs/stable/crdb-internal)).
- **Statistics lag.** `range_size_mb` is approximate and can lag actual disk
usage; treat estimates as conservative ballparks, not exact figures.

@@ -52,15 +52,15 @@ See [triaging-live-sql-activity permissions reference](../triaging-live-sql-acti
### Time-Series Bucketing

**aggregated_ts:** Hourly UTC buckets (e.g., `2026-02-21 14:00:00` = 14:00-14:59 executions)
**Data retention:** Default ~7 days (check `sql.stats.persisted_rows.max`)
**Data retention:** Capped by row count, not time. `sql.stats.persisted_rows.max` (default 1,000,000) bounds the persisted statement+transaction rows; older buckets are compacted once the cap is reached. Effective wall-clock window depends on workload diversity.
**Best practice:** Always filter by time window: `WHERE aggregated_ts > now() - INTERVAL '24 hours'`
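Because the cap is a row count, the effective window is roughly the cap divided by the rows the workload adds per hour. A back-of-the-envelope sketch, assuming you measure `rows_per_hour` by counting persisted rows for one recent `aggregated_ts` bucket in the tables the cap covers, and that the default of 1,000,000 is in effect; `retention_hours` is a hypothetical helper.

```python
def retention_hours(rows_per_hour, max_rows=1_000_000):
    """Approximate hours of history that fit under sql.stats.persisted_rows.max."""
    if rows_per_hour <= 0:
        raise ValueError("rows_per_hour must be positive")
    return max_rows / rows_per_hour


print(retention_hours(5_000))   # 200.0 hours (about 8.3 days)
print(retention_hours(50_000))  # 20.0 hours
```

A workload with ten times more distinct fingerprints keeps ten times less history, which is why a fixed "~7 days" figure does not hold.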

### Aggregated vs Sampled Metrics

| Metric Category | JSON Path | Scope | Use Case |
|-----------------|-----------|-------|----------|
| **Aggregated** | `statistics.statistics.*` | All executions | Latency, row counts, execution counts |
| **Sampled** | `statistics.execution_statistics.*` | ~10% sample | CPU, contention, admission wait, memory/disk |
| **Sampled** | `statistics.execution_statistics.*` | Probabilistic sample governed by `sql.txn_stats.sample_rate` (default 0.01) | CPU, contention, admission wait, memory/disk |

**Critical:** Always check sampled metrics presence: `WHERE (statistics->'execution_statistics'->>'cnt') IS NOT NULL`
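Why the presence check matters, under a simplifying model: if each execution were sampled independently with probability `p` (an assumption; the actual sampling policy may differ by version), a fingerprint with `n` executions in a bucket has no sampled execution with probability `(1 - p)^n`. The helper names are hypothetical and the default `p` mirrors the 0.01 from the table above.

```python
def prob_no_sample(executions, sample_rate=0.01):
    """P(no sampled execution) under an independent-Bernoulli sampling assumption."""
    return (1.0 - sample_rate) ** executions


def expected_samples(executions, sample_rate=0.01):
    """Expected number of sampled executions under the same assumption."""
    return executions * sample_rate


print(round(prob_no_sample(10), 3))   # 0.904
print(round(prob_no_sample(500), 4))  # 0.0066
print(expected_samples(500))          # 5.0
```

Under this model, low-frequency fingerprints usually have no `execution_statistics` at all, so aggregating CPU or contention without the `cnt IS NOT NULL` filter silently mixes in empty data.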

@@ -307,7 +307,7 @@ LIMIT 20;
- **Privacy:** Use VIEWACTIVITYREDACTED in production
- **Performance:** Always include time filters and LIMIT
- **Complement to live triage:** Use together for complete coverage (historical + real-time)
- **Data retention:** Default ~7 days; verify with `sql.stats.persisted_rows.max`
- **Data retention:** Bounded by the row-count cap `sql.stats.persisted_rows.max` (default 1,000,000), not a TTL; effective time window varies with workload diversity
- **Plan instability:** Multiple plan hashes indicate optimizer/schema changes

## References