53 changes: 17 additions & 36 deletions plugins/databases-on-aws/skills/dsql/SKILL.md
---
name: dsql
description: "Build with Aurora DSQL — manage schemas, execute queries, handle migrations, diagnose query plans, and develop applications with a serverless, distributed SQL database. Covers IAM auth, multi-tenant patterns, MySQL-to-DSQL migration, DDL operations, query plan explainability, and SQL compatibility validation. Triggers on phrases like: DSQL, Aurora DSQL, create DSQL table, DSQL schema, migrate to DSQL, distributed SQL database, serverless PostgreSQL-compatible database, DSQL query plan, DSQL EXPLAIN ANALYZE, why is my DSQL query slow, DSQL query performance, DSQL full scan, DSQL DPU, DSQL query cost, DSQL latency, optimize this query, this query is slow, explain this plan, query performance, high DPU, make this faster, why is this doing a full scan."
license: Apache-2.0
metadata:
tags: aws, aurora, dsql, distributed-sql, distributed, distributed-database, database, serverless, serverless-database, postgresql, postgres, sql, schema, migration, multi-tenant, iam-auth, aurora-dsql, mcp, orm
Load these files as needed for detailed guidance:

**When:** Always load for guidance on using or updating the DSQL MCP server
**Contains:** Instructions for setting up the DSQL MCP server with two configuration options, as sampled in [mcp/.mcp.json](mcp/.mcp.json)

1. Documentation-Tools Only
2. Database Operations (requires a cluster endpoint)

### Query Plan Explainability (modular):

**When:** MUST load [query-plan/workflow.md](references/query-plan/workflow.md) at Workflow 8 entry — it gates the remaining files
**Contains:** Trigger criteria, context disambiguation, routing, phased workflow, and references to: [plan-interpretation.md](references/query-plan/plan-interpretation.md), [catalog-queries.md](references/query-plan/catalog-queries.md), [guc-experiments.md](references/query-plan/guc-experiments.md), [report-format.md](references/query-plan/report-format.md), [query-rewrites-generic.md](references/query-plan/query-rewrites-generic.md), [query-rewrites-dsql-specific.md](references/query-plan/query-rewrites-dsql-specific.md)

### SQL Compatibility Validation:

defaults that may change — when a user's decision depends on an exact limit, verify it.
| Max indexes per table | 24 | `aurora dsql index limits` |
| Max columns per index | 8 | `aurora dsql index limits` |
| IDENTITY/SEQUENCE CACHE values | 1 or >= 65536 | `aurora dsql sequence cache` |
| Supported column data types | See docs | `aurora dsql supported data types` |

**When to verify:** Before recommending batch sizes, connection pool settings, or schema designs
where hitting a limit would cause failures. No need to verify for general guidance or when
the exact number doesn't affect the user's decision.

**Fallback:** If `awsknowledge` is unavailable, use the defaults above and note to the user
that limits should be verified against [DSQL documentation](https://docs.aws.amazon.com/aurora-dsql/latest/userguide/).

## CLI Scripts Available

Bash scripts in [scripts/](scripts/) for cluster management (create, delete, list, cluster info), psql connection, and bulk data loading from local or S3 CSV/TSV/Parquet files.
See [scripts/README.md](scripts/README.md) for usage.

---

- MUST include tenant_id in all tables
- MUST use `CREATE INDEX ASYNC` exclusively
- MUST issue each DDL in its own transact call: `transact(["CREATE TABLE ..."])`
- MUST store arrays/JSON as TEXT
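
The storage rule above can be sketched as a minimal example. The `order_items` table and its columns are hypothetical; `string_to_array` and `jsonb_array_elements_text` are standard PostgreSQL functions for expanding the stored text at read time:

```sql
-- Hypothetical table: lists are stored as TEXT, not native array/JSON types.
CREATE TABLE order_items (
    tenant_id  UUID NOT NULL,
    order_id   UUID NOT NULL,
    sku_list   TEXT,  -- comma-separated values, e.g. 'sku-1,sku-2'
    attrs_json TEXT,  -- serialized JSON array, e.g. '["gift","expedited"]'
    PRIMARY KEY (tenant_id, order_id)
);

-- Cast back at query time:
SELECT string_to_array(sku_list, ',') AS skus
FROM order_items
WHERE tenant_id = '{tenant_id}';

SELECT jsonb_array_elements_text(attrs_json::jsonb) AS attr
FROM order_items
WHERE tenant_id = '{tenant_id}';
```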

### Workflow 2: Safe Data Migration

Every DDL statement generated in this workflow MUST be validated with `dsql_lint`.
- MUST batch updates under 3,000 rows in separate transact calls
- MUST issue each ALTER TABLE in its own transaction

**Recovery — batch fails midway:** Rows already updated keep their new value (each batch committed
in its own transaction). Resume by filtering on the unset state — e.g. add
`WHERE new_column IS NULL` (or the sentinel value) to the next UPDATE — and continue from there.
Re-running the entire migration is safe because the filter naturally excludes completed rows.
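
A sketch of one resumable batch, assuming a hypothetical `orders` table being backfilled from `legacy_column` into `new_column` (the batch size and statement shape are illustrative, not prescriptive):

```sql
-- One batch, issued in its own transact call; stays under the 3,000-row limit.
-- Safe to re-run: the inner filter skips rows that were already backfilled.
UPDATE orders
SET new_column = legacy_column
WHERE id IN (
    SELECT id
    FROM orders
    WHERE new_column IS NULL
    LIMIT 2500
);

-- Repeat until no unset rows remain:
SELECT count(*) AS remaining FROM orders WHERE new_column IS NULL;
```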

### Workflow 3: Application-Layer Referential Integrity

Run `dsql_lint(sql=source_sql, fix=true)` to validate and auto-convert PostgreSQL syntax.

### Workflow 8: Query Plan Explainability

Explains why the DSQL optimizer chose a particular plan. **REQUIRES a structured Markdown diagnostic report as the deliverable.** MUST load [query-plan/workflow.md](references/query-plan/workflow.md) for trigger criteria, context disambiguation, routing, and the full phased workflow (Phase 0–4).

---

Compare against `pg_stats.n_distinct`:
- If `n_distinct` is positive: compare directly
- If `n_distinct` is negative: multiply absolute value by actual row count to get estimated distinct count
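
The two interpretation rules above can be combined into a single query. This is a sketch; it assumes `pg_class.reltuples` is a usable proxy for the actual row count, and `{table}`/`{column}` follow the placeholder convention used elsewhere in this file:

```sql
SELECT
    s.tablename,
    s.attname,
    s.n_distinct,
    CASE
        WHEN s.n_distinct >= 0 THEN s.n_distinct   -- positive: direct count
        ELSE -s.n_distinct * c.reltuples           -- negative: fraction of rows
    END AS estimated_distinct_values
FROM pg_stats s
JOIN pg_class c ON c.relname = s.tablename
WHERE s.tablename = '{table}'
  AND s.attname = '{column}';
```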

## Column Types for Predicate Columns

Retrieve the declared types for columns used in WHERE predicates and JOIN conditions, to detect type coercion index bypass (see plan-interpretation.md):

```sql
SELECT
c.table_name,
c.column_name,
c.data_type,
c.udt_name,
c.is_nullable
FROM information_schema.columns c
WHERE c.table_schema = '{schema}'
AND c.table_name IN ('{table1}', '{table2}')
AND c.column_name IN ('{col1}', '{col2}');
```

Cross-reference the column type against predicate literals visible in the EXPLAIN output. When the types differ, check the implicit cast compatibility matrix in plan-interpretation.md to determine whether the mismatch prevents index usage.

## B-Tree Cross-Type Operator Support

Determine which type pairs the DSQL B-Tree access method supports for index scans. If a (predicate-type, column-type) pair has no registered operator, the index cannot be used for that comparison:

```sql
SELECT DISTINCT
lt.typname AS left_type,
rt.typname AS right_type
FROM pg_amop ao
JOIN pg_type lt ON lt.oid = ao.amoplefttype
JOIN pg_type rt ON rt.oid = ao.amoprighttype
WHERE ao.amopmethod = 10003
AND ao.amoplefttype != ao.amoprighttype
ORDER BY lt.typname, rt.typname;
```

This returns only the cross-type pairs (where left and right types differ). Same-type pairs are always supported. Use this to confirm whether a suspected type mismatch actually prevents index usage — if the pair appears in the result, the index CAN be used and the issue lies elsewhere.

To check a specific pair:

```sql
SELECT EXISTS (
SELECT 1
FROM pg_amop ao
JOIN pg_type lt ON lt.oid = ao.amoplefttype
JOIN pg_type rt ON rt.oid = ao.amoprighttype
WHERE ao.amopmethod = 10003
AND lt.typname = '{predicate_type}'
AND rt.typname = '{column_type}'
) AS index_usable;
```

## Indexed Column Types

Retrieve index definitions together with their column types to identify type coercion bypass candidates:

```sql
SELECT
i.indexname,
i.tablename,
a.attname AS column_name,
t.typname AS column_type,
i.indexdef
FROM pg_indexes i
JOIN pg_class ic ON ic.relname = i.indexname
JOIN pg_index ix ON ix.indexrelid = ic.oid
JOIN pg_attribute a ON a.attrelid = ix.indrelid
AND a.attnum = ANY(ix.indkey)
JOIN pg_type t ON t.oid = a.atttypid
JOIN pg_namespace n ON n.oid = ic.relnamespace
WHERE n.nspname = '{schema}'
AND i.tablename IN ('{table1}', '{table2}')
ORDER BY i.tablename, i.indexname, a.attnum;
```

Use this when a Full Scan appears despite an apparently usable index — compare the index column's `column_type` against the predicate literal's inferred type.

## Value Distribution Analysis

For columns with suspected data skew, retrieve the actual top-N value frequencies:
Detect physically impossible row counts in DSQL plan nodes:

These anomalous values do not affect query correctness — only diagnostic output accuracy.

## Type Coercion and Index Bypass

An index may exist on a column yet not be used when the predicate value's type does not match the column's declared type and no implicit cast exists between the two types.

### Detection Pattern

Flag this condition when **all** of the following are true:

1. An index exists whose leading column matches a WHERE predicate column
2. The plan uses a Full Scan or Seq Scan on that table instead of an Index Scan
3. The predicate literal's type differs from the indexed column's declared type
4. The type pair is **not** supported for index scans by the B-Tree access method (see "Determining Index-Compatible Type Pairs" below)

### Why It Happens

DSQL (like PostgreSQL) can only use a B-Tree index when the comparison operator's input types match the index's operator class. When a predicate supplies a value of a different type:

- If an implicit cast exists from the predicate type to the column type, the planner applies it transparently and can still use the index
- If no implicit cast exists, the planner must apply a per-row cast or comparison function that cannot use the index's ordering — resulting in a full scan

This is particularly surprising to users because the query returns correct results (the cast happens at execution time, row by row) but performance degrades dramatically on large tables.
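
A hypothetical illustration of the failure mode and its fix. The `events` table is invented; the key point is that a `numeric` predicate against an indexed `bigint` column falls outside the integer-family operator pairs, while an explicit cast restores index eligibility:

```sql
-- user_id is BIGINT and indexed:
CREATE INDEX ASYNC idx_events_user_id ON events (user_id);

-- Predicate resolves to numeric: no (numeric, bigint) B-Tree operator pair,
-- so each row is cast at execution time and the plan may show a Full Scan.
EXPLAIN ANALYZE VERBOSE SELECT * FROM events WHERE user_id = 42.0;

-- Cast the predicate to the column's declared type; the index is usable again.
EXPLAIN ANALYZE VERBOSE SELECT * FROM events WHERE user_id = 42.0::bigint;
```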

### Determining Index-Compatible Type Pairs

Rather than relying on a static matrix, query `pg_amop` directly on the cluster to determine which cross-type comparisons the DSQL B-Tree index access method supports. See catalog-queries.md for the exact SQL.

The key insight: DSQL's B-Tree access method (amopmethod `10003`) only supports index scans when a registered operator exists for the specific (left-type, right-type) pair. If no operator is registered for the pair, the index cannot be used — regardless of whether a general-purpose implicit cast exists in `pg_cast`.

In practice, cross-type index support is limited to the integer family (smallint, integer, bigint — all combinations). All other indexed types (text, numeric, uuid, timestamp, date, boolean, etc.) require an exact type match between the predicate and the indexed column for the index to be usable.

### Quantifying Impact

When this pattern is detected:

```
Full Scan rows processed = actual_rows from Full Scan node
Index Scan rows (expected) = estimated rows matching the predicate (from pg_stats selectivity)
Scan amplification = Full Scan rows / Index Scan rows (expected)
```
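
A hypothetical worked example with invented numbers:

```
Full Scan rows processed   = 4,000,000
Index Scan rows (expected) = 200
Scan amplification         = 4,000,000 / 200 = 20,000x
```

Amplification in the thousands suggests the bypassed index, rather than raw data volume, is the bottleneck.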

### Recommendation Template

When a type coercion bypass is confirmed:

- **Explicit cast in the predicate:** Rewrite `WHERE col = '42'` as `WHERE col = 42::float` (cast the literal to the column type)
- **Application-layer fix:** Ensure the application passes parameters with the correct type rather than relying on implicit conversion
- **MUST keep the column type unchanged** — changing it to accommodate mismatched predicates masks the real issue and MAY break other queries

### Evidence Gathering

To confirm this pattern, cross-reference:

1. The column type from `pg_attribute` or `information_schema.columns` (see catalog-queries.md)
2. The index definition from `pg_indexes`
3. The predicate literal in the EXPLAIN output (visible in `Filter:` or `Index Cond:` lines)
4. The B-Tree cross-type operator support check against `pg_amop` (see catalog-queries.md)

## Projections and Row Width

Capture Projections lists from Storage Scan and Storage Lookup nodes:
# Query Rewrites — DSQL-Specific

SQL rewrites that address Aurora DSQL-specific behaviors and optimizer constraints. These SHOULD be recommended when the plan reveals inefficiency unique to DSQL's distributed architecture.

## Available Rewrites

| Pattern Detected | Reference File |
| ------------------------------- | ------------------------------------------------------------- |
| COUNT(*) timeout on large table | [reltuples-estimate.md](query-rewrites/reltuples-estimate.md) |
| Join count exceeds DP threshold | [split-large-joins.md](query-rewrites/split-large-joins.md) |