Active Session History for Postgres — lightweight wait event sampling with zero bloat.
The anti-extension. Pure SQL + PL/pgSQL that works on any Postgres 14+ — including RDS, Cloud SQL, AlloyDB, Supabase, Neon, and every other managed provider. No C extension, no shared_preload_libraries, no provider approval, no restart. Just \i and go.
Short walkthrough of the LLM-assisted investigation flow against a live row-lock spike in Postgres 18. Source: demos/.
Postgres has no built-in session history. When something was slow an hour ago, there is nothing to look at. pg_ash samples pg_stat_activity every second and stores the results in a compact format queryable with plain SQL.
| | pg_ash | pg_wait_sampling | pgsentinel | External sampling |
|---|---|---|---|---|
| Install | \i (pure SQL) | shared_preload_libraries | shared_preload_libraries (package or compile) | Separate infra |
| Works on managed (RDS, Cloud SQL, Supabase, ...) | Yes | Cloud SQL only (limited managed support) | Not known to be supported | Yes, with effort |
| Sampling rate | 1s (via pg_cron, system cron, or any scheduler) | 10ms (in-process) | 10ms (in-process) | 15-60s typical |
| Visibility | Inside Postgres | Inside Postgres | Inside Postgres | Outside only |
| Storage | Disk (~30 MiB/day) | Memory only | Memory only | External store |
| Historical queries | Yes (persistent) | Ring buffer (lost on restart) | Ring buffer (lost on restart) | Depends on setup |
| Pure SQL | Yes | No (C extension) | No (C extension) | No |
| Maintenance overhead | None | None | None | High |
| Requirements | None (pg_cron optional) | shared_preload_libraries (restart required) | shared_preload_libraries (restart required) | Agent + storage |
-- prerequisites (optional but recommended)
create extension if not exists pg_stat_statements; -- enables query text + execution metrics
-- pg_cron is optional: if installed, ash.start() uses it; otherwise see "Scheduling without pg_cron"
-- install (just run the SQL file — works on RDS, Cloud SQL, AlloyDB, etc.)
\i sql/ash-install.sql
-- start sampling (1 sample/second — uses pg_cron if available, otherwise prints external scheduling instructions)
select ash.start('1 second');
-- wait a few minutes, then query
select * from ash.top_waits('1 hour');
select * from ash.top_queries_with_text('1 hour');
select * from ash.top_by_type('1 hour');
-- stop sampling
select ash.stop();
-- uninstall (drops the ash schema and pg_cron jobs)
select ash.uninstall('yes');

ash.start(interval) accepts PostgreSQL interval values. The interval is converted to a pg_cron schedule:
| Interval range | Example input | pg_cron schedule | Description |
|---|---|---|---|
| 1–59 seconds | '1 second' .. '59 seconds' | N seconds | Every N seconds (native pg_cron format) |
| 1–59 minutes | '1 minute' .. '59 minutes' | */N * * * * | Every N minutes via cron syntax |
| 1–23 hours | '1 hour' .. '23 hours' | 0 */N * * * (or 0 * * * * for 1h) | Every N hours via cron syntax (max 23h) |
Notes:
- Sub-minute intervals must be exact seconds (e.g., '30 seconds')
- Minute and hour intervals must be exact ('5 minutes' works; '90 seconds' does not — use '1 minute' instead)
- Hour intervals are limited to 23 hours maximum (a cron syntax limitation). For daily sampling, use '23 hours' or consider a different approach
- The default and recommended interval is '1 second' for high-resolution sampling
- See select * from ash.status() for the current sampling interval
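For example, a minimal sketch of the accepted interval forms (pg_cron assumed; when changing an already-running interval, call ash.stop() first, as shown in Configuration):

-- default: high-resolution sampling
select ash.start('1 second');
-- lighter footprint, scheduled as the native pg_cron "30 seconds" format
select ash.start('30 seconds');
-- scheduled as the cron expression */5 * * * *
select ash.start('5 minutes');
-- '90 seconds' is rejected: not an exact second/minute value — use '1 minute' or '2 minutes'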
The role that runs sampling (the owner of ash.take_sample(), or the pg_cron
job owner) should be a superuser or a member of the built-in
pg_read_all_stats role. Without it, pg_stat_activity.query_id is visible
only for activity owned by the sampling role; queries run by other users come
back with query_id = NULL, which ash records under the sentinel value 0.
This silently skews ash.top_queries*, ash.query_waits, and any per-query
drill-downs — all of that "other-user" traffic collapses into a single
query_id = 0 bucket. To grant the role:
-- as a superuser
grant pg_read_all_stats to <sampling_role>;

On managed services where pg_read_all_stats is already granted to the
primary admin (RDS rds_superuser, Cloud SQL cloudsqlsuperuser, Supabase
postgres), installing and running ash as that role is sufficient. Always
verify with select pg_has_role(current_user, 'pg_read_all_stats', 'MEMBER');.
If the privilege probe itself errors (e.g. missing pg_roles access on a
locked-down managed service), ash.start() does not abort — it emits a
RAISE NOTICE 'privilege probe failed: ...' so the skipped check remains
visible in server / CI logs.
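A minimal check you can run as the sampling role (standard catalog/view queries; the second query is only an illustration of the symptom described above):

-- membership check
select pg_has_role(current_user, 'pg_read_all_stats', 'MEMBER') as has_pg_read_all_stats;

-- if false, activity owned by other roles shows query_id = NULL here,
-- and ash records it under the sentinel query_id = 0
select usename,
       count(*) filter (where query_id is null) as sessions_without_query_id
from pg_stat_activity
where state = 'active'
group by usename;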
-- from 1.0 to 1.1
\i sql/ash-1.0-to-1.1.sql
-- from 1.1 to 1.2
\i sql/ash-1.1-to-1.2.sql
-- from 1.2 to 1.3
\i sql/ash-1.2-to-1.3.sql
-- from 1.3 to 1.4
\i sql/ash-1.3-to-1.4.sql
-- check version
select * from ash.status();

| Function | Description |
|---|---|
| ash.start(interval) | Start sampling (default: '1 second'). Uses pg_cron if available, otherwise prints external scheduling instructions. Also schedules rollup jobs |
| ash.stop() | Stop sampling and rollups (removes pg_cron jobs, sets sampling_enabled = false) |
| ash.status() | Sampling status, version, partition info, rollup metrics, debug_logging state |
| ash.take_sample() | Take one sample manually (called automatically by the scheduler) |
| ash.rotate() | Rotate sample partitions (called automatically, or manually for external schedulers). Runs a pre-truncation rollup to prevent data loss |
| ash.rebuild_partitions(N, 'yes') | Change partition count (3–32). Destructive — all raw sample data is lost; requires the 'yes' confirmation token. Rollup tables survive. Call ash.start() afterward to resume |
| ash.rollup_minute([batch]) | Aggregate raw samples into per-minute rollups. Watermark-based with catch-up. Default batch: 60 minutes |
| ash.rollup_hour() | Aggregate minute rollups into hourly rollups. Watermark-based |
| ash.rollup_cleanup() | Delete expired rollup rows per retention config |
| ash.set_debug_logging([bool]) | Enable/disable per-session RAISE LOG in take_sample() for diagnostics. Call with no argument to check the current state |
| ash.uninstall('yes') | Drop the ash schema and remove pg_cron jobs |
| Function | Description |
|---|---|
| ash.top_waits(interval, limit, width) | Top wait events ranked by sample count, with bar chart |
| ash.top_queries(interval, limit) | Top queries ranked by sample count |
| ash.top_queries_with_text(interval, limit) | Same as top_queries, with pg_stat_statements join (requires pg_stat_statements) |
| ash.query_waits(query_id, interval, width, color) | Wait profile for a specific query |
| ash.top_by_type(interval, width, color) | Breakdown by wait event type |
| ash.wait_timeline(interval, bucket) | Wait events bucketed over time |
| ash.samples_by_database(interval) | Per-database activity |
| ash.activity_summary(interval) | One-call overview: samples, peak backends, top waits, top queries |
| ash.timeline_chart(interval, bucket, top, width) | Stacked bar chart of wait events over time |
| ash.event_queries(event, interval, limit) | Top queries for a specific wait event (requires pg_stat_statements) |
| ash.samples(interval, limit) | Fully decoded raw samples with timestamps and query text |
All interval-based functions default to '1 hour'. Limit defaults to 10 (top 9 + "Other" rollup row).
| Function | Description |
|---|---|
| ash.top_waits_at(start, end, limit, width) | Top waits in a time range, with bar chart |
| ash.top_queries_at(start, end, limit) | Top queries in a time range |
| ash.query_waits_at(query_id, start, end, width, color) | Query wait profile in a time range |
| ash.event_queries_at(event, start, end, limit) | Top queries for a wait event in a time range (requires pg_stat_statements) |
| ash.samples_at(start, end, limit) | Fully decoded raw samples in a time range |
| ash.top_by_type_at(start, end, width, color) | Breakdown by wait event type in a time range |
| ash.wait_timeline_at(start, end, bucket) | Wait timeline in a time range |
| ash.timeline_chart_at(start, end, bucket, top, width) | Stacked bar chart in a time range |
Start and end are timestamptz. Bucket defaults to '1 minute'.
| Function | Description |
|---|---|
| ash.minute_waits(interval, limit) | Top wait events from minute rollups (default: last 1 hour) |
| ash.minute_waits_at(start, end, limit) | Same, absolute time range |
| ash.hourly_queries(interval, limit) | Top queries from hourly rollups with pg_stat_statements text (default: last 1 day) |
| ash.hourly_queries_at(start, end, limit) | Same, absolute time range |
| ash.daily_peak_backends(interval) | Peak and average backends per day (default: last 7 days) |
| ash.daily_peak_backends_at(start, end) | Same, absolute time range |
Rollup readers query rollup_1m / rollup_1h tables — they work even after raw samples have rotated away.
| Function | Description |
|---|---|
| ash.ts_from_timestamptz(timestamptz) | Convert timestamptz to the internal int4 epoch offset (useful for querying rollup tables directly) |
| ash.ts_to_timestamptz(int4) | Convert an int4 epoch offset back to timestamptz |
| ash.decode_sample(integer[], smallint) | Decode a single packed ash.sample.data array. Pass the slot for unambiguous query_id resolution |
| ash.decode_sample(int4) | Convenience: decode every ash.sample row at the given sample_ts (across all datids/slots). Returns (datid, wait_event, query_id, count) |
| ash.decode_sample_at(timestamptz) | Same as above but accepts timestamptz (converted via ts_from_timestamptz). Named with the _at suffix (consistent with samples_at / top_waits_at) to avoid decode_sample(unknown) overload ambiguity |
decode_sample / decode_sample_at have EXECUTE revoked from PUBLIC (per the privilege hardening in #45). Grant explicitly to roles that need them, e.g. grant execute on function ash.decode_sample(int4) to my_reader;.
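A quick round-trip sketch of the timestamp helpers (illustrative only; exact output depends on your clock, and the second-precision round trip is an assumption based on the int4 seconds encoding):

-- timestamptz -> internal int4 epoch offset, and back
select ash.ts_from_timestamptz(now()) as epoch_offset;
select ash.ts_to_timestamptz(ash.ts_from_timestamptz(now())) as round_trip;  -- now(), to second precision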
-- All decoded backends recorded at a specific moment, by database:
select db.datname, d.wait_event, d.query_id, d.count
from ash.decode_sample_at('2026-04-19 14:30:00+00'::timestamptz) d
join pg_database db on db.oid = d.datid
order by db.datname, d.wait_event;

select * from ash.status();

           metric           | value
----------------------------+-------------------------------
version | 1.4
color | off
num_partitions | 3
sampling_enabled | true
skipped_samples | 0
current_slot | 0
sample_interval | 00:00:01
rotation_period | 1 day
raw_retention | 1 day + current partial
include_bg_workers | false
debug_logging | false
installed_at | 2026-02-16 08:30:00.000000+00
rotated_at | 2026-02-16 08:30:00.000000+00
time_since_rotation | 00:09:03.123456
last_sample_ts | 2026-02-16 08:39:03+00
samples_in_current_slot | 56
samples_total | 56
wait_event_map_count | 11
wait_event_map_utilization | 0.03%
query_map_count | 8
rollup_1m_rows | 540
rollup_1m_oldest | 2026-02-16 08:30:00+00
rollup_1m_newest | 2026-02-16 08:39:00+00
rollup_1m_retention | 30 days
rollup_1h_rows | 0
rollup_1h_retention | 1825 days
pg_cron_available | yes
-- morning coffee: what happened overnight?
select * from ash.activity_summary('5 minutes');

        metric        |                    value
----------------------+---------------------------------------------------------------------------------------------
time_range | 00:05:00
total_samples | 56
avg_active_backends | 6.6
peak_active_backends | 10
peak_time | 2026-02-16 08:38:16+00
databases_active | 1
top_wait_1 | Client:ClientRead (46.77%)
top_wait_2 | Timeout:PgSleep (11.83%)
top_wait_3 | Lock:transactionid (9.68%)
top_query_1 | -2835399305386018931 — COMMIT (29.73%)
top_query_2 | 3365820675399133794 — UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE b (23.24%)
top_query_3 | -4378765880691287891 — UPDATE pgbench_tellers SET tbalance = tbalance + $1 WHERE t (11.35%)
-- top wait events (default: top 10 + Other)
select * from ash.top_waits('5 minutes');

     wait_event     | samples |  pct  |                  bar
--------------------+---------+-------+---------------------------------------
Client:ClientRead | 174 | 46.77 | ██████████████████████████████ 46.77%
Timeout:PgSleep | 44 | 11.83 | ████████ 11.83%
Lock:transactionid | 36 | 9.68 | ██████ 9.68%
CPU* | 35 | 9.41 | ██████ 9.41%
LWLock:WALWrite | 31 | 8.33 | █████ 8.33%
IdleTx | 26 | 6.99 | ████ 6.99%
IO:WalSync | 19 | 5.11 | ███ 5.11%
Lock:tuple | 5 | 1.34 | █ 1.34%
LWLock:LockManager | 2 | 0.54 | █ 0.54%
-- top queries with text from pg_stat_statements
select * from ash.top_queries_with_text('5 minutes', 5);

       query_id       | samples |  pct  | calls  | total_exec_time_ms | mean_exec_time_ms | query_text
----------------------+---------+-------+--------+--------------------+-------------------+---------------------------------------------------------------------
-2835399305386018931 | 110 | 29.73 | 283202 | 1234.56 | 0.00 | commit
3365820675399133794 | 86 | 23.24 | 283195 | 518349.35 | 1.83 | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2
5457019535816659310 | 44 | 11.89 | 11 | 195225.25 | 17747.75 | select pg_sleep($1)
-4378765880691287891 | 42 | 11.35 | 283195 | 113278.00 | 0.40 | UPDATE pgbench_tellers SET tbalance = tbalance + $1 WHERE tid = $2
-- breakdown by wait event type
select * from ash.top_by_type('5 minutes');

 wait_event_type | samples |  pct  | bar
-----------------+---------+-------+-------------------------------------------------
Client | 174 | 46.77 | ████████████████████████████████████████ 46.77%
Timeout | 44 | 11.83 | ██████████ 11.83%
Lock | 41 | 11.02 | █████████ 11.02%
CPU* | 35 | 9.41 | ████████ 9.41%
LWLock | 33 | 8.87 | ████████ 8.87%
IdleTx | 26 | 6.99 | ██████ 6.99%
IO | 19 | 5.11 | ████ 5.11%
-- what is query 3365820675399133794 waiting on?
select * from ash.query_waits(3365820675399133794, '5 minutes');

     wait_event     | samples |  pct  | bar
--------------------+---------+-------+-------------------------------------------------
Client:ClientRead | 32 | 54.24 | ████████████████████████████████████████ 54.24%
Lock:transactionid | 12 | 20.34 | ███████████████ 20.34%
LWLock:WALWrite | 6 | 10.17 | ████████ 10.17%
CPU* | 4 | 6.78 | █████ 6.78%
IO:WalSync | 3 | 5.08 | ████ 5.08%
IdleTx | 2 | 3.39 | ██ 3.39%
-- same, but during a specific time window
select * from ash.query_waits_at(3365820675399133794, '2026-02-16 08:38', '2026-02-16 08:40');

-- which queries are stuck on Lock:transactionid?
select * from ash.event_queries('Lock:transactionid', '1 hour');
-- or by wait type (matches all events of that type)
select * from ash.event_queries('IO', '1 hour');

-- see the last 20 decoded samples with query text
select * from ash.samples('10 minutes', 20);

      sample_time       | database_name | active_backends |     wait_event     |       query_id       | query_text
------------------------+---------------+-----------------+--------------------+----------------------+--------------------------------------------------------------
2026-02-16 11:18:51+00 | postgres | 7 | CPU* | -2835399305386018931 | END
2026-02-16 11:18:51+00 | postgres | 7 | CPU* | 3365820675399133794 | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE ...
2026-02-16 11:18:49+00 | postgres | 5 | Client:ClientRead | 9144568883098003499 | SELECT abalance FROM pgbench_accounts WHERE aid = $1
2026-02-16 11:18:49+00 | postgres | 5 | IO:WalSync | -2835399305386018931 | END
2026-02-16 11:18:49+00 | postgres | 3 | Lock:transactionid | -2835399305386018931 | END
2026-02-16 11:18:49+00 | postgres | 5 | LWLock:WALWrite | -2835399305386018931 | END
-- raw samples during an incident
select * from ash.samples_at('2026-02-14 03:00', '2026-02-14 03:05', 50);

Always go through ash.samples() / ash.samples_at() — the underlying ash.sample
table stores a packed integer[] and cannot be joined directly. The defaults for
ash.samples() are p_interval => '1 hour' and p_limit => 100; pass a large
p_limit when dumping.
-- dump every sample from the last hour
\copy (select * from ash.samples('1 hour'::interval, 10000000)) to '/tmp/ash.csv' csv header
-- dump a specific incident window
\copy (select * from ash.samples_at('2026-02-14 03:00', '2026-02-14 03:05', 10000000)) to '/tmp/incident.csv' csv header

Use \copy (psql) rather than server-side COPY TO if /tmp isn't writable by
the Postgres user (managed services), and check the exit status — silently
redirecting stderr to /dev/null will hide errors like typos in table names.
Visualize wait event patterns over time — spot spikes, correlate with deployments, see what changed.
select bucket_start, active, detail, chart
from ash.timeline_chart('5 minutes', '30 seconds', 3, 40);

      bucket_start       | active |                             detail                             | chart
-------------------------+--------+----------------------------------------------------------------+-----------------------------------------------------------
| | | █ Client:ClientRead ▓ LWLock:WALWrite ░ IdleTx · Other
2026-02-16 08:37:30+00 | 2.0 | Other=2.0 | ···········
2026-02-16 08:38:00+00 | 7.0 | Client:ClientRead=2.3 LWLock:WALWrite=0.8 IdleTx=0.4 Other=3.5 | █████████████▓▓▓▓▓░░····················
2026-02-16 08:38:30+00 | 6.6 | Client:ClientRead=4.0 LWLock:WALWrite=0.4 IdleTx=0.5 Other=1.7 | ███████████████████████▓▓░░░··········
2026-02-16 08:39:00+00 | 5.3 | Client:ClientRead=3.3 LWLock:WALWrite=0.3 IdleTx=0.7 Other=1.0 | ███████████████████▓▓░░░░······
Each rank gets a distinct character — █ (rank 1), ▓ (rank 2), ░ (rank 3), ▒ (rank 4+), · (Other) — so the breakdown is visible without color.
-- zoom into a specific time window
select * from ash.timeline_chart_at(
now() - interval '10 minutes', now(),
'1 minute', 3, 50
);

Experimental: ANSI colors. Enable per-session or per-call — green = CPU*, blue = IO, red = Lock, pink = LWLock, cyan = IPC, yellow = Client, orange = Timeout, teal = BufferPin, purple = Activity, light purple = Extension, light yellow = IdleTx.
-- Option 1: enable once for the session (recommended)
set ash.color = on;
-- Option 2: per-call
select * from ash.top_waits('1 hour', p_color => true);

psql's table formatter escapes ANSI codes — to render colors, pipe through sed:
-- add to ~/.psqlrc for a reusable :color command
\set color '\\g | sed ''s/\\\\x1B/\\x1b/g'' | less -R'
-- then use it
select * from ash.top_waits('1 hour') :color
select * from ash.timeline_chart('1 hour') :color
Colors also render natively in pgcli, DataGrip, and other clients that pass raw bytes.
top_waits with colors:
timeline_chart with colors:
Example data generated with pgbench -c 8 -T 65 on Postgres 17 with concurrent lock contention and idle-in-transaction sessions.
Use the _at functions with absolute timestamps to zoom into a specific time window:
-- what happened between 3:00 and 3:10 am?
select * from ash.top_waits_at('2026-02-14 03:00', '2026-02-14 03:10');
-- which queries were running during the incident?
select * from ash.top_queries_at('2026-02-14 03:00', '2026-02-14 03:10');
-- minute-by-minute timeline of the incident
select * from ash.wait_timeline_at(
'2026-02-14 03:00',
'2026-02-14 03:10',
'1 minute'
);

pg_ash functions chain naturally with the way an LLM investigates a problem — each answer tells it what to ask next.
Prompt: "There was a performance issue about 5 minutes ago. Investigate."
Step 1 — the LLM checks the big picture:
select * from ash.activity_summary('10 minutes');

       metric        |  value
---------------------+---------
samples | 600
avg_active_sessions | 4.2
max_active_sessions | 12
top_wait_event | Lock:tuple
top_query_id | 7283901445
"Average 4.2 active sessions but peak 12 — something spiked. And Lock:tuple is the top wait event."
Step 2 — drill into the waits:
select * from ash.top_waits('10 minutes');

   wait_event   | samples | pct  | bar
----------------+---------+------+---------------------
Lock:tuple | 2810 | 68% | ████████████████████
CPU* | 830 | 20% | ██████
IO:DataFileRead| 290 | 7% | ██
Client:ClientRe| 125 | 3% | █
Other | 83 | 2% |
"Lock:tuple is 68% of all waits. Multiple sessions fighting over the same rows."
Step 3 — see the timeline:
select * from ash.timeline_chart('10 minutes', '30 seconds', 5, 60);

       bucket_start      | active |                              chart                              | detail
-------------------------+--------+-----------------------------------------------------------------+----------------------------------
| | █ Lock:tuple ▓ CPU* ░ IO:DataFileRead ▒ Client:ClientRead |
2026-02-17 14:00:00+00 | 2.1 | ▓▓▓▓░░··· | CPU*=1.2 IO=0.5 Other=0.4
2026-02-17 14:00:30+00 | 2.3 | ▓▓▓▓▓░░··· | CPU*=1.4 IO=0.5 Other=0.4
2026-02-17 14:01:00+00 | 3.8 | ██▓▓▓▓░░···· | Lock=0.6 CPU*=1.5 IO=0.5 Other=1.2
2026-02-17 14:01:30+00 | 8.5 | █████████████████▓▓▓░░··· | Lock=5.8 CPU*=1.3 IO=0.5 Other=0.9
2026-02-17 14:02:00+00 | 12.0 | █████████████████████████████▓▓░░·· | Lock=9.8 CPU*=1.1 IO=0.4 Other=0.7
2026-02-17 14:02:30+00 | 11.2 | ███████████████████████████▓▓░░·· | Lock=9.0 CPU*=1.0 IO=0.5 Other=0.7
2026-02-17 14:03:00+00 | 4.1 | ███▓▓▓▓░░···· | Lock=1.2 CPU*=1.5 IO=0.6 Other=0.8
2026-02-17 14:03:30+00 | 2.0 | ▓▓▓▓░··· | CPU*=1.1 IO=0.4 Other=0.5
2026-02-17 14:04:00+00 | 2.2 | ▓▓▓▓░░··· | CPU*=1.3 IO=0.5 Other=0.4
"The spike is clearly 14:01:30 to 14:02:30 — Lock:tuple dominates. Let me find which queries."
Step 4 — find the guilty queries:
select * from ash.event_queries_at(
'Lock:tuple',
'2026-02-17 14:01:30', '2026-02-17 14:02:30'
);

  query_id   | samples | pct  | bar
-------------+---------+------+---------------------
7283901445 | 412 | 85% | ████████████████████
9102384756 | 53 | 11% | ███
Other | 19 | 4% | █
"Query 7283901445 accounts for 85% of the lock waits."
Step 5 — get the SQL text:
select * from ash.top_queries_with_text('10 minutes');

  query_id   | samples | pct  | mean_exec_time_ms | total_exec_time_ms | query_text
-------------+---------+------+-------------------+--------------------+--------------------------------------------------
7283901445 | 2395 | 58% | 842.30 | 534120.50 | UPDATE orders SET status = $1 WHERE id = $2
9102384756 | 530 | 13% | 215.60 | 42347.20 | UPDATE orders SET shipped_at = $1 WHERE id = $2
LLM's conclusion:
Root cause: multiple concurrent UPDATE orders ... WHERE id = $2 statements are contending on the same rows (Lock:tuple). Two different update patterns hit the orders table — status updates and shipping updates — and when they target overlapping rows, they serialize on tuple locks.

Mitigation options:
- Use SELECT ... FOR UPDATE SKIP LOCKED to skip already-locked rows and process them later
- Batch the status and shipping updates into a single statement to reduce lock duration
- If these run from a queue worker, reduce concurrency or partition the work by order ID range
ash.take_sample() runs every second via pg_cron. It reads pg_stat_activity, groups active backends by (wait_event_type, wait_event, state), and encodes the result into a single integer[] per database:
{-5, 3, 101, 102, 103, -1, 2, 104, 105, -8, 1, 106}
│ │ │ │ │ │ │ │ │
│ │ └─ query_ids │ │ └─ qids │ │ └─ qid
│ └─ count=3 │ └─ count=2 │ └─ count=1
└─ wait_event_id=5 └─ weid=1 └─ weid=8
6 active backends across 3 wait events = 1 row, 12 array elements. Each query_id is one backend — if two backends run the same query, the same map_id appears twice (the count reflects total backends, not distinct queries). Full row size: 24 (tuple header) + 4 (sample_ts) + 4 (datid) + 2 (active_count) + 2 (slot) + 68 (array: 20-byte header + 12 × 4) + alignment = 106 bytes (measured with pg_column_size).
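As a rough illustration (not the production decoder — the reader functions do this inline with generate_subscripts(), and resolving names additionally joins ash.wait_event_map and the query map), this sketch walks the example array above and recovers the (wait_event_id, backend count) pairs:

with s as (
  select array[-5, 3, 101, 102, 103, -1, 2, 104, 105, -8, 1, 106]::int[] as data
)
select -s.data[i]    as wait_event_id,   -- a negative element marks a group and carries the wait_event map id
       s.data[i + 1] as backend_count    -- followed by the backend count, then that many query map ids
from s, generate_subscripts(s.data, 1) as i
where s.data[i] < 0;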
| Table | Purpose |
|---|---|
| ash.wait_event_map | Maps (state, wait_event_type, wait_event) to integer IDs |
| ash.query_map_0..{N-1} | Maps query_id (from pg_stat_activity) to integer IDs (one per sample partition, truncated on rotation) |
| ash.rollup_1m | Per-minute aggregated samples (30-day retention) |
| ash.rollup_1h | Per-hour aggregated rollups (5-year retention) |
Dictionaries are auto-populated by the sampler. Wait events are stable (~600 entries max across all Postgres versions). Query map grows as new queries appear and is garbage-collected based on last_seen.
Encoding version is tracked in ash.config.encoding_version, not in the array itself — zero per-row overhead.
Note on CPU*: When wait_event_type and wait_event are both NULL in pg_stat_activity, the backend is active but not in a known wait state. This is either genuine CPU work or an uninstrumented code path where Postgres does not report a wait event. The asterisk signals this ambiguity. See gaps.wait.events for details on uninstrumented wait events in Postgres — these gaps are being closed over time, making CPU* increasingly accurate.
Skytools PGQ-style N-partition ring buffer (default N=3, configurable 3–32 via ash.rebuild_partitions(N, 'yes')). Physical tables (sample_0 through sample_{N-1}) rotate at rotation_period intervals. TRUNCATE replaces the oldest partition — zero dead tuples, zero bloat, no VACUUM needed for sample tables.
N-1 partitions hold data at any time. One is always empty, ready for the next rotation. Before truncation, rotate() calls rollup_minute() to aggregate endangered samples into rollup tables.
┌──────────┐ ┌───────────┐ ┌──────────┐ ┌───────────────┐
│ sample_0 │ │ sample_1 │ │ sample_2 │ │ sample_{N-1} │
│ (today) │ │(yesterday)│ │ (empty) │... │ (readable) │
│ writing │ │ readable │ │ next │ │ │
└──────────┘ └───────────┘ └──────────┘ └───────────────┘
↑ TRUNCATE + rotate
Reader functions decode arrays inline using generate_subscripts() with direct array subscript access. This avoids per-row plpgsql function calls and is 9-17x faster than the CROSS JOIN LATERAL decode_sample() approach.
| Active backends | Storage/day | Max on disk (N-1 partitions, default N=3) |
|---|---|---|
| 10 | 11 MiB | 22 MiB |
| 50 | 30 MiB | 60 MiB |
| 100 | 50 MiB | 100 MiB |
| 200 | 100 MiB | 200 MiB |
| 500 | 245 MiB | 490 MiB |
At 500+ backends, TOAST LZ4 compression reduces actual storage. Increasing num_partitions increases the number of days kept, not the daily rate.
| Level | Retention | Rows/db | Storage/db |
|---|---|---|---|
| 1-minute (rollup_1m) | 30 days | ~43,200 | ~43 MiB |
| 1-hour (rollup_1h) | 5 years | ~43,800 | ~77 MiB |
Total: ~120 MiB per database for 5 years of trend data.
Measured on Postgres 17, 50 backends, 1s sampling, jit = off (median of 10 runs, warm cache):
| Metric | Result |
|---|---|
| top_waits('1 hour') | 30 ms |
| top_waits('24 hours') | 6.1 s |
| top_queries_with_text('1 hour') | 31 ms |
| take_sample() overhead | 53 ms |
| WAL per sample | ~29 KiB (~2.4 GiB/day) |
| Rotation (1-day partition) | 9 ms |
| Dead tuples after rotation | 0 |
See issue #1 for full benchmarks — EXPLAIN ANALYZE output, backend scaling, multi-database tests, WAL analysis, and concurrency testing.
- Postgres 14+ (requires query_id in pg_stat_activity)
- pg_cron 1.5+ (optional — for built-in scheduling; see Scheduling without pg_cron for alternatives)
- pg_stat_statements (optional but recommended — enables query text and execution metrics; without it, top_queries_with_text(), event_queries(), and event_queries_at() will error, and top_queries(), samples() will return NULL for query_text)
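A few optional sanity checks before installing (illustrative; these are standard GUC and catalog queries):

show compute_query_id;                               -- 'auto' or 'on' (see the note below)
select extname, extversion
from pg_extension
where extname in ('pg_stat_statements', 'pg_cron');  -- both optional
select count(*) filter (where query_id is not null) as sessions_with_query_id
from pg_stat_activity;                               -- should be > 0 under load if query_id is populated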
Note on query_id: The default compute_query_id = auto only populates query_id when pg_stat_statements is in shared_preload_libraries. If query_id is NULL in pg_stat_activity, set:
alter system set compute_query_id = 'on';
-- requires reload: select pg_reload_conf();

-- change sampling interval (default: 1 second)
select ash.stop();
select ash.start('5 seconds');
-- change rotation interval (default: 1 day)
update ash.config set rotation_period = '12 hours';
-- check current configuration
select * from ash.status();

All configuration is in the ash.config singleton table:
| Setting | Default | Description |
|---|---|---|
| sample_interval | 1 second | Time between samples |
| rotation_period | 1 day | How often partitions rotate |
| num_partitions | 3 | Number of sample partitions (3–32) |
| include_bg_workers | false | Sample autovacuum, logical replication, parallel workers |
| debug_logging | false | RAISE LOG for every sampled session |
| rollup_1m_retention_days | 30 | How long to keep minute-level rollups |
| rollup_1h_retention_days | 1825 | How long to keep hourly rollups (5 years) |
| rollup_min_backend_seconds | 3 | Minimum backend-seconds for a query to appear in rollup query_counts |
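For example, a couple of illustrative tweaks using settings from the table above (the where singleton predicate follows the retention examples later in this README):

update ash.config set include_bg_workers = true where singleton;        -- also sample autovacuum / parallel workers
update ash.config set rollup_min_backend_seconds = 10 where singleton;  -- require 10 backend-seconds before a query enters rollup query_counts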
By default, pg_ash uses 3 partitions (1 day of history + current partial). To keep more raw sample history, increase the partition count:
-- keep 7 days of raw samples (9 partitions with 1-day rotation = 7 readable days + current)
-- 'yes' is required because the call drops all raw sample data
select ash.rebuild_partitions(9, 'yes');
-- resume sampling after rebuild
select ash.start();
-- verify
select * from ash.status();
-- num_partitions | 9
-- raw_retention | 7 days + current partial

The retention formula is (N - 2) × rotation_period. The minimum is 3 (current + previous + one being truncated), the maximum is 32.
rebuild_partitions() is destructive — all raw samples are lost. To prevent accidents, the call requires a 'yes' confirmation token (e.g. ash.rebuild_partitions(9, 'yes')); calling it without 'yes' raises an error and changes nothing. Rollup tables survive. You must call ash.start() afterward to resume sampling.
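A hypothetical sizing example combining the formula with the calls above (destructive, as noted):

-- target: roughly 3.5 days of raw history via 12-hour rotation
-- retention ≈ (N - 2) × rotation_period = (9 - 2) × 12 h = 84 h
update ash.config set rotation_period = '12 hours';
select ash.rebuild_partitions(9, 'yes');  -- drops all raw samples
select ash.start();                       -- resume sampling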
Raw samples rotate away after (N-2) × rotation_period. Rollup tables preserve aggregated data for long-term trend analysis:
- rollup_1m: per-minute aggregates, kept for 30 days (~43 MiB/db)
- rollup_1h: per-hour aggregates, kept for 5 years (~77 MiB/db)
Rollups are populated automatically when pg_cron is available (ash.start() schedules them). Without pg_cron, schedule externally:
# Every minute: aggregate raw samples into minute rollups
* * * * * psql -qAtX -d mydb -c "SELECT ash.rollup_minute();"
# Every hour: aggregate minutes into hourly rollups
0 * * * * psql -qAtX -d mydb -c "SELECT ash.rollup_hour();"
# Daily at 3am: delete expired rollup rows
0 3 * * * psql -qAtX -d mydb -c "SELECT ash.rollup_cleanup();"

Query rollup data with the rollup reader functions:
-- what were the top wait events in the last 6 hours? (from minute rollups)
select * from ash.minute_waits('6 hours');
-- top queries over the last week (from hourly rollups)
select * from ash.hourly_queries('7 days');
-- peak concurrency trend over the last 30 days
select * from ash.daily_peak_backends('30 days');
-- investigate a specific time range (even if raw samples are gone)
select * from ash.minute_waits_at('2026-03-01 02:00', '2026-03-01 03:00');

Rollups use backend-seconds as the count unit (Oracle ASH-compatible). Each sample appearance = 1 backend-second at a 1s sampling interval.
To change retention:
update ash.config set rollup_1m_retention_days = 14 where singleton; -- keep 2 weeks
update ash.config set rollup_1h_retention_days = 365 where singleton; -- keep 1 year

Enable per-session RAISE LOG output from take_sample() — useful for diagnosing connection pooler issues (e.g., PgBouncer mapping client_addr to pooler sessions):
-- check current state
select ash.set_debug_logging();
-- enable: each take_sample() call logs every active session to the Postgres log
select ash.set_debug_logging(true);
-- sample output in the Postgres server log:
-- LOG: ash.take_sample: pid=107 state=active wait_type=CPU* wait_event=CPU* backend_type=client backend query_id=-5287352711091412819
-- LOG: ash.take_sample: pid=108 state=idle in transaction wait_type=Client wait_event=ClientRead backend_type=client backend query_id=-6949053775937549307
-- disable
select ash.set_debug_logging(false);

pg_cron logs every job execution to cron.job_run_details. At 1-second sampling, this adds ~12 MiB/day of unbounded growth with no built-in purge.
Recommended: disable cron.log_run. Errors from failed jobs still appear in the Postgres server log (cron.log_min_messages defaults to WARNING) — you lose nothing important, only the job_run_details table entries.
alter system set cron.log_run = off;
-- requires Postgres restart (postmaster context)

If you need run history for other pg_cron jobs (unfortunately, as of pg_cron 1.6, per-job logging configuration is not supported), schedule periodic cleanup instead:
select cron.schedule(
'ash_purge_cron_log',
'0 * * * *',
$$delete from cron.job_run_details where end_time < now() - interval '1 day'$$
);

ash.start() will warn about this overhead.
pg_cron is optional. All core functions — ash.take_sample(), ash.rotate(), and all reporting — work without it. When pg_cron is not installed, ash.start('1 second') records the intended interval in ash.config and prints instructions for external scheduling.
You can call ash.take_sample() from any external scheduler:
System cron (1-minute minimum granularity):
# Every minute
* * * * * psql -qAtX -d mydb -c "SET statement_timeout='500ms'; SELECT ash.take_sample();"
# Every second (cron launches a loop each minute)
* * * * * for i in $(seq 1 59); do psql -qAtX -d mydb -c "SET statement_timeout='500ms'; SELECT ash.take_sample();"; sleep 1; done

Dedicated loop script (most reliable for 1-second sampling):
#!/bin/bash
# ash_sampler.sh — run via systemd, screen, tmux, or nohup
while true; do
psql -qAtX -d mydb -c "SET statement_timeout='500ms'; SELECT ash.take_sample();" 2>/dev/null
sleep 1
done

systemd timer (Linux, precise 1-second ticking):
# /etc/systemd/system/ash-sampler.service
[Service]
Type=oneshot
ExecStart=psql -qAtX -d mydb -c "SET statement_timeout='500ms'; SELECT ash.take_sample();"
User=postgres
# /etc/systemd/system/ash-sampler.timer
[Timer]
OnActiveSec=0
OnUnitActiveSec=1s
AccuracySec=100ms
[Install]
WantedBy=timers.target

psql \watch (quick ad-hoc testing):
SELECT ash.take_sample() \watch 1

Any language (Python example):
import psycopg2, time

conn = psycopg2.connect("dbname=mydb")
conn.autocommit = True
while True:
    with conn.cursor() as cur:
        cur.execute("SET statement_timeout='500ms'; SELECT ash.take_sample()")
    time.sleep(1)

Don't forget to also schedule rotation and rollups:
# System cron: rotate daily at midnight
0 0 * * * psql -qAtX -d mydb -c "SELECT ash.rotate();"
# Rollup: every minute, every hour, daily cleanup
* * * * * psql -qAtX -d mydb -c "SELECT ash.rollup_minute();"
0 * * * * psql -qAtX -d mydb -c "SELECT ash.rollup_hour();"
0 3 * * * psql -qAtX -d mydb -c "SELECT ash.rollup_cleanup();"

pg_ash installs with a locked-down privilege model: admin functions (ash.start(), ash.stop(), ash.rotate(), ash.take_sample(), ash.set_debug_logging(), ash.uninstall()) are restricted to the schema owner, and EXECUTE on all reader functions plus SELECT on reader tables (ash.sample, ash.query_map_all, ash.config, ash.wait_event_map, and per-slot partitions) is revoked from PUBLIC. The installing role retains full access.
Grant access to a monitoring or read-only role with the convenience helpers:
-- one call, minimum privileges: USAGE on schema ash, EXECUTE on every
-- public reader function, SELECT on the tables readers depend on
-- (sample + partitions, query_map_all + partitions, config, wait_event_map,
-- rollup_1m, rollup_1h). Idempotent.
create role grafana login password 'xxx';
select ash.grant_reader('grafana');
-- ...later, take it back. Symmetric undo of grant_reader().
select ash.revoke_reader('grafana');

Both helpers are owner-only, validate that the role exists in pg_roles,
quote the role name, and emit a RAISE NOTICE summarizing what changed.
Note: If you subsequently change the partition count via ash.rebuild_partitions(N, 'yes'), previously-granted reader roles will lose access to the new partition tables. Re-run ash.grant_reader(...) for each monitoring role after any rebuild_partitions call.
Note on pg_cron visibility: ash.grant_reader() does not grant USAGE ON SCHEMA cron, since pg_cron is not an ash object. When pg_cron is loaded but the monitoring role lacks USAGE on schema cron, ash.status() emits a single fallback row of the form cron_jobs = '<no cron.job access; grant USAGE ON SCHEMA cron TO <role>>' instead of per-job cron_job_* rows. To surface real cron job details, either run grant usage on schema cron to <role> (and grant select on cron.job to <role>) once, or simply ignore the row.
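For example, using the grafana role from above (only needed if you want per-job cron rows in ash.status()):

grant usage on schema cron to grafana;
grant select on cron.job to grafana;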
If you prefer manual control, the equivalent explicit grants are:
-- allow a monitoring role to call the readers
grant usage on schema ash to my_monitor_role;
grant execute on function ash.top_waits(interval, int, int) to my_monitor_role;
grant execute on function ash.top_queries(interval, int) to my_monitor_role;
grant execute on function ash.top_queries_with_text(interval, int) to my_monitor_role;
grant execute on function ash.samples(interval, int) to my_monitor_role;
grant execute on function ash.status() to my_monitor_role;
-- or grant all reader functions at once
grant execute on all functions in schema ash to my_monitor_role;
-- grant direct read on raw tables for ad-hoc SQL (optional)
grant select on all tables in schema ash to my_monitor_role;

The pgss reader functions (top_queries, top_queries_at, top_queries_with_text, samples, samples_at, event_queries, event_queries_at) need the pg_stat_statements schema on their search_path. The installer detects it automatically. If you install pg_stat_statements after pg_ash, or move it to a non-default schema, re-apply:
-- detect the pgss schema and re-apply search_path on pgss readers
select ash._apply_pgss_search_path();

- Primary only — pg_ash requires writes (INSERT into sample tables, TRUNCATE on rotation), so it cannot run on physical standbys or read replicas. Install it on the primary; it samples all databases from there.
- Observer-effect protection — the sampler pg_cron command includes SET statement_timeout = '500ms' to prevent take_sample() from becoming a problem on overloaded servers. If pg_stat_activity is slow (thousands of backends), the sample is canceled rather than piling up. Normal execution is ~50ms — the 500ms cap gives 10× headroom. Adjust in cron.job if needed (a sketch follows this list).
- Sampling gaps under heavy load — pg_cron runs in a single background worker, and under heavy load (lock storms, many concurrent sessions) it can't always keep up with the 1-second schedule. You may see gaps of 8s, 13s, or even 30s+ between samples — ironically during the most interesting moments. This is a fundamental pg_cron limitation, not a bug. If precise 1-second sampling matters, use an external sampler, which is more reliable under load.
- 24-hour raw sample queries are slow (~6s for a full-day scan) — use the rollup reader functions (ash.minute_waits, ash.hourly_queries, ash.daily_peak_backends) for queries over long time ranges.
- JIT protection built in — all reader functions use SET jit = off to prevent JIT compilation overhead (which can be 10-750x slower depending on Postgres version and dataset size). No global configuration needed.
- Single-database install — pg_ash installs in one database and samples all databases from there. Per-database filtering works via the datid column.
- query_map hard cap at 50k entries — on Postgres 14-15, volatile SQL comments (e.g., marginalia, sqlcommenter with session IDs or timestamps) produce unique query_id values that are not normalized. This can flood the query_map partitions. A hard cap of 50,000 entries per partition prevents unbounded growth — queries beyond the cap are tracked as "unknown." PG16+ normalizes comments, so this is rarely hit. Check query_map_count in ash.status() to monitor.
- Parallel query workers counted individually — parallel workers share the same query_id as the leader but are counted as separate backends. This inflates the apparent "weight" of parallel queries in top_queries(). leader_pid grouping is not yet implemented.
- WAL overhead — 1-second sampling generates ~29 KiB of WAL per sample (~2.4 GiB/day), dominated by full_page_writes. This is significant for WAL-sensitive replication setups. Consider 5-second or 10-second sampling intervals (ash.start('5 seconds')) if WAL volume is a concern. The overhead scales linearly with sampling frequency.
- Epoch overflow horizon (~2094) — sample_ts is stored as int4 seconds since 2026-01-01 UTC, and int4 is exhausted around 2094-01-19. Past that point, the ::int4 cast in ash.take_sample() raises ERROR: integer out of range and sampling hard-fails (it does NOT silently wrap). ash.status() exposes epoch_seconds_remaining so operators can plan a bigint migration of the column well before the horizon. See issue #37.
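For the observer-effect item above, a hedged sketch of raising the sampler's timeout by editing the pg_cron job directly (the LIKE pattern is an assumption — inspect cron.job first to confirm which row carries the sampler command):

-- give take_sample() a 1s ceiling instead of 500ms on a very busy server
update cron.job
   set command = replace(command, $$'500ms'$$, $$'1s'$$)
 where command like '%ash.take_sample%';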
pg_ash is part of SAMO — self-driving Postgres.


