Skip to content

Comments

Add amcheck corruption checks (c1/c2), rename buffercache to m1#89

Merged
NikolayS merged 1 commit intomasterfrom
rename-c1-to-m1-and-amcheck
Feb 10, 2026
Merged

Add amcheck corruption checks (c1/c2), rename buffercache to m1#89
NikolayS merged 1 commit intomasterfrom
rename-c1-to-m1-and-amcheck

Conversation

@NikolayS
Copy link
Owner

Changes

New reports

  • c1 — B-tree index integrity check (non-blocking, safe for production)
    • Uses bt_index_check() — lightweight, only takes AccessShareLock
    • On PG14+, also runs verify_heapam() for heap corruption detection
  • c2 — Full B-tree + heap integrity check (takes locks — use on standby!)
    • Uses bt_index_parent_check(heapallindexed := true) — checks parent-child consistency, sibling pointers, all heap tuples indexed
    • checkunique on PG14+, rootdescend on PG11+
    • On PG14+, runs verify_heapam() with full TOAST checking

Rename

  • c1_buffercache.sqlm1_buffercache.sql (new "memory" group, frees c prefix for corruption)

Robustness

  • Graceful handling when amcheck extension is not installed
  • Handles insufficient privileges (non-superuser) without false corruption reports
  • Version-conditional: uses appropriate function signatures for PG11-18
  • Skips pg_toast, pg_catalog, temp tables, invalid indexes

CI

  • Added amcheck extension to test matrix

Testing

  • Verified on PG13 (no verify_heapam) and PG17 (full feature set)
  • Tested superuser and non-superuser scenarios
  • Tested missing extension scenario

- c1: B-tree index integrity check (bt_index_check, non-blocking, safe for production)
- c2: Full B-tree + heap integrity check (bt_index_parent_check + verify_heapam, takes locks)
- Both handle: missing extension, insufficient privileges, PG version differences
- bt_index_parent_check(checkunique) on PG14+, rootdescend on PG11+
- verify_heapam with TOAST checking on PG14+
- Rename c1_buffercache.sql -> m1_buffercache.sql (memory group)
- Add amcheck extension to CI test matrix
- Tested on PG13 and PG17, superuser and non-superuser
@NikolayS NikolayS merged commit 124a2bd into master Feb 10, 2026
6 checks passed
psql -h localhost -U postgres -d test -c 'CREATE EXTENSION IF NOT EXISTS pg_buffercache;'
psql -h localhost -U postgres -d test -c 'CREATE EXTENSION IF NOT EXISTS amcheck;'
# amcheck needs execute privileges for non-superusers
psql -h localhost -U postgres -d test -c 'GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA public TO PUBLIC;'
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Granting execute to public in CI feels broader than needed and can mask permission issues. Since you already create dba_user, I'd target that role (and move this grant after the user is created).

Suggested change
psql -h localhost -U postgres -d test -c 'GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA public TO PUBLIC;'
psql -h localhost -U postgres -d test -c 'GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA public TO dba_user;'

order by n.nspname, t.relname, c.relname
loop
begin
perform bt_index_check(rec.index_oid);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: casting to regclass here avoids any ambiguity around the function signature.

Suggested change
perform bt_index_check(rec.index_oid);
perform bt_index_check(rec.index_oid::regclass);

Comment on lines +90 to +98
where c.relkind = 'r'
and n.nspname not in ('pg_catalog', 'information_schema')
and c.relpersistence != 't'
order by n.nspname, c.relname
loop
has_errors := false;
begin
for corruption in
select * from verify_heapam(rec.table_oid, check_toast := true)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heap loop currently includes pg_toast tables, and verify_heapam(..., check_toast := true) already traverses TOAST. Excluding the pg_toast schema avoids redundant work/noise. Also, explicit regclass cast keeps the call unambiguous.

Suggested change
where c.relkind = 'r'
and n.nspname not in ('pg_catalog', 'information_schema')
and c.relpersistence != 't'
order by n.nspname, c.relname
loop
has_errors := false;
begin
for corruption in
select * from verify_heapam(rec.table_oid, check_toast := true)
where c.relkind = 'r'
and n.nspname not in ('pg_catalog', 'information_schema')
and n.nspname !~ '^pg_toast'
and c.relpersistence != 't'
order by n.nspname, c.relname
loop
has_errors := false;
begin
for corruption in
select * from verify_heapam(rec.table_oid::regclass, check_toast := true)

Comment on lines +53 to +71
order by pg_relation_size(c.oid) asc -- smallest first
loop
begin
if pg_version >= 140000 then
perform bt_index_parent_check(
rec.index_oid,
heapallindexed := true,
rootdescend := true,
checkunique := true
);
elsif pg_version >= 110000 then
perform bt_index_parent_check(
rec.index_oid,
heapallindexed := true,
rootdescend := true
);
else
perform bt_index_parent_check(rec.index_oid, heapallindexed := true);
end if;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two small things here: order by pg_relation_size(...) recalculates work you already selected as index_size, and casting to regclass makes the amcheck calls a bit more robust.

Suggested change
order by pg_relation_size(c.oid) asc -- smallest first
loop
begin
if pg_version >= 140000 then
perform bt_index_parent_check(
rec.index_oid,
heapallindexed := true,
rootdescend := true,
checkunique := true
);
elsif pg_version >= 110000 then
perform bt_index_parent_check(
rec.index_oid,
heapallindexed := true,
rootdescend := true
);
else
perform bt_index_parent_check(rec.index_oid, heapallindexed := true);
end if;
order by index_size asc -- smallest first
loop
begin
if pg_version >= 140000 then
perform bt_index_parent_check(
rec.index_oid::regclass,
heapallindexed := true,
rootdescend := true,
checkunique := true
);
elsif pg_version >= 110000 then
perform bt_index_parent_check(
rec.index_oid::regclass,
heapallindexed := true,
rootdescend := true
);
else
perform bt_index_parent_check(rec.index_oid::regclass, heapallindexed := true);
end if;

Comment on lines +110 to +119
where c.relkind = 'r'
and n.nspname not in ('pg_catalog', 'information_schema')
and c.relpersistence != 't'
order by n.nspname, c.relname
loop
has_errors := false;
begin
for corruption in
select * from verify_heapam(
rec.table_oid,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to c1: this will also iterate over pg_toast tables directly (while check_toast := true already checks TOAST). Excluding pg_toast reduces redundant work/noise, and regclass cast makes the call unambiguous.

Suggested change
where c.relkind = 'r'
and n.nspname not in ('pg_catalog', 'information_schema')
and c.relpersistence != 't'
order by n.nspname, c.relname
loop
has_errors := false;
begin
for corruption in
select * from verify_heapam(
rec.table_oid,
where c.relkind = 'r'
and n.nspname not in ('pg_catalog', 'information_schema')
and n.nspname !~ '^pg_toast'
and c.relpersistence != 't'
order by n.nspname, c.relname
loop
has_errors := false;
begin
for corruption in
select * from verify_heapam(
rec.table_oid::regclass,

Comment on lines +73 to +83
exception
when insufficient_privilege then
raise notice '⚠️ Permission denied for %.% — need superuser or amcheck privileges', rec.schema_name, rec.index_name;
skip_count := skip_count + 1;
when others then
raise warning '❌ CORRUPTION in %.% (table %.%, size %): %',
rec.schema_name, rec.index_name,
rec.schema_name, rec.table_name,
pg_size_pretty(rec.index_size),
sqlerrm;
err_count := err_count + 1;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when others treats any failure as corruption. For c2 (explicitly lock-heavy), it might be nicer to treat lock/cancel/timeout cases as “skipped” so they don’t look like integrity failures.

Suggested change
exception
when insufficient_privilege then
raise notice '⚠️ Permission denied for %.% — need superuser or amcheck privileges', rec.schema_name, rec.index_name;
skip_count := skip_count + 1;
when others then
raise warning '❌ CORRUPTION in %.% (table %.%, size %): %',
rec.schema_name, rec.index_name,
rec.schema_name, rec.table_name,
pg_size_pretty(rec.index_size),
sqlerrm;
err_count := err_count + 1;
exception
when insufficient_privilege then
raise notice '⚠️ Permission denied for %.% — need superuser or amcheck privileges', rec.schema_name, rec.index_name;
skip_count := skip_count + 1;
when lock_not_available or deadlock_detected or query_canceled then
raise notice '⚠️ Skipping %.% due to lock/cancel: %', rec.schema_name, rec.index_name, sqlerrm;
skip_count := skip_count + 1;
when others then
raise warning '❌ CORRUPTION in %.% (table %.%, size %): %',
rec.schema_name, rec.index_name,
rec.schema_name, rec.table_name,
pg_size_pretty(rec.index_size),
sqlerrm;
err_count := err_count + 1;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant