Add amcheck corruption checks (c1/c2), rename buffercache to m1 #89
- c1: B-tree index integrity check (bt_index_check, non-blocking, safe for production)
- c2: Full B-tree + heap integrity check (bt_index_parent_check + verify_heapam, takes locks)
- Both handle: missing extension, insufficient privileges, PG version differences
- bt_index_parent_check(checkunique) on PG14+, rootdescend on PG11+
- verify_heapam with TOAST checking on PG14+
- Rename c1_buffercache.sql -> m1_buffercache.sql (memory group)
- Add amcheck extension to CI test matrix
- Tested on PG13 and PG17, superuser and non-superuser
| psql -h localhost -U postgres -d test -c 'CREATE EXTENSION IF NOT EXISTS pg_buffercache;' | ||
| psql -h localhost -U postgres -d test -c 'CREATE EXTENSION IF NOT EXISTS amcheck;' | ||
| # amcheck needs execute privileges for non-superusers | ||
| psql -h localhost -U postgres -d test -c 'GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA public TO PUBLIC;' |
Granting execute to public in CI feels broader than needed and can mask permission issues. Since you already create dba_user, I'd target that role (and move this grant after the user is created).
| psql -h localhost -U postgres -d test -c 'GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA public TO PUBLIC;' | |
| psql -h localhost -U postgres -d test -c 'GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA public TO dba_user;' |
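If the grant should be narrower still, one option is to grant only on the functions owned by the amcheck extension. This is a minimal sketch, assuming the role dba_user already exists and that the script runs as a superuser; it walks pg_depend so it does not need to hard-code function signatures, which differ between amcheck versions.

```sql
-- Sketch: grant execute on amcheck's own functions only.
-- Assumptions: run as superuser, role dba_user exists, amcheck is installed.
do $$
declare
  f regprocedure;
begin
  for f in
    select d.objid::regprocedure
    from pg_depend d
    join pg_extension e on e.oid = d.refobjid
    where e.extname = 'amcheck'
      and d.classid = 'pg_proc'::regclass        -- dependent object is a function
      and d.refclassid = 'pg_extension'::regclass
      and d.deptype = 'e'                        -- extension membership
  loop
    execute format('grant execute on function %s to dba_user', f);
  end loop;
end $$;
```

Compared with a schema-wide grant, this leaves unrelated functions in public untouched, so a missing privilege elsewhere still shows up in CI.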
| order by n.nspname, t.relname, c.relname | ||
| loop | ||
| begin | ||
| perform bt_index_check(rec.index_oid); |
Minor: casting to regclass here avoids any ambiguity around the function signature.
| perform bt_index_check(rec.index_oid); | |
| perform bt_index_check(rec.index_oid::regclass); |
| where c.relkind = 'r' | ||
| and n.nspname not in ('pg_catalog', 'information_schema') | ||
| and c.relpersistence != 't' | ||
| order by n.nspname, c.relname | ||
| loop | ||
| has_errors := false; | ||
| begin | ||
| for corruption in | ||
| select * from verify_heapam(rec.table_oid, check_toast := true) |
Heap loop currently includes pg_toast tables, and verify_heapam(..., check_toast := true) already traverses TOAST. Excluding the pg_toast schema avoids redundant work/noise. Also, explicit regclass cast keeps the call unambiguous.
| where c.relkind = 'r' | |
| and n.nspname not in ('pg_catalog', 'information_schema') | |
| and c.relpersistence != 't' | |
| order by n.nspname, c.relname | |
| loop | |
| has_errors := false; | |
| begin | |
| for corruption in | |
| select * from verify_heapam(rec.table_oid, check_toast := true) | |
| where c.relkind = 'r' | |
| and n.nspname not in ('pg_catalog', 'information_schema') | |
| and n.nspname !~ '^pg_toast' | |
| and c.relpersistence != 't' | |
| order by n.nspname, c.relname | |
| loop | |
| has_errors := false; | |
| begin | |
| for corruption in | |
| select * from verify_heapam(rec.table_oid::regclass, check_toast := true) |
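To see which relations the heap loop would actually visit, the filter can be run on its own. This is a sketch built from the conditions in the suggestion, with column aliases chosen for illustration. One thing to keep in mind when reading the result: in pg_class, TOAST relations carry relkind 't' rather than 'r', so the relkind = 'r' condition should already leave them out and the pg_toast pattern acts as a defensive extra.

```sql
-- Sketch: list the heap tables the loop would check (aliases are illustrative).
select n.nspname        as schema_name,
       c.relname        as table_name,
       c.oid::regclass  as table_ref
from pg_class c
join pg_namespace n on n.oid = c.relnamespace
where c.relkind = 'r'                      -- ordinary tables only
  and n.nspname not in ('pg_catalog', 'information_schema')
  and n.nspname !~ '^pg_toast'             -- defensive: TOAST schemas
  and c.relpersistence != 't'              -- skip temporary tables
order by n.nspname, c.relname;
```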
| order by pg_relation_size(c.oid) asc -- smallest first | ||
| loop | ||
| begin | ||
| if pg_version >= 140000 then | ||
| perform bt_index_parent_check( | ||
| rec.index_oid, | ||
| heapallindexed := true, | ||
| rootdescend := true, | ||
| checkunique := true | ||
| ); | ||
| elsif pg_version >= 110000 then | ||
| perform bt_index_parent_check( | ||
| rec.index_oid, | ||
| heapallindexed := true, | ||
| rootdescend := true | ||
| ); | ||
| else | ||
| perform bt_index_parent_check(rec.index_oid, heapallindexed := true); | ||
| end if; |
Two small things here: order by pg_relation_size(...) recomputes the size you already selected as index_size, and casting to regclass makes the amcheck calls a bit more robust.
| order by pg_relation_size(c.oid) asc -- smallest first | |
| loop | |
| begin | |
| if pg_version >= 140000 then | |
| perform bt_index_parent_check( | |
| rec.index_oid, | |
| heapallindexed := true, | |
| rootdescend := true, | |
| checkunique := true | |
| ); | |
| elsif pg_version >= 110000 then | |
| perform bt_index_parent_check( | |
| rec.index_oid, | |
| heapallindexed := true, | |
| rootdescend := true | |
| ); | |
| else | |
| perform bt_index_parent_check(rec.index_oid, heapallindexed := true); | |
| end if; | |
| order by index_size asc -- smallest first | |
| loop | |
| begin | |
| if pg_version >= 140000 then | |
| perform bt_index_parent_check( | |
| rec.index_oid::regclass, | |
| heapallindexed := true, | |
| rootdescend := true, | |
| checkunique := true | |
| ); | |
| elsif pg_version >= 110000 then | |
| perform bt_index_parent_check( | |
| rec.index_oid::regclass, | |
| heapallindexed := true, | |
| rootdescend := true | |
| ); | |
| else | |
| perform bt_index_parent_check(rec.index_oid::regclass, heapallindexed := true); | |
| end if; |
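For reference, here is a self-contained sketch of the same version gate. The thresholds follow the amcheck documentation as I read it (heapallindexed since PG11, rootdescend since PG12, checkunique since PG17); that is an assumption worth verifying against the versions in the CI matrix, and these are not the thresholds used in the diff above. The index query and the limit are illustrative only.

```sql
-- Sketch: version-gated bt_index_parent_check over a few small B-tree indexes.
-- Assumptions: amcheck is installed and matches the server version;
-- thresholds are taken from the amcheck docs, not from this PR.
do $$
declare
  pg_version int := current_setting('server_version_num')::int;
  rec record;
begin
  for rec in
    select c.oid::regclass as index_ref
    from pg_index i
    join pg_class c on c.oid = i.indexrelid
    join pg_am a on a.oid = c.relam
    join pg_namespace n on n.oid = c.relnamespace
    where a.amname = 'btree'
      and c.relkind = 'i'                  -- skip partitioned indexes
      and i.indisvalid
      and c.relpersistence != 't'
      and n.nspname not in ('pg_catalog', 'information_schema')
    order by pg_relation_size(c.oid)       -- smallest first
    limit 5                                -- illustrative cap
  loop
    if pg_version >= 170000 then
      perform bt_index_parent_check(rec.index_ref,
                                    heapallindexed := true,
                                    rootdescend := true,
                                    checkunique := true);
    elsif pg_version >= 120000 then
      perform bt_index_parent_check(rec.index_ref,
                                    heapallindexed := true,
                                    rootdescend := true);
    elsif pg_version >= 110000 then
      perform bt_index_parent_check(rec.index_ref, heapallindexed := true);
    else
      perform bt_index_parent_check(rec.index_ref);
    end if;
    raise notice 'checked %', rec.index_ref;
  end loop;
end $$;
```

PL/pgSQL plans each statement on first execution, so a branch that names an argument the installed amcheck does not know only fails if that branch is actually reached. That is also why a threshold that is too low surfaces as a runtime error rather than at parse time.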
| where c.relkind = 'r' | ||
| and n.nspname not in ('pg_catalog', 'information_schema') | ||
| and c.relpersistence != 't' | ||
| order by n.nspname, c.relname | ||
| loop | ||
| has_errors := false; | ||
| begin | ||
| for corruption in | ||
| select * from verify_heapam( | ||
| rec.table_oid, |
Similar to c1: this will also iterate over pg_toast tables directly (while check_toast := true already checks TOAST). Excluding pg_toast reduces redundant work/noise, and regclass cast makes the call unambiguous.
| where c.relkind = 'r' | |
| and n.nspname not in ('pg_catalog', 'information_schema') | |
| and c.relpersistence != 't' | |
| order by n.nspname, c.relname | |
| loop | |
| has_errors := false; | |
| begin | |
| for corruption in | |
| select * from verify_heapam( | |
| rec.table_oid, | |
| where c.relkind = 'r' | |
| and n.nspname not in ('pg_catalog', 'information_schema') | |
| and n.nspname !~ '^pg_toast' | |
| and c.relpersistence != 't' | |
| order by n.nspname, c.relname | |
| loop | |
| has_errors := false; | |
| begin | |
| for corruption in | |
| select * from verify_heapam( | |
| rec.table_oid::regclass, |
| exception | ||
| when insufficient_privilege then | ||
| raise notice '⚠️ Permission denied for %.% — need superuser or amcheck privileges', rec.schema_name, rec.index_name; | ||
| skip_count := skip_count + 1; | ||
| when others then | ||
| raise warning '❌ CORRUPTION in %.% (table %.%, size %): %', | ||
| rec.schema_name, rec.index_name, | ||
| rec.schema_name, rec.table_name, | ||
| pg_size_pretty(rec.index_size), | ||
| sqlerrm; | ||
| err_count := err_count + 1; |
when others treats any failure as corruption. For c2 (explicitly lock-heavy), it might be nicer to treat lock/cancel/timeout cases as “skipped” so they don’t look like integrity failures.
| exception | |
| when insufficient_privilege then | |
| raise notice '⚠️ Permission denied for %.% — need superuser or amcheck privileges', rec.schema_name, rec.index_name; | |
| skip_count := skip_count + 1; | |
| when others then | |
| raise warning '❌ CORRUPTION in %.% (table %.%, size %): %', | |
| rec.schema_name, rec.index_name, | |
| rec.schema_name, rec.table_name, | |
| pg_size_pretty(rec.index_size), | |
| sqlerrm; | |
| err_count := err_count + 1; | |
| exception | |
| when insufficient_privilege then | |
| raise notice '⚠️ Permission denied for %.% — need superuser or amcheck privileges', rec.schema_name, rec.index_name; | |
| skip_count := skip_count + 1; | |
| when lock_not_available or deadlock_detected or query_canceled then | |
| raise notice '⚠️ Skipping %.% due to lock/cancel: %', rec.schema_name, rec.index_name, sqlerrm; | |
| skip_count := skip_count + 1; | |
| when others then | |
| raise warning '❌ CORRUPTION in %.% (table %.%, size %): %', | |
| rec.schema_name, rec.index_name, | |
| rec.schema_name, rec.table_name, | |
| pg_size_pretty(rec.index_size), | |
| sqlerrm; | |
| err_count := err_count + 1; |
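The skip-versus-corruption split can be exercised without real lock contention by raising the SQLSTATEs directly. This is a sketch with simulated errors; the list of codes is illustrative (55P03 is lock_not_available, 40P01 is deadlock_detected, XX002 is index_corrupted). query_canceled is left out on purpose: per the PL/pgSQL documentation, others does not match query_canceled, so a cancel already bypasses the corruption branch, and trapping it explicitly would make the loop carry on after a user cancel.

```sql
-- Sketch: classify simulated failures as "skipped" or "corruption".
do $$
declare
  code       text;
  codes      text[] := array['55P03', '40P01', 'XX002'];
  skip_count int := 0;
  err_count  int := 0;
begin
  foreach code in array codes loop
    begin
      raise exception using errcode = code, message = 'simulated failure';
    exception
      when lock_not_available or deadlock_detected then
        skip_count := skip_count + 1;      -- transient: not an integrity failure
      when others then
        err_count := err_count + 1;        -- everything else counts as an error
    end;
  end loop;
  raise notice 'skipped: %, errors: %', skip_count, err_count;
  -- NOTICE:  skipped: 2, errors: 1
end $$;
```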
Changes
New reports
- bt_index_check(): lightweight, only takes AccessShareLock
- verify_heapam() for heap corruption detection
- bt_index_parent_check(heapallindexed := true): checks parent-child consistency, sibling pointers, all heap tuples indexed
- checkunique on PG14+, rootdescend on PG11+
- verify_heapam() with full TOAST checking
Rename
- c1_buffercache.sql → m1_buffercache.sql (new "memory" group, frees c prefix for corruption)
Robustness
CI
Testing