
Commit b142c27

Enable HOT updates for expression and partial indexes
Currently, PostgreSQL conservatively prevents HOT (Heap-Only Tuple) updates whenever any indexed column changes, even if the indexed portion of that column remains identical. This is overly restrictive for expression indexes (where f(column) might not change even when column changes) and partial indexes (where both old and new tuples might fall outside the predicate).

Finally, index AMs play no role in deciding when they need a new index entry on update; the rules for that are based on binary equality and the heap's model for MVCC and its related HOT optimization. Here we open that door a bit to allow more nuanced control over the process. Index AMs that require binary equality (as is the case for nbtree) can keep it without disallowing type-specific equality checking for other indexes.

This patch introduces several improvements to enable HOT updates in these cases:

Add an amcomparedatums() callback to IndexAmRoutine. This allows index access methods like GIN to provide custom logic for comparing datums by extracting and comparing index keys rather than comparing the raw datums. GIN indexes now implement gincomparedatums(), which extracts keys from both datums and compares the resulting key sets. Also, as mentioned earlier, nbtree implements this API and uses datumIsEqual() for equality so that the manner in which it deduplicates TIDs on page split doesn't have to change. This is not a required API. When it is not implemented, the executor compares the TupleTableSlot datums for equality using type-specific operators and takes collation into account, so that an update from "Apple" to "APPLE" on a case-insensitive index can now be HOT.

ExecWhichIndexesRequireUpdates() is rewritten to find the set of modified indexed attributes that trigger new index tuples on update. For partial indexes, it checks whether both old and new tuples satisfy or fail the predicate. For expression indexes, it uses type-specific equality operators to compare computed values. For extraction-based indexes (GIN/RUM) that implement amcomparedatums(), it uses that callback.

Importantly, table access methods can still signal using TU_Update whether all, none, or only summarizing indexes should be updated. While the executor layer now owns determining what has changed due to an update and is interested in updating only the minimum number of indexes possible, the table AM can override that while performing table_tuple_update(), which is what heap does. While this signal is very specific to how the heap implements MVCC and its HOT optimization, we'll leave replacing that for another day.

This optimization trades off some new overhead for the potential for more updates to use the HOT-optimized path and avoid index and heap bloat. This should significantly improve update performance for tables with expression indexes, partial indexes, and GIN/GiST indexes on complex data types like JSONB and tsvector, while maintaining correct index semantics. Minimal additional overhead due to type-specific equality checking should be washed out by the benefits of updating indexes fewer times.

One notable trade-off is that there are more calls to FormIndexDatum() as a result. Caching these might reduce some of that overhead, but not all. This led to a change in how often expressions in the spec update test output notice messages, but it does not impact correctness.
1 parent b9ea61b commit b142c27

27 files changed

Lines changed: 4015 additions & 125 deletions

src/backend/access/brin/brin.c

Lines changed: 1 addition & 0 deletions
@@ -290,6 +290,7 @@ brinhandler(PG_FUNCTION_ARGS)
     amroutine->amproperty = NULL;
     amroutine->ambuildphasename = NULL;
     amroutine->amvalidate = brinvalidate;
+    amroutine->amcomparedatums = NULL;
     amroutine->amadjustmembers = NULL;
     amroutine->ambeginscan = brinbeginscan;
     amroutine->amrescan = brinrescan;

src/backend/access/gin/ginutil.c

Lines changed: 85 additions & 7 deletions
@@ -26,6 +26,7 @@
 #include "storage/indexfsm.h"
 #include "utils/builtins.h"
 #include "utils/index_selfuncs.h"
+#include "utils/memutils.h"
 #include "utils/rel.h"
 #include "utils/typcache.h"
 
@@ -78,6 +79,7 @@ ginhandler(PG_FUNCTION_ARGS)
     amroutine->amproperty = NULL;
     amroutine->ambuildphasename = ginbuildphasename;
     amroutine->amvalidate = ginvalidate;
+    amroutine->amcomparedatums = gincomparedatums;
     amroutine->amadjustmembers = ginadjustmembers;
     amroutine->ambeginscan = ginbeginscan;
     amroutine->amrescan = ginrescan;
@@ -477,13 +479,6 @@ cmpEntries(const void *a, const void *b, void *arg)
     return res;
 }
 
-
-/*
- * Extract the index key values from an indexable item
- *
- * The resulting key values are sorted, and any duplicates are removed.
- * This avoids generating redundant index entries.
- */
 Datum *
 ginExtractEntries(GinState *ginstate, OffsetNumber attnum,
                   Datum value, bool isNull,
@@ -729,3 +724,86 @@ ginbuildphasename(int64 phasenum)
             return NULL;
     }
 }
+
+/*
+ * gincomparedatums - Compare datums to determine if they produce identical keys
+ *
+ * This function extracts keys from both old_datum and new_datum using the
+ * opclass's extractValue function, then compares the extracted key arrays.
+ * Returns true if the key sets are identical (same keys, same counts).
+ *
+ * This enables HOT updates for GIN indexes when the indexed portions of a
+ * value haven't changed, even if the value itself has changed.
+ *
+ * Example: JSONB column with GIN index. If an update changes a non-indexed
+ * key in the JSONB document, the extracted keys are identical and we can
+ * do a HOT update.
+ */
+bool
+gincomparedatums(Relation index, int attnum,
+                 Datum old_datum, bool old_isnull,
+                 Datum new_datum, bool new_isnull)
+{
+    GinState    ginstate;
+    Datum      *old_keys;
+    Datum      *new_keys;
+    GinNullCategory *old_categories;
+    GinNullCategory *new_categories;
+    int32       old_nkeys;
+    int32       new_nkeys;
+    MemoryContext tmpcontext;
+    MemoryContext oldcontext;
+    bool        result = true;
+
+    /* Handle NULL cases */
+    if (old_isnull != new_isnull)
+        return false;
+    if (old_isnull)
+        return true;
+
+    /* Create temporary context for extraction work */
+    tmpcontext = AllocSetContextCreate(CurrentMemoryContext,
+                                       "GIN datum comparison",
+                                       ALLOCSET_DEFAULT_SIZES);
+    oldcontext = MemoryContextSwitchTo(tmpcontext);
+
+    initGinState(&ginstate, index);
+
+    /* Extract keys from both datums using existing GIN infrastructure */
+    old_keys = ginExtractEntries(&ginstate, attnum, old_datum, old_isnull,
+                                 &old_nkeys, &old_categories);
+    new_keys = ginExtractEntries(&ginstate, attnum, new_datum, new_isnull,
+                                 &new_nkeys, &new_categories);
+
+    /* Different number of keys, definitely different */
+    if (old_nkeys != new_nkeys)
+    {
+        result = false;
+        goto cleanup;
+    }
+
+    /*
+     * Compare the sorted key arrays element-by-element. Since both arrays are
+     * already sorted by ginExtractEntries, we can do a simple O(n)
+     * comparison.
+     */
+    for (int i = 0; i < old_nkeys; i++)
+    {
+        int         cmp = ginCompareEntries(&ginstate, attnum,
+                                            old_keys[i], old_categories[i],
+                                            new_keys[i], new_categories[i]);
+
+        if (cmp != 0)
+        {
+            result = false;
+            break;
+        }
+    }
+
+cleanup:
+    /* Clean up */
+    MemoryContextSwitchTo(oldcontext);
+    MemoryContextDelete(tmpcontext);
+
+    return result;
+}
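
The comparison strategy in gincomparedatums() (extract, sort, deduplicate, then compare element by element) can be tried outside the backend with plain integers. This is a minimal sketch under stated assumptions: `extract_keys()` and `same_key_set()` are hypothetical helpers that stand in for ginExtractEntries() and the comparison loop, and they use malloc/free rather than a memory context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Three-way integer comparison for qsort(). */
static int
cmp_int(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;

    return (x > y) - (x < y);
}

/*
 * Hypothetical stand-in for key extraction: copy the values, sort them and
 * drop duplicates. Returns the number of keys; the caller frees *out.
 */
static size_t
extract_keys(const int *values, size_t n, int **out)
{
    int    *keys = malloc((n ? n : 1) * sizeof(int));
    size_t  nkeys = 0;

    assert(keys != NULL);
    if (n > 0)
        memcpy(keys, values, n * sizeof(int));
    qsort(keys, n, sizeof(int), cmp_int);

    for (size_t i = 0; i < n; i++)
        if (nkeys == 0 || keys[nkeys - 1] != keys[i])
            keys[nkeys++] = keys[i];

    *out = keys;
    return nkeys;
}

/* True when both inputs produce the same sorted, de-duplicated key array. */
static bool
same_key_set(const int *old_vals, size_t old_n,
             const int *new_vals, size_t new_n)
{
    int    *old_keys;
    int    *new_keys;
    size_t  old_nkeys = extract_keys(old_vals, old_n, &old_keys);
    size_t  new_nkeys = extract_keys(new_vals, new_n, &new_keys);
    bool    result = (old_nkeys == new_nkeys);

    /* Both arrays are sorted, so a single linear pass is enough. */
    for (size_t i = 0; result && i < old_nkeys; i++)
        if (old_keys[i] != new_keys[i])
            result = false;

    free(old_keys);
    free(new_keys);
    return result;
}
```

With these helpers, the inputs {3, 1, 3, 2} and {1, 2, 3} yield the same key set even though the raw values differ, which is the situation in which the GIN callback reports "no index change".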

src/backend/access/hash/hash.c

Lines changed: 44 additions & 0 deletions
@@ -50,6 +50,10 @@ static void hashbuildCallback(Relation index,
                               void *state);
 
 
+static bool hashcomparedatums(Relation index, int attnum,
+                              Datum old_datum, bool old_isnull,
+                              Datum new_datum, bool new_isnull);
+
 /*
  * Hash handler function: return IndexAmRoutine with access method parameters
  * and callbacks.
@@ -98,6 +102,7 @@ hashhandler(PG_FUNCTION_ARGS)
     amroutine->amproperty = NULL;
     amroutine->ambuildphasename = NULL;
     amroutine->amvalidate = hashvalidate;
+    amroutine->amcomparedatums = hashcomparedatums;
     amroutine->amadjustmembers = hashadjustmembers;
     amroutine->ambeginscan = hashbeginscan;
     amroutine->amrescan = hashrescan;
@@ -944,3 +949,42 @@ hashtranslatecmptype(CompareType cmptype, Oid opfamily)
         return HTEqualStrategyNumber;
     return InvalidStrategy;
 }
+
+/*
+ * hashcomparedatums - Compare datums to determine if they produce identical keys
+ *
+ * Returns true if the hash values are identical (index doesn't need update).
+ */
+bool
+hashcomparedatums(Relation index, int attnum,
+                  Datum old_datum, bool old_isnull,
+                  Datum new_datum, bool new_isnull)
+{
+    uint32      old_hashkey;
+    uint32      new_hashkey;
+
+    /* If both are NULL, they're equal */
+    if (old_isnull && new_isnull)
+        return true;
+
+    /* If NULL status differs, they're not equal */
+    if (old_isnull != new_isnull)
+        return false;
+
+    /*
+     * _hash_datum2hashkey() is used because we know this can't be a cross
+     * type comparison.
+     */
+    old_hashkey = _hash_datum2hashkey(index, old_datum);
+    new_hashkey = _hash_datum2hashkey(index, new_datum);
+
+    /*
+     * If hash keys are identical, the index entry would be the same. Return
+     * true to indicate no index update needed.
+     *
+     * Note: Hash collisions are rare but possible. If hash(x) == hash(y) but
+     * x != y, the hash index still treats them identically, so we correctly
+     * return true.
+     */
+    return (old_hashkey == new_hashkey);
+}

src/backend/access/heap/heapam.c

Lines changed: 6 additions & 4 deletions
@@ -3268,7 +3268,7 @@ heap_update(Relation relation, HeapTupleData *oldtup, HeapTuple newtup,
             TM_FailureData *tmfd, LockTupleMode *lockmode,
             Buffer buffer, Page page, BlockNumber block, ItemId lp,
             Bitmapset *hot_attrs, Bitmapset *sum_attrs, Bitmapset *pk_attrs,
-            Bitmapset *rid_attrs, Bitmapset *mix_attrs, Buffer *vmbuffer,
+            Bitmapset *rid_attrs, const Bitmapset *mix_attrs, Buffer *vmbuffer,
             bool rep_id_key_required, TU_UpdateIndexes *update_indexes)
 {
     TM_Result   result;
@@ -4337,8 +4337,9 @@ HeapDetermineColumnsInfo(Relation relation,
  * This routine may be used to update a tuple when concurrent updates of the
  * target tuple are not expected (for example, because we have a lock on the
  * relation associated with the tuple). Any failure is reported via ereport().
+ * Returns the set of modified indexed attributes.
  */
-void
+Bitmapset *
 simple_heap_update(Relation relation, const ItemPointerData *otid, HeapTuple tuple,
                    TU_UpdateIndexes *update_indexes)
 {
@@ -4467,7 +4468,7 @@ simple_heap_update(Relation relation, const ItemPointerData *otid, HeapTuple tup
 
         elog(ERROR, "tuple concurrently deleted");
 
-        return;
+        return NULL;
     }
 
     /*
@@ -4500,7 +4501,6 @@ simple_heap_update(Relation relation, const ItemPointerData *otid, HeapTuple tup
     bms_free(sum_attrs);
     bms_free(pk_attrs);
     bms_free(rid_attrs);
-    bms_free(mix_attrs);
     bms_free(idx_attrs);
 
     switch (result)
@@ -4526,6 +4526,8 @@ simple_heap_update(Relation relation, const ItemPointerData *otid, HeapTuple tup
             elog(ERROR, "unrecognized heap_update status: %u", result);
             break;
     }
+
+    return mix_attrs;
 }
 
 

src/backend/access/heap/heapam_handler.c

Lines changed: 1 addition & 5 deletions
@@ -319,7 +319,7 @@ heapam_tuple_update(Relation relation, ItemPointer otid, TupleTableSlot *slot,
                     Snapshot crosscheck, bool wait,
                     TM_FailureData *tmfd,
                     LockTupleMode *lockmode,
-                    Bitmapset *mix_attrs,
+                    const Bitmapset *mix_attrs,
                     TU_UpdateIndexes *update_indexes)
 {
     bool        rep_id_key_required = false;
@@ -407,10 +407,6 @@ heapam_tuple_update(Relation relation, ItemPointer otid, TupleTableSlot *slot,
 
     Assert(ItemIdIsNormal(lp));
 
-    /*
-     * Partially construct the oldtup for HeapDetermineColumnsInfo to work and
-     * then pass that on to heap_update.
-     */
     oldtup.t_tableOid = RelationGetRelid(relation);
     oldtup.t_data = (HeapTupleHeader) PageGetItem(page, lp);
     oldtup.t_len = ItemIdGetLength(lp);

src/backend/access/nbtree/nbtree.c

Lines changed: 1 addition & 0 deletions
@@ -155,6 +155,7 @@ bthandler(PG_FUNCTION_ARGS)
     amroutine->amproperty = btproperty;
     amroutine->ambuildphasename = btbuildphasename;
     amroutine->amvalidate = btvalidate;
+    amroutine->amcomparedatums = NULL;
     amroutine->amadjustmembers = btadjustmembers;
     amroutine->ambeginscan = btbeginscan;
     amroutine->amrescan = btrescan;

src/backend/access/table/tableam.c

Lines changed: 2 additions & 2 deletions
@@ -336,7 +336,7 @@ void
 simple_table_tuple_update(Relation rel, ItemPointer otid,
                           TupleTableSlot *slot,
                           Snapshot snapshot,
-                          Bitmapset *modified_indexed_cols,
+                          const Bitmapset *mix_attrs,
                           TU_UpdateIndexes *update_indexes)
 {
     TM_Result   result;
@@ -348,7 +348,7 @@ simple_table_tuple_update(Relation rel, ItemPointer otid,
                                 snapshot, InvalidSnapshot,
                                 true /* wait for commit */ ,
                                 &tmfd, &lockmode,
-                                modified_indexed_cols,
+                                mix_attrs,
                                 update_indexes);
 
     switch (result)

src/backend/bootstrap/bootstrap.c

Lines changed: 8 additions & 0 deletions
@@ -961,10 +961,18 @@ index_register(Oid heap,
     newind->il_info->ii_Expressions =
         copyObject(indexInfo->ii_Expressions);
     newind->il_info->ii_ExpressionsState = NIL;
+    /* expression attrs will likely be null, but may as well copy it */
+    newind->il_info->ii_ExpressionsAttrs =
+        copyObject(indexInfo->ii_ExpressionsAttrs);
     /* predicate will likely be null, but may as well copy it */
     newind->il_info->ii_Predicate =
         copyObject(indexInfo->ii_Predicate);
     newind->il_info->ii_PredicateState = NULL;
+    /* predicate attrs will likely be null, but may as well copy it */
+    newind->il_info->ii_PredicateAttrs =
+        copyObject(indexInfo->ii_PredicateAttrs);
+    newind->il_info->ii_CheckedPredicate = false;
+    newind->il_info->ii_PredicateSatisfied = false;
     /* no exclusion constraints at bootstrap time, so no need to copy */
     Assert(indexInfo->ii_ExclusionOps == NULL);
     Assert(indexInfo->ii_ExclusionProcs == NULL);

src/backend/catalog/index.c

Lines changed: 54 additions & 0 deletions
@@ -27,6 +27,7 @@
 #include "access/heapam.h"
 #include "access/multixact.h"
 #include "access/relscan.h"
+#include "access/sysattr.h"
 #include "access/tableam.h"
 #include "access/toast_compression.h"
 #include "access/transam.h"
@@ -58,6 +59,7 @@
 #include "commands/trigger.h"
 #include "executor/executor.h"
 #include "miscadmin.h"
+#include "nodes/execnodes.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/optimizer.h"
@@ -2414,6 +2416,58 @@ index_drop(Oid indexId, bool concurrent, bool concurrent_lock_mode)
  * ----------------------------------------------------------------
  */
 
+/* ----------------
+ *      BuildUpdateIndexInfo
+ *
+ * For expression indexes updates may not change the indexed value allowing
+ * for a HOT update. Add information to the IndexInfo to allow for checking
+ * if the indexed value has changed.
+ *
+ * Do this processing here rather than in BuildIndexInfo() to not incur the
+ * overhead in the common non-expression cases.
+ * ----------------
+ */
+void
+BuildUpdateIndexInfo(ResultRelInfo *resultRelInfo)
+{
+    for (int j = 0; j < resultRelInfo->ri_NumIndices; j++)
+    {
+        int         i;
+        int         indnatts;
+        Bitmapset  *attrs = NULL;
+        IndexInfo  *ii = resultRelInfo->ri_IndexRelationInfo[j];
+
+        indnatts = ii->ii_NumIndexAttrs;
+
+        /* Collect key attributes used by the index, key and including */
+        for (i = 0; i < indnatts; i++)
+        {
+            AttrNumber  attnum = ii->ii_IndexAttrNumbers[i];
+
+            if (attnum != 0)
+                attrs = bms_add_member(attrs, attnum - FirstLowInvalidHeapAttributeNumber);
+        }
+
+        /* Collect attributes used in the expression */
+        if (ii->ii_Expressions)
+            pull_varattnos((Node *) ii->ii_Expressions,
+                           resultRelInfo->ri_RangeTableIndex,
+                           &ii->ii_ExpressionsAttrs);
+
+        /* Collect attributes used in the predicate */
+        if (ii->ii_Predicate)
+            pull_varattnos((Node *) ii->ii_Predicate,
+                           resultRelInfo->ri_RangeTableIndex,
+                           &ii->ii_PredicateAttrs);
+
+        /* Combine key, including, and expression attributes, but not predicate */
+        ii->ii_IndexedAttrs = bms_union(attrs, ii->ii_ExpressionsAttrs);
+
+        /* All indexes should index *something*! */
+        Assert(!bms_is_empty(ii->ii_IndexedAttrs));
+    }
+}
+
 /* ----------------
  *      BuildIndexInfo
  *          Construct an IndexInfo record for an open index
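
The attribute bookkeeping in BuildUpdateIndexInfo() can be mimicked with a 64-bit mask instead of a Bitmapset. This is a rough sketch, not backend code: `ATTR_OFFSET` is a stand-in for FirstLowInvalidHeapAttributeNumber with an assumed value of -7, and `attr_bit()` and `indexed_attrs()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for FirstLowInvalidHeapAttributeNumber; the value is an assumption. */
#define ATTR_OFFSET (-7)

/* Map an attribute number to its bit, shifted so system attributes fit too. */
static uint64_t
attr_bit(int attnum)
{
    int bit = attnum - ATTR_OFFSET;

    assert(bit >= 0 && bit < 64);
    return UINT64_C(1) << bit;
}

/*
 * Union of plain key columns and the columns referenced by index expressions.
 * An attnum of 0 marks an expression column and is skipped here, since its
 * inputs arrive through expression_attrs. Predicate columns are left out.
 */
static uint64_t
indexed_attrs(const int *index_attnums, int natts, uint64_t expression_attrs)
{
    uint64_t attrs = 0;

    for (int i = 0; i < natts; i++)
        if (index_attnums[i] != 0)
            attrs |= attr_bit(index_attnums[i]);

    return attrs | expression_attrs;
}
```

For an index on column 2 plus an expression over column 5, the call `indexed_attrs((int[]){2, 0}, 2, attr_bit(5))` sets exactly the bits for columns 2 and 5. Keeping predicate columns in a separate set matches the comment in the patch that the combined set excludes the predicate.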
