aboutsummaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorPeter Geoghegan <pg@bowt.ie>2020-03-24 14:58:27 -0700
committerPeter Geoghegan <pg@bowt.ie>2020-03-24 14:58:27 -0700
commitb150a76793109b00ea43c26e0006b3cad9030096 (patch)
tree17b5d8557604d9c7e1f8dcffa57739fcb6cc3f3f
parent112b006fe79b6fe84647df84a07c201e5a7b1635 (diff)
downloadpostgresql-b150a76793109b00ea43c26e0006b3cad9030096.tar.gz
postgresql-b150a76793109b00ea43c26e0006b3cad9030096.zip
Fix nbtree deduplication README commentary.
Descriptions of some aspects of how deduplication works were unclear in a couple of places.
-rw-r--r--src/backend/access/nbtree/README45
1 files changed, 24 insertions, 21 deletions
diff --git a/src/backend/access/nbtree/README b/src/backend/access/nbtree/README
index 6499f5adb79..2d0f8f4b79a 100644
--- a/src/backend/access/nbtree/README
+++ b/src/backend/access/nbtree/README
@@ -780,20 +780,21 @@ order. Delaying deduplication minimizes page level fragmentation.
Deduplication in unique indexes
-------------------------------
-Very often, the range of values that can be placed on a given leaf page in
-a unique index is fixed and permanent. For example, a primary key on an
-identity column will usually only have page splits caused by the insertion
-of new logical rows within the rightmost leaf page. If there is a split
-of a non-rightmost leaf page, then the split must have been triggered by
-inserts associated with an UPDATE of an existing logical row. Splitting a
-leaf page purely to store multiple versions should be considered
-pathological, since it permanently degrades the index structure in order
-to absorb a temporary burst of duplicates. Deduplication in unique
-indexes helps to prevent these pathological page splits. Storing
-duplicates in a space efficient manner is not the goal, since in the long
-run there won't be any duplicates anyway. Rather, we're buying time for
-standard garbage collection mechanisms to run before a page split is
-needed.
+Very often, the number of distinct values that can ever be placed on
+almost any given leaf page in a unique index is fixed and permanent. For
+example, a primary key on an identity column will usually only have leaf
+page splits caused by the insertion of new logical rows within the
+rightmost leaf page. If there is a split of a non-rightmost leaf page,
+then the split must have been triggered by inserts associated with UPDATEs
+of existing logical rows. Splitting a leaf page purely to store multiple
+versions is a false economy. In effect, we're permanently degrading the
+index structure just to absorb a temporary burst of duplicates.
+
+Deduplication in unique indexes helps to prevent these pathological page
+splits. Storing duplicates in a space efficient manner is not the goal,
+since in the long run there won't be any duplicates anyway. Rather, we're
+buying time for standard garbage collection mechanisms to run before a
+page split is needed.
Unique index leaf pages only get a deduplication pass when an insertion
(that might have to split the page) observed an existing duplicate on the
@@ -838,13 +839,15 @@ list splits.
Only a few isolated extra steps are required to preserve the illusion that
the new item never overlapped with an existing posting list in the first
-place: the heap TID of the incoming tuple is swapped with the rightmost/max
-heap TID from the existing/originally overlapping posting list. Also, the
-posting-split-with-page-split case must generate a new high key based on
-an imaginary version of the original page that has both the final new item
-and the after-list-split posting tuple (page splits usually just operate
-against an imaginary version that contains the new item/item that won't
-fit).
+place: the heap TID of the incoming tuple has its TID replaced with the
+rightmost/max heap TID from the existing/originally overlapping posting
+list. Similarly, the original incoming item's TID is relocated to the
+appropriate offset in the posting list (we usually shift TIDs out of the
+way to make a hole for it). Finally, the posting-split-with-page-split
+case must generate a new high key based on an imaginary version of the
+original page that has both the final new item and the after-list-split
+posting tuple (page splits usually just operate against an imaginary
+version that contains the new item/item that won't fit).
This approach avoids inventing an "eager" atomic posting split operation
that splits the posting list without simultaneously finishing the insert