Improve speed of hash index build.

In the initial data sort, if the bucket numbers are the same then next sort on the hash value. Because index pages are kept in hash value order, this gains a little speed by allowing the eventual tuple insertions to be done sequentially, avoiding repeated data movement within PageAddItem. This seems to be good for overall speedup of 5%-9%, depending on the incoming data. Simon Riggs, reviewed by Amit Kapila Discussion: https://postgr.es/m/CANbhV-FG-1ZNMBuwhUF7AxxJz3u5137dYL-o6hchK1V_dMw86g@mail.gmail.com
author: Tom Lane <tgl@sss.pgh.pa.us> 2022-07-28 14:34:32 -0400
committer: Tom Lane <tgl@sss.pgh.pa.us> 2022-07-28 14:34:32 -0400
commit: e09d7a1262c659578065eaf7edafe606d2c8ebf2 (patch)
tree: a20bedfb3536af659716e32d756a9103954ab732 /src/backend/utils
parent: 70a437aa45b6dcacc2ad894f95ef5bb46b26035f (diff)
download: postgresql-e09d7a1262c659578065eaf7edafe606d2c8ebf2.tar.gz
postgresql-e09d7a1262c659578065eaf7edafe606d2c8ebf2.zip
1 files changed, 17 insertions, 2 deletions
diff --git a/src/backend/utils/sort/tuplesortvariants.c b/src/backend/utils/sort/tuplesortvariants.c
index 2933020dcc8..7ad4429ad3d 100644
--- a/src/backend/utils/sort/tuplesortvariants.c
+++ b/src/backend/utils/sort/tuplesortvariants.c
@@ -1387,14 +1387,17 @@ comparetup_index_hash(const SortTuple *a, const SortTuple *b,
 {
 	Bucket		bucket1;
 	Bucket		bucket2;
+	uint32		hash1;
+	uint32		hash2;
 	IndexTuple	tuple1;
 	IndexTuple	tuple2;
 	TuplesortPublic *base = TuplesortstateGetPublic(state);
 	TuplesortIndexHashArg *arg = (TuplesortIndexHashArg *) base->arg;
 
 	/*
-	 * Fetch hash keys and mask off bits we don't want to sort by. We know
-	 * that the first column of the index tuple is the hash key.
+	 * Fetch hash keys and mask off bits we don't want to sort by, so that the
+	 * initial sort is just on the bucket number.  We know that the first
+	 * column of the index tuple is the hash key.
 	 */
 	Assert(!a->isnull1);
 	bucket1 = _hash_hashkey2bucket(DatumGetUInt32(a->datum1),
@@ -1410,6 +1413,18 @@ comparetup_index_hash(const SortTuple *a, const SortTuple *b,
 		return -1;
 
 	/*
+	 * If bucket values are equal, sort by hash values.  This allows us to
+	 * insert directly onto bucket/overflow pages, where the index tuples are
+	 * stored in hash order to allow fast binary search within each page.
+	 */
+	hash1 = DatumGetUInt32(a->datum1);
+	hash2 = DatumGetUInt32(b->datum1);
+	if (hash1 > hash2)
+		return 1;
+	else if (hash1 < hash2)
+		return -1;
+
+	/*
 	 * If hash values are equal, we sort on ItemPointer.  This does not affect
 	 * validity of the finished index, but it may be useful to have index
 	 * scans in physical order.
author	Tom Lane <tgl@sss.pgh.pa.us>	2022-07-28 14:34:32 -0400
committer	Tom Lane <tgl@sss.pgh.pa.us>	2022-07-28 14:34:32 -0400
commit	e09d7a1262c659578065eaf7edafe606d2c8ebf2 (patch)
tree	a20bedfb3536af659716e32d756a9103954ab732 /src/backend/utils
parent	70a437aa45b6dcacc2ad894f95ef5bb46b26035f (diff)
download	postgresql-e09d7a1262c659578065eaf7edafe606d2c8ebf2.tar.gz postgresql-e09d7a1262c659578065eaf7edafe606d2c8ebf2.zip