Fix array size allocation for HashAggregate hash keys.

When there were duplicate columns in the hash key list, the array sizes could be miscomputed, resulting in access off the end of the array. Adjust the computation to ensure the array is always large enough. (I considered whether the duplicates could be removed in planning, but I can't rule out the possibility that duplicate columns might have different hash functions assigned. Simpler to just make sure it works at execution time regardless.) Bug apparently introduced in fc4b3dea2 as part of narrowing down the tuples stored in the hashtable. Reported by Colm McHugh of Salesforce, though I didn't use their patch. Backpatch back to version 10 where the bug was introduced. Discussion: https://postgr.es/m/CAFeeJoKKu0u+A_A9R9316djW-YW3-+Gtgvy3ju655qRHR3jtdA@mail.gmail.com
author: Andrew Gierth <rhodiumtoad@postgresql.org> 2019-05-23 15:26:01 +0100
committer: Andrew Gierth <rhodiumtoad@postgresql.org> 2019-05-23 15:26:01 +0100
commit: 44e95b5728a4569c494fa4ea4317f8a2f50a206b (patch)
tree: 3b5097bdb9274baae68c998b111a3314e6415f3e /src/backend/executor/nodeAgg.c
parent: 156c0c2dff403fd115f3a5f6d73ab80959c84129 (diff)
download: postgresql-44e95b5728a4569c494fa4ea4317f8a2f50a206b.tar.gz
postgresql-44e95b5728a4569c494fa4ea4317f8a2f50a206b.zip
1 files changed, 22 insertions, 7 deletions
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 5b4a6029529..6b8ef40599b 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -1312,9 +1312,14 @@ build_hash_table(AggState *aggstate)
  * by themselves, and secondly ctids for row-marks.
  *
  * To eliminate duplicates, we build a bitmapset of the needed columns, and
- * then build an array of the columns included in the hashtable.  Note that
- * the array is preserved over ExecReScanAgg, so we allocate it in the
- * per-query context (unlike the hash table itself).
+ * then build an array of the columns included in the hashtable. We might
+ * still have duplicates if the passed-in grpColIdx has them, which can happen
+ * in edge cases from semijoins/distinct; these can't always be removed,
+ * because it's not certain that the duplicate cols will be using the same
+ * hash function.
+ *
+ * Note that the array is preserved over ExecReScanAgg, so we allocate it in
+ * the per-query context (unlike the hash table itself).
  */
 static void
 find_hash_columns(AggState *aggstate)
@@ -1335,6 +1340,7 @@ find_hash_columns(AggState *aggstate)
 		AttrNumber *grpColIdx = perhash->aggnode->grpColIdx;
 		List	   *hashTlist = NIL;
 		TupleDesc	hashDesc;
+		int			maxCols;
 		int			i;
 
 		perhash->largestGrpColIdx = 0;
@@ -1359,15 +1365,24 @@ find_hash_columns(AggState *aggstate)
 					colnos = bms_del_member(colnos, attnum);
 			}
 		}
-		/* Add in all the grouping columns */
-		for (i = 0; i < perhash->numCols; i++)
-			colnos = bms_add_member(colnos, grpColIdx[i]);
+
+		/*
+		 * Compute maximum number of input columns accounting for possible
+		 * duplications in the grpColIdx array, which can happen in some edge
+		 * cases where HashAggregate was generated as part of a semijoin or a
+		 * DISTINCT.
+		 */
+		maxCols = bms_num_members(colnos) + perhash->numCols;
 
 		perhash->hashGrpColIdxInput =
-			palloc(bms_num_members(colnos) * sizeof(AttrNumber));
+			palloc(maxCols * sizeof(AttrNumber));
 		perhash->hashGrpColIdxHash =
 			palloc(perhash->numCols * sizeof(AttrNumber));
 
+		/* Add all the grouping columns to colnos */
+		for (i = 0; i < perhash->numCols; i++)
+			colnos = bms_add_member(colnos, grpColIdx[i]);
+
 		/*
 		 * First build mapping for columns directly hashed. These are the
 		 * first, because they'll be accessed when computing hash values and
author	Andrew Gierth <rhodiumtoad@postgresql.org>	2019-05-23 15:26:01 +0100
committer	Andrew Gierth <rhodiumtoad@postgresql.org>	2019-05-23 15:26:01 +0100
commit	44e95b5728a4569c494fa4ea4317f8a2f50a206b (patch)
tree	3b5097bdb9274baae68c998b111a3314e6415f3e /src/backend/executor/nodeAgg.c
parent	156c0c2dff403fd115f3a5f6d73ab80959c84129 (diff)
download	postgresql-44e95b5728a4569c494fa4ea4317f8a2f50a206b.tar.gz postgresql-44e95b5728a4569c494fa4ea4317f8a2f50a206b.zip