Make the planner assume that the entries in a VALUES list are distinct.

Previously, if we had to estimate the number of distinct values in a VALUES column, we fell back on the default behavior used whenever we lack statistics, which effectively is that there are Min(# of entries, 200) distinct values. This can be very badly off with a large VALUES list, as noted by Jeff Janes.

We could consider actually running an ANALYZE-like scan on the VALUES, but that seems unduly expensive, and anyway it could not deliver reliable info if the entries are not all constants. What seems like a better choice is to assume that the values are all distinct. This will sometimes be just as wrong as the old code, but it seems more likely to be more nearly right in many common cases. Also, it is more consistent with what happens in some related cases, for example WHERE x = ANY(ARRAY[1,2,3,...,n]) and WHERE x = ANY(VALUES (1),(2),(3),...,(n)) now are estimated similarly.

This was discussed some time ago, but consensus was it'd be better to slip it in at the start of a development cycle, not near the end. (It should've gone into v10, really, but I forgot about it.)

Discussion: https://postgr.es/m/CAMkU=1xHkyPa8VQgGcCNg3RMFFvVxUdOpus1gKcFuvVi0w6Acg@mail.gmail.com
author: Tom Lane <tgl@sss.pgh.pa.us> 2017-08-16 15:37:14 -0400
committer: Tom Lane <tgl@sss.pgh.pa.us> 2017-08-16 15:37:20 -0400
commit: 2b74303637edc09cf692fbfab3fd93a5e47ccabf
tree: 8d176b6bbc4c3ea149ab15d21a4f3c832ff1ace1 /src/backend/utils/adt/selfuncs.c
parent: ac883ac453e9c479f397780918f235c440b7a02f
1 files changed, 11 insertions, 0 deletions
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index a7a06146a06..23e5526a8e1 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -5009,6 +5009,17 @@ get_variable_numdistinct(VariableStatData *vardata, bool *isdefault)
 		 */
 		stadistinct = 2.0;
 	}
+	else if (vardata->rel && vardata->rel->rtekind == RTE_VALUES)
+	{
+		/*
+		 * If the Var represents a column of a VALUES RTE, assume it's unique.
+		 * This could of course be very wrong, but it should tend to be true
+		 * in well-written queries.  We could consider examining the VALUES'
+		 * contents to get some real statistics; but that only works if the
+		 * entries are all constants, and it would be pretty expensive anyway.
+		 */
+		stadistinct = -1.0;		/* unique (and all non null) */
+	}
 	else
 	{
 		/*