Distinguish selectivity of < from <= and > from >=.

Historically, the selectivity functions have simply not distinguished < from <=, or > from >=, arguing that the fraction of the population that satisfies the "=" aspect can be considered to be vanishingly small, if the comparison value isn't any of the most-common-values for the variable. (If it is, the code path that executes the operator against each MCV will take care of things properly.) But that isn't really true unless we're dealing with a continuum of variable values, and in practice we seldom are. If "x = const" would estimate a nonzero number of rows for a given const value, then it follows that we ought to estimate different numbers of rows for "x < const" and "x <= const", even if the const is not one of the MCVs. Handling this more honestly makes a significant difference in edge cases, such as the estimate for a tight range (x BETWEEN y AND z where y and z are close together).

Hence, split scalarltsel into scalarltsel/scalarlesel, and similarly split scalargtsel into scalargtsel/scalargesel. Adjust <= and >= operator definitions to reference the new selectivity functions. Improve the core ineq_histogram_selectivity() function to make a correction for equality. (Along the way, I learned quite a bit about exactly why that function gives good answers, which I tried to memorialize in improved comments.)

The corresponding join selectivity functions were, and remain, just stubs. But I chose to split them similarly, to avoid confusion and to prevent the need for doing this exercise again if someone ever makes them less stubby.

In passing, change ineq_histogram_selectivity's clamp for extreme probability estimates so that it varies depending on the histogram size, instead of being hardwired at 0.0001. With the default histogram size of 100 entries, you still get the old clamp value, but bigger histograms should allow us to put more faith in edge values.
Tom Lane, reviewed by Aleksander Alekseev and Kuntal Ghosh

Discussion: https://postgr.es/m/12232.1499140410@sss.pgh.pa.us
author: Tom Lane <tgl@sss.pgh.pa.us> 2017-09-13 11:12:39 -0400
committer: Tom Lane <tgl@sss.pgh.pa.us> 2017-09-13 11:12:39 -0400
commit: 7d08ce286cd5854d58152e428c28636a616bdc42
tree: 2e4f6f2ce25df95b86a1becf7a09935334ce5d90 /src/backend/optimizer/path/clausesel.c
parent: 089880ba9af5f95e1a3b050874a90dbe5c33fd61
1 file changed, 8 insertions, 6 deletions
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 9d340255c36..b4cbc34ef1d 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -71,7 +71,7 @@ static RelOptInfo *find_single_rel_for_clauses(PlannerInfo *root,
  *
  * We also recognize "range queries", such as "x > 34 AND x < 42".  Clauses
  * are recognized as possible range query components if they are restriction
- * opclauses whose operators have scalarltsel() or scalargtsel() as their
+ * opclauses whose operators have scalarltsel or a related function as their
  * restriction selectivity estimator.  We pair up clauses of this form that
  * refer to the same variable.  An unpairable clause of this kind is simply
  * multiplied into the selectivity product in the normal way.  But when we
@@ -92,8 +92,8 @@ static RelOptInfo *find_single_rel_for_clauses(PlannerInfo *root,
  * A free side-effect is that we can recognize redundant inequalities such
  * as "x < 4 AND x < 5"; only the tighter constraint will be counted.
  *
- * Of course this is all very dependent on the behavior of
- * scalarltsel/scalargtsel; perhaps some day we can generalize the approach.
+ * Of course this is all very dependent on the behavior of the inequality
+ * selectivity functions; perhaps some day we can generalize the approach.
  */
 Selectivity
 clauselist_selectivity(PlannerInfo *root,
@@ -218,17 +218,19 @@ clauselist_selectivity(PlannerInfo *root,
 			if (ok)
 			{
 				/*
-				 * If it's not a "<" or ">" operator, just merge the
+				 * If it's not a "<"/"<="/">"/">=" operator, just merge the
 				 * selectivity in generically.  But if it's the right oprrest,
 				 * add the clause to rqlist for later processing.
 				 */
 				switch (get_oprrest(expr->opno))
 				{
 					case F_SCALARLTSEL:
+					case F_SCALARLESEL:
 						addRangeClause(&rqlist, clause,
 									   varonleft, true, s2);
 						break;
 					case F_SCALARGTSEL:
+					case F_SCALARGESEL:
 						addRangeClause(&rqlist, clause,
 									   varonleft, false, s2);
 						break;
@@ -368,7 +370,7 @@ addRangeClause(RangeQueryClause **rqlist, Node *clause,
 
 				/*------
 				 * We have found two similar clauses, such as
-				 * x < y AND x < z.
+				 * x < y AND x <= z.
 				 * Keep only the more restrictive one.
 				 *------
 				 */
@@ -388,7 +390,7 @@ addRangeClause(RangeQueryClause **rqlist, Node *clause,
 
 				/*------
 				 * We have found two similar clauses, such as
-				 * x > y AND x > z.
+				 * x > y AND x >= z.
 				 * Keep only the more restrictive one.
 				 *------
 				 */