Clamp semijoin selectivity to be not more than inner-join selectivity.

We should never estimate the output of a semijoin to be more rows than we estimate for an inner join with the same input rels and join condition; it's obviously impossible for that to happen. However, given the relatively poor quality of our semijoin selectivity estimates --- particularly, but not only, in cases where we punt and return a default estimate --- we did often deliver such estimates. To improve matters, calculate both estimates inside eqjoinsel() and take the smaller one.

The bulk of this patch is just mechanical refactoring to avoid repetitive information lookup when we call both eqjoinsel_semi and eqjoinsel_inner. The actual new behavior is just

    selec = Min(selec, inner_rel->rows * selec_inner);

which looks a bit odd but is correct because of our different definitions for inner and semi join selectivity.

There is one ensuing plan change in the regression tests, but it looks reasonable enough (and checking the actual row counts shows that the estimate moved closer to reality, not further away).

Per bug #15160 from Alexey Ermakov. Although this is arguably a bug fix, I won't risk destabilizing plan choices in stable branches by back-patching.

Tom Lane, reviewed by Melanie Plageman

Discussion: https://postgr.es/m/152395805004.19366.3107109716821067806@wrigleys.postgresql.org
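
Editorial sketch of why the clamp multiplies by inner_rel->rows: it assumes inner-join selectivity is a fraction of the cross product of the two inputs, while semijoin selectivity is a fraction of the outer rows only. Under that assumption, "semijoin rows <= inner-join rows" reduces to "selec <= inner_rows * selec_inner". The function names below are hypothetical and are not part of the patch; Min() is defined locally as a stand-in for PostgreSQL's macro so the fragment compiles on its own.

```c
#include <assert.h>

/* Local stand-in for PostgreSQL's Min() macro (assumption: same semantics). */
#define Min(x, y) ((x) < (y) ? (x) : (y))

/* Inner-join estimate: selectivity is a fraction of the cross product. */
double
inner_join_rows(double outer_rows, double inner_rows, double selec_inner)
{
    return outer_rows * inner_rows * selec_inner;
}

/* Semijoin estimate: selectivity is a fraction of the outer rows only. */
double
semi_join_rows(double outer_rows, double selec_semi)
{
    return outer_rows * selec_semi;
}

/*
 * Hypothetical helper illustrating the clamp described in the message.
 * Dividing both row estimates by outer_rows gives the bound
 * selec_semi <= inner_rows * selec_inner, so take the smaller value.
 */
double
clamp_semi_selectivity(double selec_semi, double inner_rows,
                       double selec_inner)
{
    assert(inner_rows >= 0.0);
    return Min(selec_semi, inner_rows * selec_inner);
}
```

With 8 outer rows, 4 inner rows, and selec_inner = 0.125, the inner join is estimated at 4 rows. An unclamped semijoin selectivity of 0.75 would claim 6 rows; the clamp lowers it to 0.5, which gives 4 rows.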
author: Tom Lane <tgl@sss.pgh.pa.us> 2018-11-23 12:48:49 -0500
committer: Tom Lane <tgl@sss.pgh.pa.us> 2018-11-23 12:48:49 -0500
commit: a314c34079cf06d05265623dd7c056f8fa9d577f (patch)
tree: 256c1da7e313d378f054934cf0a9b9a49387d04f /src/test
parent: 3be5fe2b107fae24e03c9d29d7bd7c7ad5345787 (diff)
download: postgresql-a314c34079cf06d05265623dd7c056f8fa9d577f.tar.gz
postgresql-a314c34079cf06d05265623dd7c056f8fa9d577f.zip
1 files changed, 11 insertions, 12 deletions
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index 3ba3aaf2d86..c55de5d4765 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -801,8 +801,8 @@ SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1, prt1_e t2 WHER
 
 EXPLAIN (COSTS OFF)
 SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1 WHERE t1.b IN (SELECT (t1.a + t1.b)/2 FROM prt1_e t1 WHERE t1.c = 0)) AND t1.b = 0 ORDER BY t1.a;
-                                  QUERY PLAN                                   
--------------------------------------------------------------------------------
+                               QUERY PLAN                                
+-------------------------------------------------------------------------
  Sort
    Sort Key: t1.a
    ->  Append
@@ -831,19 +831,18 @@ SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1 WHERE t1.b IN (
                      Index Cond: (a = t1_4.b)
                      Filter: (b = 0)
          ->  Nested Loop
-               ->  Unique
-                     ->  Sort
-                           Sort Key: t1_5.b
-                           ->  Hash Semi Join
-                                 Hash Cond: (t1_5.b = ((t1_8.a + t1_8.b) / 2))
-                                 ->  Seq Scan on prt2_p3 t1_5
-                                 ->  Hash
-                                       ->  Seq Scan on prt1_e_p3 t1_8
-                                             Filter: (c = 0)
+               ->  HashAggregate
+                     Group Key: t1_5.b
+                     ->  Hash Semi Join
+                           Hash Cond: (t1_5.b = ((t1_8.a + t1_8.b) / 2))
+                           ->  Seq Scan on prt2_p3 t1_5
+                           ->  Hash
+                                 ->  Seq Scan on prt1_e_p3 t1_8
+                                       Filter: (c = 0)
                ->  Index Scan using iprt1_p3_a on prt1_p3 t1_2
                      Index Cond: (a = t1_5.b)
                      Filter: (b = 0)
-(40 rows)
+(39 rows)
 
 SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1 WHERE t1.b IN (SELECT (t1.a + t1.b)/2 FROM prt1_e t1 WHERE t1.c = 0)) AND t1.b = 0 ORDER BY t1.a;
   a  | b |  c