src/backend/optimizer/README



Summary
-------

The optimizer generates optimal query plans in several steps:

1) Take each relation in a query, and make a RelOptInfo structure for it. 
Find each way of accessing the relation, called a Path, including
sequential and index scans, and add it to the RelOptInfo.path_order
list.

2) Join each RelOptInfo to each other RelOptInfo as specified in the
WHERE clause.  At this point each RelOptInfo is a single relation, so
every relation is joined to every relation it shares a join clause with
in the WHERE clause.

Joins occur using two RelOptInfos.  One is outer, the other inner. 
Outers drive lookups of values in the inner.  In a nested loop, lookups
of values in the inner occur by scanning to find each matching inner
row.  In a mergejoin, inner rows are ordered, and are accessed in order,
so only one scan of inner is required to perform the entire join.  In a
hashjoin, inner rows are hashed for lookups.

Each unique join combination becomes a new RelOptInfo, which now
represents the join of two relations.  Its RelOptInfo.path_order list
holds the various paths that create the joined result, with different
orderings depending on the join method used.

3) At this point, the RelOptInfos are joined again, with one new
relation added to each RelOptInfo.  This continues until all relations
have been joined into one RelOptInfo, and the cheapest Path is chosen.
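
The three join methods described in step 2 can be illustrated with a
small sketch.  This is not the executor's code: rows are modeled as
Python dicts, and the function names and the "key" parameter are
assumptions made up for the example.  It only shows how the outer input
drives lookups in the inner input under each method.

```python
from collections import defaultdict


def nested_loop_join(outer, inner, key):
    """For each outer row, scan the whole inner input for matching rows."""
    return [(o, i) for o in outer for i in inner if o[key] == i[key]]


def merge_join(outer, inner, key):
    """Both inputs are ordered on the join key and read in that order."""
    outer = sorted(outer, key=lambda r: r[key])
    inner = sorted(inner, key=lambda r: r[key])
    result, start = [], 0
    for o in outer:
        # Skip inner rows that sort before the current outer row.
        while start < len(inner) and inner[start][key] < o[key]:
            start += 1
        # Emit the run of inner rows equal to the current outer key.
        # 'start' is left at the head of the run so that a duplicate
        # outer key can re-read the same run.
        j = start
        while j < len(inner) and inner[j][key] == o[key]:
            result.append((o, inner[j]))
            j += 1
    return result


def hash_join(outer, inner, key):
    """Hash the inner rows once, then probe the table for each outer row."""
    table = defaultdict(list)
    for i in inner:
        table[i[key]].append(i)
    return [(o, i) for o in outer for i in table.get(o[key], [])]
```

All three functions return the same set of row pairs; they differ only
in how often the inner input is read and in the order of the result.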

	SELECT 	*
	FROM 	tab1, tab2, tab3, tab4
	WHERE 	tab1.col = tab2.col AND
		tab2.col = tab3.col AND
		tab3.col = tab4.col

	Tables 1, 2, 3, and 4 are joined as:
	{1 2},{2 3},{3 4}
	{1 2 3},{2 3 4}
	{1 2 3 4}

	SELECT 	*
	FROM 	tab1, tab2, tab3, tab4
	WHERE 	tab1.col = tab2.col AND
		tab1.col = tab3.col AND
		tab1.col = tab4.col

	Tables 1, 2, 3, and 4 are joined as:
	{1 2},{1 3},{1 4}
	{1 2 3},{1 3 4},{1 2 4}
	{1 2 3 4}
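
The level-by-level growth shown in the two examples above can be
reproduced with a short sketch.  It is a simplification, not the
planner's code: relations are plain integers, each join clause is a
pair of relations, and the function name join_levels is hypothetical.
At each level a join relation is extended only when a join clause adds
exactly one new relation to it.

```python
def join_levels(relations, join_clauses):
    """Return the join relations built at each level, two-way joins first."""
    clauses = [frozenset(c) for c in join_clauses]
    current = set(clauses)          # level 2: one joinrel per join clause
    levels = []
    while current:
        levels.append(sorted(sorted(r) for r in current))
        if any(len(r) == len(relations) for r in current):
            break                   # every relation has been joined
        next_level = set()
        for joinrel in current:
            for clause in clauses:
                # Use the clause only if it adds exactly one new relation.
                if len(clause - joinrel) == 1:
                    next_level.add(joinrel | clause)
        current = next_level
    # If the clauses do not connect every relation, the loop ends early;
    # this sketch does not model clauseless (cross product) joins.
    return levels


chain = join_levels([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4)])
star = join_levels([1, 2, 3, 4], [(1, 2), (1, 3), (1, 4)])
```

Here chain holds {1 2},{2 3},{3 4}, then {1 2 3},{2 3 4}, then
{1 2 3 4}, and star holds {1 2},{1 3},{1 4}, then
{1 2 3},{1 2 4},{1 3 4}, then {1 2 3 4}.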

Optimizer Functions
-------------------

These directories take the Query structure returned by the parser and
generate a plan used by the executor.  The /plan directory generates the
plan, the /path directory generates all possible ways to join the
tables, and /prep handles special cases like inheritance.  /utils holds
utility routines.

planner()
 handle inheritance by processing separately
-init_query_planner()
  preprocess target list
  preprocess qualifications(WHERE)
--query_planner()
   cnfify()
    Summary:

     Simple cases with all AND's are handled by removing the AND's:

     convert:   a = 1 AND b = 2 AND c = 3
     to:        a = 1, b = 2, c = 3

     Qualifications with OR's are handled differently.  OR's inside AND
     clauses are not modified drastically:

     convert:   a = 1 AND b = 2 AND (c = 3 OR d = 4)
     to:        a = 1, b = 2, c = 3 OR d = 4

     OR's in the upper level are more complex to handle:

     convert:   (a = 1 AND b = 2) OR c = 3
     to:        (a = 1 OR c = 3) AND (b = 2 OR c = 3)
     finally:   (a = 1 OR c = 3), (b = 2 OR c = 3)

     These clauses all have to be true for a result to be returned,
     so the optimizer can choose the most restrictive clauses.

   pull out constants from target list
   get a target list that only contains column names, no expressions
   if none, then return
---subplanner()
    make list of relations in target
    make list of relations in where clause
     split up the qual into restrictions (a=1) and joins (b=c)
    find which relations can do merge sort and hash joins
----find_paths()
     find scan and all index paths for each relation not yet joined
     if only one relation, return
     find selectivity of columns used in joins
-----find_join_paths()
      jump to geqo if needed
      again:
       find_join_rels():
        for each joinrel:
         find_clause_joins()
          for each join on joinrel:
           if a join from the join clause adds only one relation, do the join
         or find_clauseless_joins()
       find_all_join_paths()
        generate paths(nested,sortmerge) for joins found in find_join_rels()
       prune_joinrels()
        remove from the join list the relation we just added to each join
       prune_rel_paths()
        set cheapest and perhaps remove unordered path, recompute table sizes
       if we have not done all the tables, go to again:
   do group(GROUP)
   do aggregate
   put back constants
   re-flatten target list
 make unique(DISTINCT)
 make sort(ORDER BY)
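
The cnfify() conversions listed in the outline above can be sketched as
follows.  This is an illustration under assumptions, not the actual
routine: a qualification is modeled as a nested tuple ("AND", ...) or
("OR", ...) whose leaves are strings, NOT is not handled, and the
function name cnf_clauses is made up for the example.  The result is
the implicitly ANDed list of clauses, each clause being a tuple of
ORed terms.

```python
from itertools import product


def cnf_clauses(expr):
    """Return a list of OR-clauses; the list itself is implicitly ANDed."""
    if isinstance(expr, str):
        return [(expr,)]                    # a single term
    op, *args = expr
    parts = [cnf_clauses(a) for a in args]
    if op == "AND":
        # AND simply concatenates the clause lists of its arguments.
        return [clause for part in parts for clause in part]
    if op == "OR":
        # OR distributes over AND: take one clause from each argument
        # and merge the chosen clauses into a single OR-clause.
        result = []
        for combo in product(*parts):
            merged = []
            for clause in combo:
                for term in clause:
                    if term not in merged:
                        merged.append(term)
            result.append(tuple(merged))
        return result
    raise ValueError("unsupported operator: %r" % (op,))


quals = cnf_clauses(("OR", ("AND", "a = 1", "b = 2"), "c = 3"))
```

For the last example in the outline, quals is the two clauses
(a = 1 OR c = 3) and (b = 2 OR c = 3).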


Optimizer Structures
--------------------

RelOptInfo		- Every relation

 RestrictInfo	- restriction clauses
 JoinInfo		- join combinations

 Path			- every way to access a relation(sequential, index)
  IndexPath		- index scans

  JoinPath		- joins
   MergePath	- merge joins
   HashPath		- hash joins

 PathOrder		- every ordering type (sort, merge of relations)
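
The nesting of the structures above can be pictured with a small
sketch.  The class names follow the list, but every field name (cost,
index_name, outer, inner, relids, paths) and the cheapest() helper are
assumptions chosen for illustration; they are not the fields of the
real C structs.  RestrictInfo, JoinInfo, and PathOrder are left out to
keep the example short.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Path:
    """One way to access a relation, e.g. a sequential scan."""
    cost: float


@dataclass
class IndexPath(Path):
    """An index scan."""
    index_name: str = ""


@dataclass
class JoinPath(Path):
    """A join of an outer path to an inner path (nested loop by default)."""
    outer: Optional[Path] = None
    inner: Optional[Path] = None


@dataclass
class MergePath(JoinPath):
    """A merge join."""


@dataclass
class HashPath(JoinPath):
    """A hash join."""


@dataclass
class RelOptInfo:
    """A base relation or a join of several relations, with its paths."""
    relids: frozenset
    paths: List[Path] = field(default_factory=list)

    def cheapest(self) -> Path:
        # Pick the path with the lowest estimated cost.
        return min(self.paths, key=lambda p: p.cost)


tab1 = RelOptInfo(frozenset({1}))
tab1.paths.append(Path(cost=100.0))                       # sequential scan
tab1.paths.append(IndexPath(cost=12.5, index_name="tab1_col_idx"))
best = tab1.cheapest()
```

In this sketch best is the IndexPath, because its assumed cost is lower
than that of the sequential scan.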