<!--
$PostgreSQL: pgsql/doc/src/sgml/indexcost.sgml,v 2.18 2003/11/29 19:51:37 pgsql Exp $
-->

 <chapter id="indexcost">
  <title>Index Cost Estimation Functions</title>

  <note>
   <title>Author</title>

   <para>
    Written by Tom Lane (<email>tgl@sss.pgh.pa.us</email>) on 2000-01-24
   </para>
  </note>

   <note>
    <para>
     This must eventually become part of a much larger chapter about
     writing new index access methods.
    </para>
   </note>

  <para>
   Every index access method must provide a cost estimation function for
   use by the planner/optimizer.  The procedure OID of this function is
   given in the <literal>amcostestimate</literal> field of the access
   method's <literal>pg_am</literal> entry.

   <note>
    <para>
     Prior to <productname>PostgreSQL</productname> 7.0, a different
     scheme was used for registering index-specific cost estimation
     functions.
    </para>
   </note>
  </para>

  <para>
   The amcostestimate function is given a list of WHERE clauses that have
   been determined to be usable with the index.  It must return estimates
   of the cost of accessing the index and the selectivity of the WHERE
   clauses (that is, the fraction of main-table rows that will be
   retrieved during the index scan).  For simple cases, nearly all the
   work of the cost estimator can be done by calling standard routines
   in the optimizer; the point of having an amcostestimate function is
   to allow index access methods to provide index-type-specific knowledge,
   in case it is possible to improve on the standard estimates.
  </para>

  <para>
   Each amcostestimate function must have the signature:

   <programlisting>
void
amcostestimate (Query *root,
                RelOptInfo *rel,
                IndexOptInfo *index,
                List *indexQuals,
                Cost *indexStartupCost,
                Cost *indexTotalCost,
                Selectivity *indexSelectivity,
                double *indexCorrelation);
   </programlisting>

   The first four parameters are inputs:

   <variablelist>
    <varlistentry>
     <term>root</term>
     <listitem>
      <para>
       The query being processed.
      </para>
     </listitem>
    </varlistentry>

    <varlistentry>
     <term>rel</term>
     <listitem>
      <para>
       The relation the index is on.
      </para>
     </listitem>
    </varlistentry>

    <varlistentry>
     <term>index</term>
     <listitem>
      <para>
       The index itself.
      </para>
     </listitem>
    </varlistentry>

    <varlistentry>
     <term>indexQuals</term>
     <listitem>
      <para>
       List of index qual clauses (implicitly ANDed);
       a NIL list indicates no qualifiers are available.
      </para>
     </listitem>
    </varlistentry>
   </variablelist>
  </para>

  <para>
   The last four parameters are pass-by-reference outputs:

   <variablelist>
    <varlistentry>
     <term>*indexStartupCost</term>
     <listitem>
      <para>
       Set to the cost of index start-up processing.
      </para>
     </listitem>
    </varlistentry>

    <varlistentry>
     <term>*indexTotalCost</term>
     <listitem>
      <para>
       Set to the total cost of index processing.
      </para>
     </listitem>
    </varlistentry>

    <varlistentry>
     <term>*indexSelectivity</term>
     <listitem>
      <para>
       Set to the index selectivity.
      </para>
     </listitem>
    </varlistentry>

    <varlistentry>
     <term>*indexCorrelation</term>
     <listitem>
      <para>
       Set to the correlation coefficient between the index scan order and
       the underlying table's order.
      </para>
     </listitem>
    </varlistentry>
   </variablelist>
  </para>

  <para>
   Note that cost estimate functions must be written in C, not in SQL or
   any available procedural language, because they must access internal
   data structures of the planner/optimizer.
  </para>

  <para>
   The index access costs should be computed in the units used by
   <filename>src/backend/optimizer/path/costsize.c</filename>: a sequential
   disk block fetch has cost 1.0, a nonsequential fetch has cost
   <literal>random_page_cost</literal>, and the cost of processing one index
   row should usually be taken as <literal>cpu_index_tuple_cost</literal>
   (which is a user-adjustable optimizer parameter).  In addition, an
   appropriate multiple of <literal>cpu_operator_cost</literal> should be
   charged for any comparison operators invoked during index processing
   (especially evaluation of the indexQuals themselves).
  </para>
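
  <para>
   As a rough illustration of how these units combine, the standalone
   sketch below adds up disk and CPU charges for a hypothetical scan.
   The function <function>example_index_access_cost</function> and its
   parameters are invented for this example and are not part of the
   planner; the three cost parameters are passed as plain arguments here,
   whereas a real estimator would use the planner's own variables.

   <programlisting>
/*
 * Hypothetical helper, for illustration only: combine the cost units
 * described above into a single index access cost.
 */
double
example_index_access_cost(double seqPages,       /* sequential page fetches */
                          double randomPages,    /* nonsequential page fetches */
                          double indexTuples,    /* index rows processed */
                          double opsPerTuple,    /* operators evaluated per row */
                          double random_page_cost,
                          double cpu_index_tuple_cost,
                          double cpu_operator_cost)
{
    /* A sequential fetch costs 1.0; a nonsequential one costs random_page_cost. */
    double      diskCost = seqPages * 1.0 + randomPages * random_page_cost;

    /* Each index row is charged its own CPU cost plus its operator evaluations. */
    double      cpuCost = indexTuples *
        (cpu_index_tuple_cost + opsPerTuple * cpu_operator_cost);

    return diskCost + cpuCost;
}
   </programlisting>

   For instance, with 10 sequential fetches, 2 nonsequential fetches,
   1000 index rows, one operator per row, and assumed parameter values of
   4.0, 0.001, and 0.0025, the result is 10 + 8 + 3.5 = 21.5 cost units.
  </para>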

  <para>
   The access costs should include all disk and CPU costs associated with
   scanning the index itself, but <emphasis>not</emphasis> the costs of
   retrieving or processing the main-table rows that are identified by the
   index.
  </para>

  <para>
   The <quote>start-up cost</quote> is the part of the total scan cost that must be expended
   before we can begin to fetch the first row.  For most indexes this can
   be taken as zero, but an index type with a high start-up cost might want
   to set it nonzero.
  </para>

  <para>
   The indexSelectivity should be set to the estimated fraction of the main
   table rows that will be retrieved during the index scan.  In the case
   of a lossy index, this will typically be higher than the fraction of
   rows that actually pass the given qual conditions.
  </para>

  <para>
   The indexCorrelation should be set to the correlation (ranging between
   -1.0 and 1.0) between the index order and the table order.  This is used
   to adjust the estimate for the cost of fetching rows from the main
   table.
  </para>

  <procedure>
   <title>Cost Estimation</title>
   <para>
    A typical cost estimator will proceed as follows:
   </para>

   <step>
    <para>
     Estimate and return the fraction of main-table rows that will be visited
     based on the given qual conditions.  In the absence of any index-type-specific
     knowledge, use the standard optimizer function <function>clauselist_selectivity()</function>:

     <programlisting>
*indexSelectivity = clauselist_selectivity(root, indexQuals,
                                           rel->relid, JOIN_INNER);
     </programlisting>
    </para>
   </step>

   <step>
    <para>
     Estimate the number of index rows that will be visited during the
     scan.  For many index types this is the same as indexSelectivity times
     the number of rows in the index, but it might be more.  (Note that the
     index's size in pages and rows is available from the IndexOptInfo struct.)
    </para>
   </step>

   <step>
    <para>
     Estimate the number of index pages that will be retrieved during the scan.
     This might be just indexSelectivity times the index's size in pages.
    </para>
   </step>

   <step>
    <para>
     Compute the index access cost.  A generic estimator might do this:

     <programlisting>
    QualCost    index_qual_cost;    /* startup and per-row cost of the quals */

    /*
     * Our generic assumption is that the index pages will be read
     * sequentially, so they have cost 1.0 each, not random_page_cost.
     * Also, we charge for evaluation of the indexquals at each index row.
     * All the costs are assumed to be paid incrementally during the scan.
     */
    cost_qual_eval(&amp;index_qual_cost, indexQuals);
    *indexStartupCost = index_qual_cost.startup;
    *indexTotalCost = numIndexPages +
        (cpu_index_tuple_cost + index_qual_cost.per_tuple) * numIndexTuples;
     </programlisting>
    </para>
   </step>

   <step>
    <para>
     Estimate the index correlation.  For a simple ordered index on a single
     field, this can be retrieved from pg_statistic.  If the correlation
     is not known, the conservative estimate is zero (no correlation).
    </para>
   </step>
  </procedure>
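
  <para>
   The steps above can be tied together in a minimal standalone sketch.
   It does not use the planner's data structures: the types
   <literal>ExampleIndexInfo</literal> and <literal>ExampleCosts</literal>
   and the function <function>example_generic_estimate</function> are
   hypothetical stand-ins, and the selectivity and qual costs are passed
   in as plain numbers instead of being obtained from
   <function>clauselist_selectivity()</function> and
   <function>cost_qual_eval()</function>.

   <programlisting>
/* Hypothetical stand-in for the size fields of the IndexOptInfo struct. */
typedef struct ExampleIndexInfo
{
    double      pages;          /* index size in pages */
    double      tuples;         /* index size in rows */
} ExampleIndexInfo;

/* Hypothetical container for the four output values. */
typedef struct ExampleCosts
{
    double      startupCost;
    double      totalCost;
    double      selectivity;
    double      correlation;
} ExampleCosts;

ExampleCosts
example_generic_estimate(ExampleIndexInfo index,
                         double selectivity,        /* assumed already estimated */
                         double qualStartupCost,    /* assumed qual startup cost */
                         double qualPerTupleCost,   /* assumed qual cost per row */
                         double cpu_index_tuple_cost)
{
    ExampleCosts result;
    double      numIndexTuples;
    double      numIndexPages;

    /* Step 1: fraction of main-table rows that will be visited. */
    result.selectivity = selectivity;

    /* Step 2: index rows visited, taken as selectivity times index rows. */
    numIndexTuples = selectivity * index.tuples;

    /* Step 3: index pages retrieved, taken as selectivity times index pages. */
    numIndexPages = selectivity * index.pages;

    /* Step 4: pages are charged 1.0 each, plus per-row CPU and qual costs. */
    result.startupCost = qualStartupCost;
    result.totalCost = numIndexPages +
        (cpu_index_tuple_cost + qualPerTupleCost) * numIndexTuples;

    /* Step 5: correlation is unknown here, so use the conservative zero. */
    result.correlation = 0.0;

    return result;
}
   </programlisting>

   For an index of 100 pages and 10000 rows, a selectivity of 0.1, no
   qual start-up cost, an assumed per-row qual cost of 0.0025, and an
   assumed <literal>cpu_index_tuple_cost</literal> of 0.001, the sketch
   visits 1000 index rows and 10 pages, for a total cost of
   10 + 0.0035 * 1000 = 13.5.
  </para>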

  <para>
   Examples of cost estimator functions can be found in
   <filename>src/backend/utils/adt/selfuncs.c</filename>.
  </para>

  <para>
   By convention, the <literal>pg_proc</literal> entry for an
   <literal>amcostestimate</literal> function should show
   eight arguments, all declared as <type>internal</> (since none of them
   have types that are known to SQL), and a return type of <type>void</>.
  </para>
 </chapter>

<!-- Keep this comment at the end of the file
Local variables:
mode:sgml
sgml-omittag:nil
sgml-shorttag:t
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-indent-step:1
sgml-indent-data:t
sgml-parent-document:nil
sgml-default-dtd-file:"./reference.ced"
sgml-exposed-tags:nil
sgml-local-catalogs:("/usr/lib/sgml/catalog")
sgml-local-ecat-files:nil
End:
-->