Add a GUC parameter seq_page_cost, and use that everywhere we formerly

assumed that a sequential page fetch has cost 1.0. This patch doesn't in itself change the system's behavior at all, but it opens the door to people adopting other units of measurement for EXPLAIN costs. Also, if we ever decide it's worth inventing per-tablespace access cost settings, this change provides a workable intellectual framework for that.
author: Tom Lane <tgl@sss.pgh.pa.us> 2006-06-05 02:49:58 +0000
committer: Tom Lane <tgl@sss.pgh.pa.us> 2006-06-05 02:49:58 +0000
commit: eed6c9ed7e243948e440ed1388591b7fa28f5827 (patch)
tree: aef8821bdd57ed83e2c1b2ce27014e45beed0934 /doc/src
parent: a837851dc060515c1f7023f74d75156a80c2c936 (diff)
download: postgresql-eed6c9ed7e243948e440ed1388591b7fa28f5827.tar.gz
postgresql-eed6c9ed7e243948e440ed1388591b7fa28f5827.zip
3 files changed, 107 insertions, 65 deletions
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 8ab95a608cd..03b47355f38 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/config.sgml,v 1.59 2006/05/21 20:10:42 tgl Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/config.sgml,v 1.60 2006/06/05 02:49:58 tgl Exp $ -->
 
 <chapter Id="runtime-config">
   <title>Server Configuration</title>
@@ -1739,40 +1739,39 @@ archive_command = 'copy "%p" /mnt/server/archivedir/"%f"'  # Windows
       Planner Cost Constants
      </title>
 
+    <para>
+     The <firstterm>cost</> variables described in this section are measured
+     on an arbitrary scale.  Only their relative values matter, hence
+     scaling them all up or down by the same factor will result in no change
+     in the planner's choices.  Traditionally, these variables have been
+     referenced to sequential page fetches as the unit of cost; that is,
+     <varname>seq_page_cost</> is conventionally set to <literal>1.0</>
+     and the other cost variables are set with reference to that.  But
+     you can use a different scale if you prefer, such as actual execution
+     times in milliseconds on a particular machine.
+    </para>
+
    <note>
     <para>
-     Unfortunately, there is no well-defined method for determining
-     ideal values for the family of <quote>cost</quote> variables that
-     appear below. You are encouraged to experiment and share
-     your findings.
+     Unfortunately, there is no well-defined method for determining ideal
+     values for the cost variables.  They are best treated as averages over
+     the entire mix of queries that a particular installation will get.  This
+     means that changing them on the basis of just a few experiments is very
+     risky.
     </para>
    </note>
 
      <variablelist>
-     
-     <varlistentry id="guc-effective-cache-size" xreflabel="effective_cache_size">
-      <term><varname>effective_cache_size</varname> (<type>floating point</type>)</term>
+
+     <varlistentry id="guc-seq-page-cost" xreflabel="seq_page_cost">
+      <term><varname>seq_page_cost</varname> (<type>floating point</type>)</term>
       <indexterm>
-       <primary><varname>effective_cache_size</> configuration parameter</primary>
+       <primary><varname>seq_page_cost</> configuration parameter</primary>
       </indexterm>
       <listitem>
        <para>
-        Sets the planner's assumption about the effective size of the
-        disk cache that is available to a single index scan.  This is
-        factored into estimates of the cost of using an index; a
-        higher value makes it more likely index scans will be used, a
-        lower value makes it more likely sequential scans will be
-        used. When setting this parameter you should consider both
-        <productname>PostgreSQL</productname>'s shared buffers and the
-        portion of the kernel's disk cache that will be used for
-        <productname>PostgreSQL</productname> data files.  Also, take
-        into account the expected number of concurrent queries using
-        different indexes, since they will have to share the available
-        space.  This parameter has no effect on the size of shared
-        memory allocated by <productname>PostgreSQL</productname>, nor
-        does it reserve kernel disk cache; it is used only for
-        estimation purposes.  The value is measured in disk pages,
-        which are normally 8192 bytes each. The default is 1000.
+        Sets the planner's estimate of the cost of a disk page fetch
+        that is part of a series of sequential fetches.  The default is 1.0.
        </para>
       </listitem>
      </varlistentry>
@@ -1785,12 +1784,27 @@ archive_command = 'copy "%p" /mnt/server/archivedir/"%f"'  # Windows
       <listitem>
        <para>
         Sets the planner's estimate of the cost of a
-        nonsequentially fetched disk page. This is measured as a
-        multiple of the cost of a sequential page fetch. A higher
-        value makes it more likely a sequential scan will be used, a
-        lower value makes it more likely an index scan will be
-        used. The default is four.
+        non-sequentially-fetched disk page.  The default is 4.0.
+        Reducing this value relative to <varname>seq_page_cost</>
+        will cause the system to prefer index scans; raising it will
+        make index scans look relatively more expensive.  You can raise
+        or lower both values together to change the importance of disk I/O
+        costs relative to CPU costs, which are described by the following
+        parameters.
        </para>
+
+       <tip>
+        <para>
+         Although the system will let you set <varname>random_page_cost</> to
+         less than <varname>seq_page_cost</>, it is not physically sensible
+         to do so.  However, setting them equal makes sense if the database
+         is entirely cached in RAM, since in that case there is no penalty
+         for touching pages out of sequence.  Also, in a heavily-cached
+         database you should lower both values relative to the CPU parameters,
+         since the cost of fetching a page already in RAM is much smaller
+         than it would normally be.
+        </para>
+       </tip>
       </listitem>
      </varlistentry>
 
@@ -1802,8 +1816,8 @@ archive_command = 'copy "%p" /mnt/server/archivedir/"%f"'  # Windows
       <listitem>
        <para>
         Sets the planner's estimate of the cost of processing
-        each row during a query. This is measured as a fraction of
-        the cost of a sequential page fetch. The default is 0.01.
+        each row during a query.
+        The default is 0.01.
        </para>
       </listitem>
      </varlistentry>
@@ -1816,9 +1830,8 @@ archive_command = 'copy "%p" /mnt/server/archivedir/"%f"'  # Windows
       <listitem>
        <para>
         Sets the planner's estimate of the cost of processing
-        each index row during an index scan. This is measured as a
-        fraction of the cost of a sequential page fetch. The default
-        is 0.001.
+        each index entry during an index scan.
+        The default is 0.001.
        </para>
       </listitem>
      </varlistentry>
@@ -1831,8 +1844,35 @@ archive_command = 'copy "%p" /mnt/server/archivedir/"%f"'  # Windows
       <listitem>
        <para>
         Sets the planner's estimate of the cost of processing each
-        operator in a <literal>WHERE</> clause. This is measured as a fraction of
-        the cost of a sequential page fetch. The default is 0.0025.
+        operator or function executed during a query.
+        The default is 0.0025.
+       </para>
+      </listitem>
+     </varlistentry>
+     
+     <varlistentry id="guc-effective-cache-size" xreflabel="effective_cache_size">
+      <term><varname>effective_cache_size</varname> (<type>floating point</type>)</term>
+      <indexterm>
+       <primary><varname>effective_cache_size</> configuration parameter</primary>
+      </indexterm>
+      <listitem>
+       <para>
+        Sets the planner's assumption about the effective size of the
+        disk cache that is available to a single index scan.  This is
+        factored into estimates of the cost of using an index; a
+        higher value makes it more likely index scans will be used, a
+        lower value makes it more likely sequential scans will be
+        used. When setting this parameter you should consider both
+        <productname>PostgreSQL</productname>'s shared buffers and the
+        portion of the kernel's disk cache that will be used for
+        <productname>PostgreSQL</productname> data files.  Also, take
+        into account the expected number of concurrent queries using
+        different indexes, since they will have to share the available
+        space.  This parameter has no effect on the size of shared
+        memory allocated by <productname>PostgreSQL</productname>, nor
+        does it reserve kernel disk cache; it is used only for
+        estimation purposes.  The value is measured in disk pages,
+        which are normally 8192 bytes each. The default is 1000.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index 70fba4ecc0a..4bf14ba7e60 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/indexam.sgml
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/indexam.sgml,v 2.12 2006/05/24 11:01:39 teodor Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/indexam.sgml,v 2.13 2006/06/05 02:49:58 tgl Exp $ -->
 
 <chapter id="indexam">
  <title>Index Access Method Interface Definition</title>
@@ -771,14 +771,14 @@ amcostestimate (PlannerInfo *root,
   </para>
 
   <para>
-   The index access costs should be computed in the units used by
+   The index access costs should be computed using the parameters used by
    <filename>src/backend/optimizer/path/costsize.c</filename>: a sequential
-   disk block fetch has cost 1.0, a nonsequential fetch has cost
-   <varname>random_page_cost</>, and the cost of processing one index row
-   should usually be taken as <varname>cpu_index_tuple_cost</>.  In addition,
-   an appropriate multiple of <varname>cpu_operator_cost</> should be charged
-   for any comparison operators invoked during index processing (especially
-   evaluation of the indexQuals themselves).
+   disk block fetch has cost <varname>seq_page_cost</>, a nonsequential fetch
+   has cost <varname>random_page_cost</>, and the cost of processing one index
+   row should usually be taken as <varname>cpu_index_tuple_cost</>.  In
+   addition, an appropriate multiple of <varname>cpu_operator_cost</> should
+   be charged for any comparison operators invoked during index processing
+   (especially evaluation of the indexQuals themselves).
   </para>
 
   <para>
@@ -788,10 +788,10 @@ amcostestimate (PlannerInfo *root,
   </para>
 
   <para>
-   The <quote>start-up cost</quote> is the part of the total scan cost that must be expended
-   before we can begin to fetch the first row.  For most indexes this can
-   be taken as zero, but an index type with a high start-up cost might want
-   to set it nonzero.
+   The <quote>start-up cost</quote> is the part of the total scan cost that
+   must be expended before we can begin to fetch the first row.  For most
+   indexes this can be taken as zero, but an index type with a high start-up
+   cost might want to set it nonzero.
   </para>
 
   <para>
@@ -850,13 +850,13 @@ amcostestimate (PlannerInfo *root,
 <programlisting>
     /*
      * Our generic assumption is that the index pages will be read
-     * sequentially, so they have cost 1.0 each, not random_page_cost.
+     * sequentially, so they cost seq_page_cost each, not random_page_cost.
      * Also, we charge for evaluation of the indexquals at each index row.
      * All the costs are assumed to be paid incrementally during the scan.
      */
     cost_qual_eval(&amp;index_qual_cost, indexQuals);
     *indexStartupCost = index_qual_cost.startup;
-    *indexTotalCost = numIndexPages +
+    *indexTotalCost = seq_page_cost * numIndexPages +
         (cpu_index_tuple_cost + index_qual_cost.per_tuple) * numIndexTuples;
 </programlisting>
     </para>
diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml
index 4cbd44e1732..6d65938c21e 100644
--- a/doc/src/sgml/perform.sgml
+++ b/doc/src/sgml/perform.sgml
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/perform.sgml,v 1.56 2006/03/10 19:10:48 momjian Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/perform.sgml,v 1.57 2006/06/05 02:49:58 tgl Exp $ -->
 
  <chapter id="performance-tips">
   <title>Performance Tips</title>
@@ -60,7 +60,7 @@
     <footnote>
      <para>
       Examples in this section are drawn from the regression test database
-      after doing a <command>VACUUM ANALYZE</>, using 8.1 development sources.
+      after doing a <command>VACUUM ANALYZE</>, using 8.2 development sources.
       You should be able to get similar results if you try the examples yourself,
       but your estimated costs and row counts will probably vary slightly
       because <command>ANALYZE</>'s statistics are random samples rather
@@ -114,12 +114,13 @@ EXPLAIN SELECT * FROM tenk1;
    </para>
 
    <para>
-    The costs are measured in units of disk page fetches; that is, 1.0
-    equals one sequential disk page read, by definition.  (CPU effort
-    estimates are made too; they are converted into disk-page units using some
-    fairly arbitrary fudge factors. If you want to experiment with these
-    factors, see the list of run-time configuration parameters in
-    <xref linkend="runtime-config-query-constants">.)
+    The costs are measured in arbitrary units determined by the planner's
+    cost parameters (see <xref linkend="runtime-config-query-constants">).
+    Traditional practice is to measure the costs in units of disk page
+    fetches; that is, <xref linkend="guc-seq-page-cost"> is conventionally
+    set to <literal>1.0</> and the other cost parameters are set relative
+    to that.  The examples in this section are run with the default cost
+    parameters.
    </para>
 
    <para>
@@ -164,9 +165,9 @@ SELECT relpages, reltuples FROM pg_class WHERE relname = 'tenk1';
 
     you will find out that <classname>tenk1</classname> has 358 disk
     pages and 10000 rows.  So the cost is estimated at 358 page
-    reads, defined as costing 1.0 apiece, plus 10000 * <xref
-    linkend="guc-cpu-tuple-cost"> which is
-    typically 0.01 (try <command>SHOW cpu_tuple_cost</command>).
+    reads, costing <xref linkend="guc-seq-page-cost"> apiece (1.0 by
+    default), plus 10000 * <xref linkend="guc-cpu-tuple-cost"> which is
+    0.01 by default.
    </para>
 
    <para>
@@ -400,8 +401,9 @@ EXPLAIN ANALYZE SELECT * FROM tenk1 t1, tenk2 t2 WHERE t1.unique1 &lt; 100 AND t
 
     Note that the <quote>actual time</quote> values are in milliseconds of
     real time, whereas the <quote>cost</quote> estimates are expressed in
-    arbitrary units of disk fetches; so they are unlikely to match up.
-    The thing to pay attention to is the ratios.
+    arbitrary units; so they are unlikely to match up.
+    The thing to pay attention to is whether the ratios of actual time and
+    estimated costs are consistent.
    </para>
 
    <para>
@@ -427,7 +429,7 @@ EXPLAIN ANALYZE SELECT * FROM tenk1 t1, tenk2 t2 WHERE t1.unique1 &lt; 100 AND t
     may be considerably larger, because it includes the time spent processing
     the result rows.  In these commands, the time for the top plan node
     essentially is the time spent computing the new rows and/or locating the
-    old ones, but it doesn't include the time spent making the changes.
+    old ones, but it doesn't include the time spent applying the changes.
     Time spent firing triggers, if any, is also outside the top plan node,
     and is shown separately for each trigger.
    </para>
author	Tom Lane <tgl@sss.pgh.pa.us>	2006-06-05 02:49:58 +0000
committer	Tom Lane <tgl@sss.pgh.pa.us>	2006-06-05 02:49:58 +0000
commit	eed6c9ed7e243948e440ed1388591b7fa28f5827 (patch)
tree	aef8821bdd57ed83e2c1b2ce27014e45beed0934 /doc/src
parent	a837851dc060515c1f7023f74d75156a80c2c936 (diff)
download	postgresql-eed6c9ed7e243948e440ed1388591b7fa28f5827.tar.gz postgresql-eed6c9ed7e243948e440ed1388591b7fa28f5827.zip