aboutsummaryrefslogtreecommitdiff
path: root/doc/src
diff options
context:
space:
mode:
Diffstat (limited to 'doc/src')
-rw-r--r--doc/src/sgml/perform.sgml75
-rw-r--r--doc/src/sgml/ref/explain.sgml44
2 files changed, 100 insertions, 19 deletions
diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml
index 6e82d82aad1..6628c9f4925 100644
--- a/doc/src/sgml/perform.sgml
+++ b/doc/src/sgml/perform.sgml
@@ -1,5 +1,5 @@
<!--
-$Header: /cvsroot/pgsql/doc/src/sgml/perform.sgml,v 1.9 2001/09/13 15:55:23 petere Exp $
+$Header: /cvsroot/pgsql/doc/src/sgml/perform.sgml,v 1.10 2001/09/18 01:59:05 tgl Exp $
-->
<chapter id="performance-tips">
@@ -122,7 +122,7 @@ select * from pg_class where relname = 'tenk1';
regression=# explain select * from tenk1 where unique1 &lt; 1000;
NOTICE: QUERY PLAN:
-Seq Scan on tenk1 (cost=0.00..358.00 rows=1003 width=148)
+Seq Scan on tenk1 (cost=0.00..358.00 rows=1007 width=148)
</programlisting>
The estimate of output rows has gone down because of the WHERE clause.
@@ -147,7 +147,7 @@ Seq Scan on tenk1 (cost=0.00..358.00 rows=1003 width=148)
regression=# explain select * from tenk1 where unique1 &lt; 50;
NOTICE: QUERY PLAN:
-Index Scan using tenk1_unique1 on tenk1 (cost=0.00..173.32 rows=47 width=148)
+Index Scan using tenk1_unique1 on tenk1 (cost=0.00..181.09 rows=49 width=148)
</programlisting>
and you will see that if we make the WHERE condition selective
@@ -166,7 +166,7 @@ regression=# explain select * from tenk1 where unique1 &lt; 50 and
regression-# stringu1 = 'xxx';
NOTICE: QUERY PLAN:
-Index Scan using tenk1_unique1 on tenk1 (cost=0.00..173.44 rows=1 width=148)
+Index Scan using tenk1_unique1 on tenk1 (cost=0.00..181.22 rows=1 width=148)
</programlisting>
The added clause <literal>stringu1 = 'xxx'</literal> reduces the output-rows estimate,
@@ -181,11 +181,11 @@ regression=# explain select * from tenk1 t1, tenk2 t2 where t1.unique1 &lt; 50
regression-# and t1.unique2 = t2.unique2;
NOTICE: QUERY PLAN:
-Nested Loop (cost=0.00..269.11 rows=47 width=296)
+Nested Loop (cost=0.00..330.41 rows=49 width=296)
-&gt; Index Scan using tenk1_unique1 on tenk1 t1
- (cost=0.00..173.32 rows=47 width=148)
+ (cost=0.00..181.09 rows=49 width=148)
-&gt; Index Scan using tenk2_unique2 on tenk2 t2
- (cost=0.00..2.01 rows=1 width=148)
+ (cost=0.00..3.01 rows=1 width=148)
</programlisting>
</para>
@@ -202,7 +202,7 @@ Nested Loop (cost=0.00..269.11 rows=47 width=296)
same inner-scan plan and costs that we'd get from, say, <literal>explain select
* from tenk2 where unique2 = 42</literal>. The costs of the loop node are then set
on the basis of the cost of the outer scan, plus one repetition of the
- inner scan for each outer tuple (47 * 2.01, here), plus a little CPU
+ inner scan for each outer tuple (49 * 3.01, here), plus a little CPU
time for join processing.
</para>
@@ -229,12 +229,12 @@ regression=# explain select * from tenk1 t1, tenk2 t2 where t1.unique1 &lt; 50
regression-# and t1.unique2 = t2.unique2;
NOTICE: QUERY PLAN:
-Hash Join (cost=173.44..557.03 rows=47 width=296)
+Hash Join (cost=181.22..564.83 rows=49 width=296)
-&gt; Seq Scan on tenk2 t2
(cost=0.00..333.00 rows=10000 width=148)
- -&gt; Hash (cost=173.32..173.32 rows=47 width=148)
+ -&gt; Hash (cost=181.09..181.09 rows=49 width=148)
-&gt; Index Scan using tenk1_unique1 on tenk1 t1
- (cost=0.00..173.32 rows=47 width=148)
+ (cost=0.00..181.09 rows=49 width=148)
</programlisting>
This plan proposes to extract the 50 interesting rows of <classname>tenk1</classname>
@@ -245,11 +245,62 @@ Hash Join (cost=173.44..557.03 rows=47 width=296)
cost for the hash join, since we won't get any tuples out until we can
start reading <classname>tenk2</classname>. The total time estimate for the join also
includes a hefty charge for CPU time to probe the hash table
- 10000 times. Note, however, that we are NOT charging 10000 times 173.32;
+ 10000 times. Note, however, that we are NOT charging 10000 times 181.09;
the hash table setup is only done once in this plan type.
</para>
<para>
+ It is possible to check on the accuracy of the planner's estimated costs
+ by using EXPLAIN ANALYZE. This command actually executes the query,
+ and then displays the true runtime accumulated within each plan node
+ along with the same estimated costs that a plain EXPLAIN shows.
+ For example, we might get a result like this:
+
+ <programlisting>
+regression=# explain analyze select * from tenk1 t1, tenk2 t2 where t1.unique1 &lt; 50
+regression-# and t1.unique2 = t2.unique2;
+NOTICE: QUERY PLAN:
+
+Nested Loop (cost=0.00..330.41 rows=49 width=296) (actual time=1.31..28.90 rows=50 loops=1)
+ -&gt; Index Scan using tenk1_unique1 on tenk1 t1
+ (cost=0.00..181.09 rows=49 width=148) (actual time=0.69..8.84 rows=50 loops=1)
+ -&gt; Index Scan using tenk2_unique2 on tenk2 t2
+ (cost=0.00..3.01 rows=1 width=148) (actual time=0.28..0.31 rows=1 loops=50)
+Total runtime: 30.67 msec
+ </programlisting>
+
+ Note that the <quote>actual time</quote> values are in milliseconds of
+ real time, whereas the <quote>cost</quote> estimates are expressed in
+ arbitrary units of disk fetches; so they are unlikely to match up.
+ The thing to pay attention to is the ratios.
+ </para>
+
+ <para>
+ In some query plans, it is possible for a subplan node to be executed more
+ than once. For example, the inner indexscan is executed once per outer
+ tuple in the above nested-loop plan. In such cases, the
+ <quote>loops</quote> value reports the
+ total number of executions of the node, and the actual time and rows
+ values shown are averages per-execution. This is done to make the numbers
+ comparable with the way that the cost estimates are shown. Multiply by
+ the <quote>loops</quote> value to get the total time actually spent in
+ the node.
+ </para>
+
+ <para>
+ The <quote>total runtime</quote> shown by EXPLAIN ANALYZE includes
+ executor startup and shutdown time, as well as time spent processing
+ the result tuples. It does not include parsing, rewriting, or planning
+ time. For a SELECT query, the total runtime will normally be just a
+ little larger than the total time reported for the top-level plan node.
+ For INSERT, UPDATE, and DELETE queries, the total runtime may be
+ considerably larger, because it includes the time spent processing the
+ output tuples. In these queries, the time for the top plan node
+ essentially is the time spent computing the new tuples and/or locating
+ the old ones, but it doesn't include the time spent making the changes.
+ </para>
+
+ <para>
It is worth noting that EXPLAIN results should not be extrapolated
to situations other than the one you are actually testing; for example,
results on a toy-sized table can't be assumed to apply to large tables.
diff --git a/doc/src/sgml/ref/explain.sgml b/doc/src/sgml/ref/explain.sgml
index 7bccea72ad7..b24863529b3 100644
--- a/doc/src/sgml/ref/explain.sgml
+++ b/doc/src/sgml/ref/explain.sgml
@@ -1,5 +1,5 @@
<!--
-$Header: /cvsroot/pgsql/doc/src/sgml/ref/explain.sgml,v 1.12 2001/09/03 12:57:50 petere Exp $
+$Header: /cvsroot/pgsql/doc/src/sgml/ref/explain.sgml,v 1.13 2001/09/18 01:59:05 tgl Exp $
Postgres documentation
-->
@@ -24,7 +24,7 @@ Postgres documentation
<date>1999-07-20</date>
</refsynopsisdivinfo>
<synopsis>
-EXPLAIN [ VERBOSE ] <replaceable class="PARAMETER">query</replaceable>
+EXPLAIN [ ANALYZE ] [ VERBOSE ] <replaceable class="PARAMETER">query</replaceable>
</synopsis>
<refsect2 id="R2-SQL-EXPLAIN-1">
@@ -38,6 +38,14 @@ EXPLAIN [ VERBOSE ] <replaceable class="PARAMETER">query</replaceable>
<variablelist>
<varlistentry>
+ <term>ANALYZE</term>
+ <listitem>
+ <para>
+ Flag to carry out the query and show actual runtimes.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
<term>VERBOSE</term>
<listitem>
<para>
@@ -125,11 +133,36 @@ EXPLAIN
</para>
<para>
+ The ANALYZE option causes the query to be actually executed, not only
+ planned. The total elapsed time expended within each plan node (in
+ milliseconds) and total number of rows it actually returned are added to
+ the display. This is useful for seeing whether the planner's estimates
+ are close to reality.
+ </para>
+
+ <para>
The VERBOSE option emits the full internal representation of the plan tree,
rather than just a summary (and sends it to the postmaster log file, too).
Usually this option is only useful for debugging Postgres.
</para>
+ <caution>
+ <para>
+ Keep in mind that the query is actually executed when ANALYZE is used.
+ Although <command>EXPLAIN</command> will discard any output that a SELECT
+ would return,
+ other side-effects of the query will happen as usual.
+ If you wish to use <command>EXPLAIN ANALYZE</command> on an INSERT,
+ UPDATE, or DELETE query without letting the query affect your data,
+ use this approach:
+ <programlisting>
+BEGIN;
+EXPLAIN ANALYZE ...;
+ROLLBACK;
+ </programlisting>
+ </para>
+ </caution>
+
<refsect2 id="R2-SQL-EXPLAIN-3">
<refsect2info>
<date>1998-04-15</date>
@@ -140,11 +173,8 @@ EXPLAIN
<para>
There is only sparse documentation on the optimizer's use of cost
information in <productname>Postgres</productname>.
- General information on cost estimation for query optimization
- can be found in database textbooks.
- Refer to the <citetitle>Programmer's Guide</citetitle>
- in the chapters on indexes and the genetic query optimizer for
- more information.
+ Refer to the <citetitle>User's Guide</citetitle> and
+ <citetitle>Programmer's Guide</citetitle> for more information.
</para>
</refsect2>
</refsect1>