diff options
Diffstat (limited to 'doc/src')
-rw-r--r-- | doc/src/sgml/perform.sgml | 75 | ||||
-rw-r--r-- | doc/src/sgml/ref/explain.sgml | 44 |
2 files changed, 100 insertions, 19 deletions
diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml index 6e82d82aad1..6628c9f4925 100644 --- a/doc/src/sgml/perform.sgml +++ b/doc/src/sgml/perform.sgml @@ -1,5 +1,5 @@ <!-- -$Header: /cvsroot/pgsql/doc/src/sgml/perform.sgml,v 1.9 2001/09/13 15:55:23 petere Exp $ +$Header: /cvsroot/pgsql/doc/src/sgml/perform.sgml,v 1.10 2001/09/18 01:59:05 tgl Exp $ --> <chapter id="performance-tips"> @@ -122,7 +122,7 @@ select * from pg_class where relname = 'tenk1'; regression=# explain select * from tenk1 where unique1 < 1000; NOTICE: QUERY PLAN: -Seq Scan on tenk1 (cost=0.00..358.00 rows=1003 width=148) +Seq Scan on tenk1 (cost=0.00..358.00 rows=1007 width=148) </programlisting> The estimate of output rows has gone down because of the WHERE clause. @@ -147,7 +147,7 @@ Seq Scan on tenk1 (cost=0.00..358.00 rows=1003 width=148) regression=# explain select * from tenk1 where unique1 < 50; NOTICE: QUERY PLAN: -Index Scan using tenk1_unique1 on tenk1 (cost=0.00..173.32 rows=47 width=148) +Index Scan using tenk1_unique1 on tenk1 (cost=0.00..181.09 rows=49 width=148) </programlisting> and you will see that if we make the WHERE condition selective @@ -166,7 +166,7 @@ regression=# explain select * from tenk1 where unique1 < 50 and regression-# stringu1 = 'xxx'; NOTICE: QUERY PLAN: -Index Scan using tenk1_unique1 on tenk1 (cost=0.00..173.44 rows=1 width=148) +Index Scan using tenk1_unique1 on tenk1 (cost=0.00..181.22 rows=1 width=148) </programlisting> The added clause <literal>stringu1 = 'xxx'</literal> reduces the output-rows estimate, @@ -181,11 +181,11 @@ regression=# explain select * from tenk1 t1, tenk2 t2 where t1.unique1 < 50 regression-# and t1.unique2 = t2.unique2; NOTICE: QUERY PLAN: -Nested Loop (cost=0.00..269.11 rows=47 width=296) +Nested Loop (cost=0.00..330.41 rows=49 width=296) -> Index Scan using tenk1_unique1 on tenk1 t1 - (cost=0.00..173.32 rows=47 width=148) + (cost=0.00..181.09 rows=49 width=148) -> Index Scan using tenk2_unique2 on tenk2 t2 - (cost=0.00..2.01 rows=1 width=148) + (cost=0.00..3.01 rows=1 width=148) </programlisting> </para> @@ -202,7 +202,7 @@ Nested Loop (cost=0.00..269.11 rows=47 width=296) same inner-scan plan and costs that we'd get from, say, <literal>explain select * from tenk2 where unique2 = 42</literal>. The costs of the loop node are then set on the basis of the cost of the outer scan, plus one repetition of the - inner scan for each outer tuple (47 * 2.01, here), plus a little CPU + inner scan for each outer tuple (49 * 3.01, here), plus a little CPU time for join processing. </para> @@ -229,12 +229,12 @@ regression=# explain select * from tenk1 t1, tenk2 t2 where t1.unique1 < 50 regression-# and t1.unique2 = t2.unique2; NOTICE: QUERY PLAN: -Hash Join (cost=173.44..557.03 rows=47 width=296) +Hash Join (cost=181.22..564.83 rows=49 width=296) -> Seq Scan on tenk2 t2 (cost=0.00..333.00 rows=10000 width=148) - -> Hash (cost=173.32..173.32 rows=47 width=148) + -> Hash (cost=181.09..181.09 rows=49 width=148) -> Index Scan using tenk1_unique1 on tenk1 t1 - (cost=0.00..173.32 rows=47 width=148) + (cost=0.00..181.09 rows=49 width=148) </programlisting> This plan proposes to extract the 50 interesting rows of <classname>tenk1</classname> @@ -245,11 +245,62 @@ Hash Join (cost=173.44..557.03 rows=47 width=296) cost for the hash join, since we won't get any tuples out until we can start reading <classname>tenk2</classname>. The total time estimate for the join also includes a hefty charge for CPU time to probe the hash table - 10000 times. Note, however, that we are NOT charging 10000 times 173.32; + 10000 times. Note, however, that we are NOT charging 10000 times 181.09; the hash table setup is only done once in this plan type. </para> <para> + It is possible to check on the accuracy of the planner's estimated costs + by using EXPLAIN ANALYZE. This command actually executes the query, + and then displays the true runtime accumulated within each plan node + along with the same estimated costs that a plain EXPLAIN shows. + For example, we might get a result like this: + + <programlisting> +regression=# explain analyze select * from tenk1 t1, tenk2 t2 where t1.unique1 < 50 +regression-# and t1.unique2 = t2.unique2; +NOTICE: QUERY PLAN: + +Nested Loop (cost=0.00..330.41 rows=49 width=296) (actual time=1.31..28.90 rows=50 loops=1) + -> Index Scan using tenk1_unique1 on tenk1 t1 + (cost=0.00..181.09 rows=49 width=148) (actual time=0.69..8.84 rows=50 loops=1) + -> Index Scan using tenk2_unique2 on tenk2 t2 + (cost=0.00..3.01 rows=1 width=148) (actual time=0.28..0.31 rows=1 loops=50) +Total runtime: 30.67 msec + </programlisting> + + Note that the <quote>actual time</quote> values are in milliseconds of + real time, whereas the <quote>cost</quote> estimates are expressed in + arbitrary units of disk fetches; so they are unlikely to match up. + The thing to pay attention to is the ratios. + </para> + + <para> + In some query plans, it is possible for a subplan node to be executed more + than once. For example, the inner indexscan is executed once per outer + tuple in the above nested-loop plan. In such cases, the + <quote>loops</quote> value reports the + total number of executions of the node, and the actual time and rows + values shown are averages per-execution. This is done to make the numbers + comparable with the way that the cost estimates are shown. Multiply by + the <quote>loops</quote> value to get the total time actually spent in + the node. + </para> + + <para> + The <quote>total runtime</quote> shown by EXPLAIN ANALYZE includes + executor startup and shutdown time, as well as time spent processing + the result tuples. It does not include parsing, rewriting, or planning + time. For a SELECT query, the total runtime will normally be just a + little larger than the total time reported for the top-level plan node. + For INSERT, UPDATE, and DELETE queries, the total runtime may be + considerably larger, because it includes the time spent processing the + output tuples. In these queries, the time for the top plan node + essentially is the time spent computing the new tuples and/or locating + the old ones, but it doesn't include the time spent making the changes. + </para> + + <para> It is worth noting that EXPLAIN results should not be extrapolated to situations other than the one you are actually testing; for example, results on a toy-sized table can't be assumed to apply to large tables. diff --git a/doc/src/sgml/ref/explain.sgml b/doc/src/sgml/ref/explain.sgml index 7bccea72ad7..b24863529b3 100644 --- a/doc/src/sgml/ref/explain.sgml +++ b/doc/src/sgml/ref/explain.sgml @@ -1,5 +1,5 @@ <!-- -$Header: /cvsroot/pgsql/doc/src/sgml/ref/explain.sgml,v 1.12 2001/09/03 12:57:50 petere Exp $ +$Header: /cvsroot/pgsql/doc/src/sgml/ref/explain.sgml,v 1.13 2001/09/18 01:59:05 tgl Exp $ Postgres documentation --> @@ -24,7 +24,7 @@ Postgres documentation <date>1999-07-20</date> </refsynopsisdivinfo> <synopsis> -EXPLAIN [ VERBOSE ] <replaceable class="PARAMETER">query</replaceable> +EXPLAIN [ ANALYZE ] [ VERBOSE ] <replaceable class="PARAMETER">query</replaceable> </synopsis> <refsect2 id="R2-SQL-EXPLAIN-1"> @@ -38,6 +38,14 @@ EXPLAIN [ VERBOSE ] <replaceable class="PARAMETER">query</replaceable> <variablelist> <varlistentry> + <term>ANALYZE</term> + <listitem> + <para> + Flag to carry out the query and show actual runtimes. + </para> + </listitem> + </varlistentry> + <varlistentry> <term>VERBOSE</term> <listitem> <para> @@ -125,11 +133,36 @@ EXPLAIN </para> <para> + The ANALYZE option causes the query to be actually executed, not only + planned. The total elapsed time expended within each plan node (in + milliseconds) and total number of rows it actually returned are added to + the display. This is useful for seeing whether the planner's estimates + are close to reality. + </para> + + <para> The VERBOSE option emits the full internal representation of the plan tree, rather than just a summary (and sends it to the postmaster log file, too). Usually this option is only useful for debugging Postgres. </para> + <caution> + <para> + Keep in mind that the query is actually executed when ANALYZE is used. + Although <command>EXPLAIN</command> will discard any output that a SELECT + would return, + other side-effects of the query will happen as usual. + If you wish to use <command>EXPLAIN ANALYZE</command> on an INSERT, + UPDATE, or DELETE query without letting the query affect your data, + use this approach: + <programlisting> +BEGIN; +EXPLAIN ANALYZE ...; +ROLLBACK; + </programlisting> + </para> + </caution> + <refsect2 id="R2-SQL-EXPLAIN-3"> <refsect2info> <date>1998-04-15</date> @@ -140,11 +173,8 @@ EXPLAIN <para> There is only sparse documentation on the optimizer's use of cost information in <productname>Postgres</productname>. - General information on cost estimation for query optimization - can be found in database textbooks. - Refer to the <citetitle>Programmer's Guide</citetitle> - in the chapters on indexes and the genetic query optimizer for - more information. + Refer to the <citetitle>User's Guide</citetitle> and + <citetitle>Programmer's Guide</citetitle> for more information. </para> </refsect2> </refsect1> |