diff options
Diffstat (limited to 'doc/src/sgml/plan.sgml')
-rw-r--r-- | doc/src/sgml/plan.sgml | 269 |
1 files changed, 0 insertions, 269 deletions
diff --git a/doc/src/sgml/plan.sgml b/doc/src/sgml/plan.sgml deleted file mode 100644 index bfc592b6a40..00000000000 --- a/doc/src/sgml/plan.sgml +++ /dev/null @@ -1,269 +0,0 @@ -<!-- -$Header: /cvsroot/pgsql/doc/src/sgml/Attic/plan.sgml,v 2.5 2000/09/29 20:21:34 petere Exp $ ---> - - <chapter id="understand-performance"> - <title>Understanding Performance</title> - - <para> - Query performance can be affected by many things. Some of these can - be manipulated by the user, while others are fundamental to the underlying - design of the system. - </para> - - <para> - Some performance issues, such as index creation and bulk data - loading, are covered elsewhere. This chapter will discuss the - <command>EXPLAIN</command> command, and will show how the details - of a query can affect the query plan, and hence overall - performance. - </para> - - <sect1 id="using-explain"> - <title>Using <command>EXPLAIN</command></title> - - <note> - <title>Author</title> - <para> - Written by Tom Lane, from e-mail dated 2000-03-27. - </para> - </note> - - <para> - Plan-reading is an art that deserves a tutorial, and I haven't - had time to write one. Here is some quick & dirty explanation. - </para> - - <para> - The numbers that are currently quoted by EXPLAIN are: - - <itemizedlist> - <listitem> - <para> - Estimated start-up cost (time expended before output scan can start, - eg, time to do the sorting in a SORT node). - </para> - </listitem> - - <listitem> - <para> - Estimated total cost (if all tuples are retrieved, which they may not - be --- LIMIT will stop short of paying the total cost, for - example). - </para> - </listitem> - - <listitem> - <para> - Estimated number of rows output by this plan node. - </para> - </listitem> - - <listitem> - <para> - Estimated average width (in bytes) of rows output by this plan - node. - </para> - </listitem> - </itemizedlist> - </para> - - <para> - The costs are measured in units of disk page fetches. (CPU effort - estimates are converted into disk-page units using some - fairly arbitrary fudge-factors. See the <command>SET</command> - reference page if you want to experiment with these.) - It's important to note that the cost of an upper-level node includes - the cost of all its child nodes. It's also important to realize that - the cost only reflects things that the planner/optimizer cares about. - In particular, the cost does not consider the time spent transmitting - result tuples to the frontend --- which could be a pretty dominant - factor in the true elapsed time, but the planner ignores it because - it cannot change it by altering the plan. (Every correct plan will - output the same tuple set, we trust.) - </para> - - <para> - Rows output is a little tricky because it is <emphasis>not</emphasis> the number of rows - processed/scanned by the query --- it is usually less, reflecting the - estimated selectivity of any WHERE-clause constraints that are being - applied at this node. - </para> - - <para> - Average width is pretty bogus because the thing really doesn't have - any idea of the average length of variable-length columns. I'm thinking - about improving that in the future, but it may not be worth the trouble, - because the width isn't used for very much. - </para> - - <para> - Here are some examples (using the regress test database after a - vacuum analyze, and almost-7.0 sources): - - <programlisting> -regression=# explain select * from tenk1; -NOTICE: QUERY PLAN: - -Seq Scan on tenk1 (cost=0.00..333.00 rows=10000 width=148) - </programlisting> - </para> - - <para> - This is about as straightforward as it gets. If you do - - <programlisting> -select * from pg_class where relname = 'tenk1'; - </programlisting> - - you'll find out that tenk1 has 233 disk - pages and 10000 tuples. So the cost is estimated at 233 block - reads, defined as 1.0 apiece, plus 10000 * cpu_tuple_cost which is - currently 0.01 (try <command>show cpu_tuple_cost</command>). - </para> - - <para> - Now let's modify the query to add a qualification clause: - - <programlisting> -regression=# explain select * from tenk1 where unique1 < 1000; -NOTICE: QUERY PLAN: - -Seq Scan on tenk1 (cost=0.00..358.00 rows=1000 width=148) - </programlisting> - - The estimate of output rows has gone down because of the WHERE clause. - (The uncannily accurate estimate is just because tenk1 is a particularly - simple case --- the unique1 column has 10000 distinct values ranging - from 0 to 9999, so the estimator's linear interpolation between min and - max column values is dead-on.) However, the scan will still have to - visit all 10000 rows, so the cost hasn't decreased; in fact it has gone - up a bit to reflect the extra CPU time spent checking the WHERE - condition. - </para> - - <para> - Modify the query to restrict the qualification even more: - - <programlisting> -regression=# explain select * from tenk1 where unique1 < 100; -NOTICE: QUERY PLAN: - -Index Scan using tenk1_unique1 on tenk1 (cost=0.00..89.35 rows=100 width=148) - </programlisting> - - and you will see that if we make the WHERE condition selective - enough, the planner will - eventually decide that an indexscan is cheaper than a sequential scan. - This plan will only have to visit 100 tuples because of the index, - so it wins despite the fact that each individual fetch is expensive. - </para> - - <para> - Add another condition to the qualification: - - <programlisting> -regression=# explain select * from tenk1 where unique1 < 100 and -regression-# stringu1 = 'xxx'; -NOTICE: QUERY PLAN: - -Index Scan using tenk1_unique1 on tenk1 (cost=0.00..89.60 rows=1 width=148) - </programlisting> - - The added clause "stringu1 = 'xxx'" reduces the output-rows estimate, - but not the cost because we still have to visit the same set of tuples. - </para> - - <para> - Let's try joining two tables, using the fields we have been discussing: - - <programlisting> -regression=# explain select * from tenk1 t1, tenk2 t2 where t1.unique1 < 100 -regression-# and t1.unique2 = t2.unique2; -NOTICE: QUERY PLAN: - -Nested Loop (cost=0.00..144.07 rows=100 width=296) - -> Index Scan using tenk1_unique1 on tenk1 t1 - (cost=0.00..89.35 rows=100 width=148) - -> Index Scan using tenk2_unique2 on tenk2 t2 - (cost=0.00..0.53 rows=1 width=148) - </programlisting> - </para> - - <para> - In this nested-loop join, the outer scan is the same indexscan we had - in the example before last, and so its cost and row count are the same - because we are applying the "unique1 < 100" WHERE clause at that node. - The "t1.unique2 = t2.unique2" clause isn't relevant yet, so it doesn't - affect the outer scan's row count. For the inner scan, the - current - outer-scan tuple's unique2 value is plugged into the inner indexscan - to produce an indexqual like - "t2.unique2 = <replaceable>constant</replaceable>". So we get the - same inner-scan plan and costs that we'd get from, say, "explain select - * from tenk2 where unique2 = 42". The loop node's costs are then set - on the basis of the outer scan's cost, plus one repetition of the - inner scan for each outer tuple (100 * 0.53, here), plus a little CPU - time for join processing. - </para> - - <para> - In this example the loop's output row count is the same as the product - of the two scans' row counts, but that's not true in general, because - in general you can have WHERE clauses that mention both relations and - so can only be applied at the join point, not to either input scan. - For example, if we added "WHERE ... AND t1.hundred < t2.hundred", - that'd decrease the output row count of the join node, but not change - either input scan. - </para> - - <para> - We can look at variant plans by forcing the planner to disregard - whatever strategy it thought was the winner (a pretty crude tool, - but it's what we've got at the moment): - - <programlisting> -regression=# set enable_nestloop = off; -SET VARIABLE -regression=# explain select * from tenk1 t1, tenk2 t2 where t1.unique1 < 100 -regression-# and t1.unique2 = t2.unique2; -NOTICE: QUERY PLAN: - -Hash Join (cost=89.60..574.10 rows=100 width=296) - -> Seq Scan on tenk2 t2 - (cost=0.00..333.00 rows=10000 width=148) - -> Hash (cost=89.35..89.35 rows=100 width=148) - -> Index Scan using tenk1_unique1 on tenk1 t1 - (cost=0.00..89.35 rows=100 width=148) - </programlisting> - - This plan proposes to extract the 100 interesting rows of tenk1 - using ye same olde indexscan, stash them into an in-memory hash table, - and then do a sequential scan of tenk2, probing into the hash table - for possible matches of "t1.unique2 = t2.unique2" at each tenk2 tuple. - The cost to read tenk1 and set up the hash table is entirely start-up - cost for the hash join, since we won't get any tuples out until we can - start reading tenk2. The total time estimate for the join also - includes a pretty hefty charge for CPU time to probe the hash table - 10000 times. Note, however, that we are NOT charging 10000 times 89.35; - the hash table setup is only done once in this plan type. - </para> - </sect1> - </chapter> - -<!-- Keep this comment at the end of the file -Local variables: -mode:sgml -sgml-omittag:nil -sgml-shorttag:t -sgml-minimize-attributes:nil -sgml-always-quote-attributes:t -sgml-indent-step:1 -sgml-indent-data:t -sgml-parent-document:nil -sgml-default-dtd-file:"./reference.ced" -sgml-exposed-tags:nil -sgml-local-catalogs:("/usr/lib/sgml/catalog") -sgml-local-ecat-files:nil -End: ---> |