1 files changed, 191 insertions, 9 deletions
diff --git a/doc/src/sgml/queries.sgml b/doc/src/sgml/queries.sgml
index e3b6be4d97b..b3d72ceb7f8 100644
--- a/doc/src/sgml/queries.sgml
+++ b/doc/src/sgml/queries.sgml
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/queries.sgml,v 1.45 2008/02/15 22:17:06 tgl Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/queries.sgml,v 1.46 2008/10/04 21:56:52 tgl Exp $ -->
 
 <chapter id="queries">
  <title>Queries</title>
@@ -28,10 +28,11 @@
    used to specify queries.  The general syntax of the
    <command>SELECT</command> command is
 <synopsis>
-SELECT <replaceable>select_list</replaceable> FROM <replaceable>table_expression</replaceable> <optional><replaceable>sort_specification</replaceable></optional>
+<optional>WITH <replaceable>with_queries</replaceable></optional> SELECT <replaceable>select_list</replaceable> FROM <replaceable>table_expression</replaceable> <optional><replaceable>sort_specification</replaceable></optional>
 </synopsis>
    The following sections describe the details of the select list, the
-   table expression, and the sort specification.
+   table expression, and the sort specification.  <literal>WITH</>
+   queries are treated last since they are an advanced feature.
   </para>
 
   <para>
@@ -107,7 +108,7 @@ SELECT random();
 
   <sect2 id="queries-from">
    <title>The <literal>FROM</literal> Clause</title>
- 
+
    <para>
     The <xref linkend="sql-from" endterm="sql-from-title"> derives a
     table from one or more other tables given in a comma-separated
@@ -211,7 +212,7 @@ FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_r
 <replaceable>T1</replaceable> { <optional>INNER</optional> | { LEFT | RIGHT | FULL } <optional>OUTER</optional> } JOIN <replaceable>T2</replaceable> USING ( <replaceable>join column list</replaceable> )
 <replaceable>T1</replaceable> NATURAL { <optional>INNER</optional> | { LEFT | RIGHT | FULL } <optional>OUTER</optional> } JOIN <replaceable>T2</replaceable>
 </synopsis>
-        
+
        <para>
         The words <literal>INNER</literal> and
         <literal>OUTER</literal> are optional in all forms.
@@ -303,7 +304,7 @@ FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_r
           </para>
          </listitem>
         </varlistentry>
-         
+
         <varlistentry>
          <term><literal>RIGHT OUTER JOIN</></term>
 
@@ -326,7 +327,7 @@ FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_r
           </para>
          </listitem>
         </varlistentry>
-         
+
         <varlistentry>
          <term><literal>FULL OUTER JOIN</></term>
 
@@ -1042,7 +1043,7 @@ SELECT a AS value, b + c AS sum FROM ...
    <para>
     If no output column name is specified using <literal>AS</>,
     the system assigns a default column name.  For simple column references,
-    this is the name of the referenced column.  For function 
+    this is the name of the referenced column.  For function
     calls, this is the name of the function.  For complex expressions,
     the system will generate a generic name.
    </para>
@@ -1302,7 +1303,7 @@ SELECT a, max(b) FROM table1 GROUP BY a ORDER BY 1;
 <programlisting>
 SELECT a + b AS sum, c FROM table1 ORDER BY sum + c;          -- wrong
 </programlisting>
-   This restriction is made to reduce ambiguity.  There is still 
+   This restriction is made to reduce ambiguity.  There is still
    ambiguity if an <literal>ORDER BY</> item is a simple name that
    could match either an output column name or a column from the table
    expression.  The output column is used in such cases.  This would
@@ -1455,4 +1456,185 @@ SELECT <replaceable>select_list</replaceable> FROM <replaceable>table_expression
 
  </sect1>
 
+
+ <sect1 id="queries-with">
+  <title><literal>WITH</literal> Queries</title>
+
+  <indexterm zone="queries-with">
+   <primary>WITH</primary>
+   <secondary>in SELECT</secondary>
+  </indexterm>
+
+  <indexterm>
+   <primary>common table expression</primary>
+   <see>WITH</see>
+  </indexterm>
+
+  <para>
+   <literal>WITH</> provides a way to write subqueries for use in a larger
+   <literal>SELECT</> query.  The subqueries can be thought of as defining
+   temporary tables that exist just for this query.  One use of this feature
+   is to break down complicated queries into simpler parts.  An example is:
+
+<programlisting>
+WITH regional_sales AS (
+        SELECT region, SUM(amount) AS total_sales
+        FROM orders
+        GROUP BY region
+     ), top_regions AS (
+        SELECT region
+        FROM regional_sales
+        WHERE total_sales &gt; (SELECT SUM(total_sales)/10 FROM regional_sales)
+     )
+SELECT region,
+       product,
+       SUM(quantity) AS product_units,
+       SUM(amount) AS product_sales
+FROM orders
+WHERE region IN (SELECT region FROM top_regions)
+GROUP BY region, product;
+</programlisting>
+
+   which displays per-product sales totals in only the top sales regions.
+   This example could have been written without <literal>WITH</>,
+   but we'd have needed two levels of nested sub-SELECTs.  It's a bit
+   easier to follow this way.
+  </para>
+
+  <para>
+   The optional <literal>RECURSIVE</> modifier changes <literal>WITH</>
+   from a mere syntactic convenience into a feature that accomplishes
+   things not otherwise possible in standard SQL.  Using
+   <literal>RECURSIVE</>, a <literal>WITH</> query can refer to its own
+   output.  A very simple example is this query to sum the integers from 1
+   through 100:
+
+<programlisting>
+WITH RECURSIVE t(n) AS (
+    VALUES (1)
+  UNION ALL
+    SELECT n+1 FROM t WHERE n &lt; 100
+)
+SELECT sum(n) FROM t;
+</programlisting>
+
+   The general form of a recursive <literal>WITH</> query is always a
+   <firstterm>non-recursive term</>, then <literal>UNION ALL</>, then a
+   <firstterm>recursive term</>, where only the recursive term can contain
+   a reference to the query's own output.  Such a query is executed as
+   follows:
+  </para>
+
+  <procedure>
+   <title>Recursive Query Evaluation</title>
+
+   <step performance="required">
+    <para>
+     Evaluate the non-recursive term.  Include all its output rows in the
+     result of the recursive query, and also place them in a temporary
+     <firstterm>working table</>.
+    </para>
+   </step>
+
+   <step performance="required">
+    <para>
+     So long as the working table is not empty, repeat these steps:
+    </para>
+    <substeps>
+     <step performance="required">
+      <para>
+       Evaluate the recursive term, substituting the current contents of
+       the working table for the recursive self-reference.  Include all its
+       output rows in the result of the recursive query, and also place them
+       in a temporary <firstterm>intermediate table</>.
+      </para>
+     </step>
+
+     <step performance="required">
+      <para>
+       Replace the contents of the working table with the contents of the
+       intermediate table, then empty the intermediate table.
+      </para>
+     </step>
+    </substeps>
+   </step>
+  </procedure>
+
+  <note>
+   <para>
+    Strictly speaking, this process is iteration not recursion, but
+    <literal>RECURSIVE</> is the terminology chosen by the SQL standards
+    committee.
+   </para>
+  </note>
+
+  <para>
+   In the example above, the working table has just a single row in each step,
+   and it takes on the values from 1 through 100 in successive steps.  In
+   the 100th step, there is no output because of the <literal>WHERE</>
+   clause, and so the query terminates.
+  </para>
+
+  <para>
+   Recursive queries are typically used to deal with hierarchical or
+   tree-structured data.  A useful example is this query to find all the
+   direct and indirect sub-parts of a product, given only a table that
+   shows immediate inclusions:
+
+<programlisting>
+WITH RECURSIVE included_parts(sub_part, part, quantity) AS (
+    SELECT sub_part, part, quantity FROM parts WHERE part = 'our_product'
+  UNION ALL
+    SELECT p.sub_part, p.part, p.quantity
+    FROM included_parts pr, parts p
+    WHERE p.part = pr.sub_part
+  )
+SELECT sub_part, SUM(quantity) as total_quantity
+FROM included_parts
+GROUP BY sub_part
+</programlisting>
+  </para>
+
+  <para>
+   When working with recursive queries it is important to be sure that
+   the recursive part of the query will eventually return no tuples,
+   or else the query will loop indefinitely.  A useful trick for
+   development purposes is to place a <literal>LIMIT</> in the parent
+   query.  For example, this query would loop forever without the
+   <literal>LIMIT</>:
+
+<programlisting>
+WITH RECURSIVE t(n) AS (
+    SELECT 1
+  UNION ALL
+    SELECT n+1 FROM t
+)
+SELECT n FROM t LIMIT 100;
+</programlisting>
+
+   This works because <productname>PostgreSQL</productname>'s implementation
+   evaluates only as many rows of a <literal>WITH</> query as are actually
+   demanded by the parent query.  Using this trick in production is not
+   recommended, because other systems might work differently.
+  </para>
+
+  <para>
+   A useful property of <literal>WITH</> queries is that they are evaluated
+   only once per execution of the parent query, even if they are referred to
+   more than once by the parent query or sibling <literal>WITH</> queries.
+   Thus, expensive calculations that are needed in multiple places can be
+   placed within a <literal>WITH</> query to avoid redundant work.  Another
+   possible application is to prevent unwanted multiple evaluations of
+   functions with side-effects.
+   However, the other side of this coin is that the optimizer is less able to
+   push restrictions from the parent query down into a <literal>WITH</> query
+   than an ordinary sub-query.  The <literal>WITH</> query will generally be
+   evaluated as stated, without suppression of rows that the parent query
+   might discard afterwards.  (But, as mentioned above, evaluation might stop
+   early if the reference(s) to the query demand only a limited number of
+   rows.)
+  </para>
+
+ </sect1>
+
 </chapter>