Support ORDER BY within aggregate function calls, at long last providing a

non-kluge method for controlling the order in which values are fed to an aggregate function. At the same time eliminate the old implementation restriction that DISTINCT was only supported for single-argument aggregates. Possibly release-notable behavioral change: formerly, agg(DISTINCT x) dropped null values of x unconditionally. Now, it does so only if the agg transition function is strict; otherwise nulls are treated as DISTINCT normally would, ie, you get one copy. Andrew Gierth, reviewed by Hitoshi Harada
author: Tom Lane <tgl@sss.pgh.pa.us> 2009-12-15 17:57:48 +0000
committer: Tom Lane <tgl@sss.pgh.pa.us> 2009-12-15 17:57:48 +0000
commit: 34d26872ed816b299eef2fa4240d55316697f42d (patch)
tree: d7373e29b365c151702c44325f0f5cc9dcc3e17c /doc/src
parent: 6a6efb964092902bf53965649c3ed78b1868b37e (diff)
download: postgresql-34d26872ed816b299eef2fa4240d55316697f42d.tar.gz
postgresql-34d26872ed816b299eef2fa4240d55316697f42d.zip
2 files changed, 73 insertions, 31 deletions
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index f2906cc8223..7d6125c97e5 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/func.sgml,v 1.492 2009/11/24 19:21:15 petere Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/func.sgml,v 1.493 2009/12/15 17:57:46 tgl Exp $ -->
 
  <chapter id="functions">
   <title>Functions and Operators</title>
@@ -8440,7 +8440,8 @@ SELECT xmlroot(xmlparse(document '<?xml version="1.1"?><content>abc</content>'),
      The function <function>xmlagg</function> is, unlike the other
      functions described here, an aggregate function.  It concatenates the
      input values to the aggregate function call,
-     like <function>xmlconcat</function> does.
+     much like <function>xmlconcat</function> does, except that concatenation
+     occurs across rows rather than across expressions in a single row.
      See <xref linkend="functions-aggregate"> for additional information
      about aggregate functions.
     </para>
@@ -8459,18 +8460,29 @@ SELECT xmlagg(x) FROM test;
     </para>
 
     <para>
-     To determine the order of the concatenation, something like the
-     following approach can be used:
+     To determine the order of the concatenation, an <literal>ORDER BY</>
+     clause may be added to the aggregate call as described in
+     <xref linkend="syntax-aggregates">. For example:
 
 <screen><![CDATA[
-SELECT xmlagg(x) FROM (SELECT * FROM test ORDER BY y DESC) AS tab;
+SELECT xmlagg(x ORDER BY y DESC) FROM test;
         xmlagg
 ----------------------
  <bar/><foo>abc</foo>
 ]]></screen>
+    </para>
 
-     Again, see <xref linkend="functions-aggregate"> for additional
-     information.
+    <para>
+     The following non-standard approach used to be recommended
+     in previous versions, and may still be useful in specific
+     cases:
+
+<screen><![CDATA[
+SELECT xmlagg(x) FROM (SELECT * FROM test ORDER BY y DESC) AS tab;
+        xmlagg
+----------------------
+ <bar/><foo>abc</foo>
+]]></screen>
     </para>
    </sect3>
 
@@ -9887,20 +9899,19 @@ SELECT count(*) FROM sometable;
    The aggregate functions <function>array_agg</function>
    and <function>xmlagg</function>, as well as similar user-defined
    aggregate functions, produce meaningfully different result values
-   depending on the order of the input values.  In the current
-   implementation, the order of the input is in principle unspecified.
-   Supplying the input values from a sorted subquery
-   will usually work, however.  For example:
+   depending on the order of the input values.  This ordering is
+   unspecified by default, but can be controlled by writing an
+   <literal>ORDER BY</> clause within the aggregate call, as shown in
+   <xref linkend="syntax-aggregates">.
+   Alternatively, supplying the input values from a sorted subquery
+   will usually work.  For example:
 
 <screen><![CDATA[
 SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
 ]]></screen>
 
    But this syntax is not allowed in the SQL standard, and is
-   not portable to other database systems.  A future version of
-   <productname>PostgreSQL</> might provide an additional feature to control
-   the order in a better-defined way (<literal>xmlagg(expr ORDER BY expr, expr,
-   ...)</literal>).
+   not portable to other database systems.
   </para>
 
   <para>
diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index 2d11d20984c..984908927f7 100644
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/syntax.sgml,v 1.138 2009/11/05 23:24:22 tgl Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/syntax.sgml,v 1.139 2009/12/15 17:57:46 tgl Exp $ -->
 
 <chapter id="sql-syntax">
  <title>SQL Syntax</title>
@@ -1525,17 +1525,19 @@ sqrt(2)
     syntax of an aggregate expression is one of the following:
 
 <synopsis>
-<replaceable>aggregate_name</replaceable> (<replaceable>expression</replaceable> [ , ... ] )
-<replaceable>aggregate_name</replaceable> (ALL <replaceable>expression</replaceable> [ , ... ] )
-<replaceable>aggregate_name</replaceable> (DISTINCT <replaceable>expression</replaceable>)
+<replaceable>aggregate_name</replaceable> (<replaceable>expression</replaceable> [ , ... ] [ <replaceable>order_by_clause</replaceable> ] )
+<replaceable>aggregate_name</replaceable> (ALL <replaceable>expression</replaceable> [ , ... ] [ <replaceable>order_by_clause</replaceable> ] )
+<replaceable>aggregate_name</replaceable> (DISTINCT <replaceable>expression</replaceable> [ , ... ] [ <replaceable>order_by_clause</replaceable> ] )
 <replaceable>aggregate_name</replaceable> ( * )
 </synopsis>
 
     where <replaceable>aggregate_name</replaceable> is a previously
-    defined aggregate (possibly qualified with a schema name), and
+    defined aggregate (possibly qualified with a schema name),
     <replaceable>expression</replaceable> is
     any value expression that does not itself contain an aggregate
-    expression or a window function call.
+    expression or a window function call, and
+    <replaceable>order_by_clause</replaceable> is a optional
+    <literal>ORDER BY</> clause as described below.
    </para>
 
    <para>
@@ -1545,8 +1547,9 @@ sqrt(2)
     whether to ignore null values or not &mdash; but all the standard ones do.)
     The second form is the same as the first, since
     <literal>ALL</literal> is the default.  The third form invokes the
-    aggregate for all distinct non-null values of the expressions found
-    in the input rows.  The last form invokes the aggregate once for
+    aggregate for all distinct values of the expressions found
+    in the input rows (ignoring nulls if the function chooses to do so).
+    The last form invokes the aggregate once for
     each input row regardless of null or non-null values; since no
     particular input value is specified, it is generally only useful
     for the <function>count(*)</function> aggregate function.
@@ -1561,6 +1564,40 @@ sqrt(2)
    </para>
 
    <para>
+    Ordinarily, the input rows are fed to the aggregate function in an
+    unspecified order.  In many cases this does not matter; for example,
+    <function>min</> produces the same result no matter what order it
+    receives the inputs in.  However, some aggregate functions
+    (such as <function>array_agg</> and <function>xmlagg</>) produce
+    results that depend on the ordering of the input rows.  When using
+    such an aggregate, the optional <replaceable>order_by_clause</> can be
+    used to specify the desired ordering.  The <replaceable>order_by_clause</>
+    has the same syntax as for a query-level <literal>ORDER BY</> clause, as
+    described in <xref linkend="queries-order">, except that its expressions
+    are always just expressions and cannot be output-column names or numbers.
+    For example:
+
+<programlisting>
+SELECT array_agg(a ORDER BY b DESC) FROM table;
+</programlisting>
+   </para>
+
+   <para>
+    If <literal>DISTINCT</> is specified in addition to an
+    <replaceable>order_by_clause</>, then all the <literal>ORDER BY</>
+    expressions must match regular arguments of the aggregate; that is,
+    you cannot sort on an expression that is not included in the
+    <literal>DISTINCT</> list.
+   </para>
+
+   <note>
+    <para>
+     The ability to specify both <literal>DISTINCT</> and <literal>ORDER BY</>
+     in an aggregate function is a <productname>PostgreSQL</> extension.
+    </para>
+   </note>
+
+   <para>
     The predefined aggregate functions are described in <xref
     linkend="functions-aggregate">.  Other aggregate functions can be added
     by the user.
@@ -1588,13 +1625,6 @@ sqrt(2)
     appearing only in the result list or <literal>HAVING</> clause
     applies with respect to the query level that the aggregate belongs to.
    </para>
-
-   <note>
-    <para>
-     <productname>PostgreSQL</productname> currently does not support
-     <literal>DISTINCT</> with more than one input expression.
-    </para>
-   </note>
   </sect2>
 
   <sect2 id="syntax-window-functions">
@@ -1697,7 +1727,8 @@ ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
     <literal>count(*) OVER (PARTITION BY x ORDER BY y)</>.
     <literal>*</> is customarily not used for non-aggregate window functions.
     Aggregate window functions, unlike normal aggregate functions, do not
-    allow <literal>DISTINCT</> to be used within the function argument list.
+    allow <literal>DISTINCT</> or <literal>ORDER BY</> to be used within the
+    function argument list.
    </para>
 
    <para>
author	Tom Lane <tgl@sss.pgh.pa.us>	2009-12-15 17:57:48 +0000
committer	Tom Lane <tgl@sss.pgh.pa.us>	2009-12-15 17:57:48 +0000
commit	34d26872ed816b299eef2fa4240d55316697f42d (patch)
tree	d7373e29b365c151702c44325f0f5cc9dcc3e17c /doc/src
parent	6a6efb964092902bf53965649c3ed78b1868b37e (diff)
download	postgresql-34d26872ed816b299eef2fa4240d55316697f42d.tar.gz postgresql-34d26872ed816b299eef2fa4240d55316697f42d.zip