aboutsummaryrefslogtreecommitdiff
path: root/doc/src/sgml/syntax.sgml
diff options
context:
space:
mode:
Diffstat (limited to 'doc/src/sgml/syntax.sgml')
-rw-r--r--doc/src/sgml/syntax.sgml109
1 files changed, 85 insertions, 24 deletions
diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index 4f50f431475..434a8941577 100644
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -1555,7 +1555,15 @@ sqrt(2)
</indexterm>
<indexterm zone="syntax-aggregates">
- <primary>filter</primary>
+ <primary>ordered-set aggregate</primary>
+ </indexterm>
+
+ <indexterm zone="syntax-aggregates">
+ <primary>WITHIN GROUP</primary>
+ </indexterm>
+
+ <indexterm zone="syntax-aggregates">
+ <primary>FILTER</primary>
</indexterm>
<para>
@@ -1570,6 +1578,7 @@ sqrt(2)
<replaceable>aggregate_name</replaceable> (ALL <replaceable>expression</replaceable> [ , ... ] [ <replaceable>order_by_clause</replaceable> ] ) [ FILTER ( WHERE <replaceable>filter_clause</replaceable> ) ]
<replaceable>aggregate_name</replaceable> (DISTINCT <replaceable>expression</replaceable> [ , ... ] [ <replaceable>order_by_clause</replaceable> ] ) [ FILTER ( WHERE <replaceable>filter_clause</replaceable> ) ]
<replaceable>aggregate_name</replaceable> ( * ) [ FILTER ( WHERE <replaceable>filter_clause</replaceable> ) ]
+<replaceable>aggregate_name</replaceable> ( [ <replaceable>expression</replaceable> [ , ... ] ] ) WITHIN GROUP ( <replaceable>order_by_clause</replaceable> ) [ FILTER ( WHERE <replaceable>filter_clause</replaceable> ) ]
</synopsis>
where <replaceable>aggregate_name</replaceable> is a previously
@@ -1589,9 +1598,11 @@ sqrt(2)
The third form invokes the aggregate once for each distinct value
of the expression (or distinct set of values, for multiple expressions)
found in the input rows.
- The last form invokes the aggregate once for each input row; since no
+ The fourth form invokes the aggregate once for each input row; since no
particular input value is specified, it is generally only useful
for the <function>count(*)</function> aggregate function.
+ The last form is used with <firstterm>ordered-set</> aggregate
+ functions, which are described below.
</para>
<para>
@@ -1611,23 +1622,6 @@ sqrt(2)
</para>
<para>
- If <literal>FILTER</literal> is specified, then only the input
- rows for which the <replaceable>filter_clause</replaceable>
- evaluates to true are fed to the aggregate function; other rows
- are discarded. For example:
-<programlisting>
-SELECT
- count(*) AS unfiltered,
- count(*) FILTER (WHERE i < 5) AS filtered
-FROM generate_series(1,10) AS s(i);
- unfiltered | filtered
-------------+----------
- 10 | 4
-(1 row)
-</programlisting>
- </para>
-
- <para>
Ordinarily, the input rows are fed to the aggregate function in an
unspecified order. In many cases this does not matter; for example,
<function>min</> produces the same result no matter what order it
@@ -1677,6 +1671,71 @@ SELECT string_agg(a ORDER BY a, ',') FROM table; -- incorrect
</note>
<para>
+ Placing <literal>ORDER BY</> within the aggregate's regular argument
+ list, as described so far, is used when ordering the input rows for
+ a <quote>normal</> aggregate for which ordering is optional. There is a
+ subclass of aggregate functions called <firstterm>ordered-set
+ aggregates</> for which an <replaceable>order_by_clause</replaceable>
+ is <emphasis>required</>, usually because the aggregate's computation is
+ only sensible in terms of a specific ordering of its input rows.
+ Typical examples of ordered-set aggregates include rank and percentile
+ calculations. For an ordered-set aggregate,
+ the <replaceable>order_by_clause</replaceable> is written
+ inside <literal>WITHIN GROUP (...)</>, as shown in the final syntax
+ alternative above. The expressions in
+ the <replaceable>order_by_clause</replaceable> are evaluated once per
+ input row just like normal aggregate arguments, sorted as per
+ the <replaceable>order_by_clause</replaceable>'s requirements, and fed
+ to the aggregate function as input arguments. (This is unlike the case
+ for a non-<literal>WITHIN GROUP</> <replaceable>order_by_clause</>,
+ which is not treated as argument(s) to the aggregate function.) The
+ argument expressions preceding <literal>WITHIN GROUP</>, if any, are
+ called <firstterm>direct arguments</> to distinguish them from
+ the <firstterm>aggregated arguments</> listed in
+ the <replaceable>order_by_clause</replaceable>. Unlike normal aggregate
+ arguments, direct arguments are evaluated only once per aggregate call,
+ not once per input row. This means that they can contain variables only
+ if those variables are grouped by <literal>GROUP BY</>; this restriction
+ is the same as if the direct arguments were not inside an aggregate
+ expression at all. Direct arguments are typically used for things like
+ percentile fractions, which only make sense as a single value per
+ aggregation calculation. The direct argument list can be empty; in this
+ case, write just <literal>()</> not <literal>(*)</>.
+ (<productname>PostgreSQL</> will actually accept either spelling, but
+ only the first way conforms to the SQL standard.)
+ An example of an ordered-set aggregate call is:
+
+<programlisting>
+SELECT percentile_disc(0.5) WITHIN GROUP (ORDER BY income) FROM households;
+ percentile_disc
+-----------------
+ 50489
+</programlisting>
+
+ which obtains the 50th percentile, or median, value of
+ the <structfield>income</> column from table <structname>households</>.
+ Here, <literal>0.5</> is a direct argument; it would make no sense
+ for the percentile fraction to be a value varying across rows.
+ </para>
+
+ <para>
+ If <literal>FILTER</literal> is specified, then only the input
+ rows for which the <replaceable>filter_clause</replaceable>
+ evaluates to true are fed to the aggregate function; other rows
+ are discarded. For example:
+<programlisting>
+SELECT
+ count(*) AS unfiltered,
+ count(*) FILTER (WHERE i < 5) AS filtered
+FROM generate_series(1,10) AS s(i);
+ unfiltered | filtered
+------------+----------
+ 10 | 4
+(1 row)
+</programlisting>
+ </para>
+
+ <para>
The predefined aggregate functions are described in <xref
linkend="functions-aggregate">. Other aggregate functions can be added
by the user.
@@ -1695,7 +1754,8 @@ SELECT string_agg(a ORDER BY a, ',') FROM table; -- incorrect
<xref linkend="sql-syntax-scalar-subqueries"> and
<xref linkend="functions-subquery">), the aggregate is normally
evaluated over the rows of the subquery. But an exception occurs
- if the aggregate's arguments contain only outer-level variables:
+ if the aggregate's arguments (and <replaceable>filter_clause</replaceable>
+ if any) contain only outer-level variables:
the aggregate then belongs to the nearest such outer level, and is
evaluated over the rows of that query. The aggregate expression
as a whole is then an outer reference for the subquery it appears in,
@@ -1856,15 +1916,16 @@ UNBOUNDED FOLLOWING
If <literal>FILTER</literal> is specified, then only the input
rows for which the <replaceable>filter_clause</replaceable>
evaluates to true are fed to the window function; other rows
- are discarded. Only aggregate window functions accept
+ are discarded. Only window functions that are aggregates accept
a <literal>FILTER</literal> clause.
</para>
<para>
The built-in window functions are described in <xref
linkend="functions-window-table">. Other window functions can be added by
- the user. Also, any built-in or user-defined aggregate function can be
- used as a window function.
+ the user. Also, any built-in or user-defined normal aggregate function
+ can be used as a window function. Ordered-set aggregates presently
+ cannot be used as window functions, however.
</para>
<para>
@@ -1885,7 +1946,7 @@ UNBOUNDED FOLLOWING
<para>
More information about window functions can be found in
<xref linkend="tutorial-window">,
- <xref linkend="functions-window">,
+ <xref linkend="functions-window">, and
<xref linkend="queries-window">.
</para>
</sect2>