diff options
Diffstat (limited to 'doc/src/sgml/syntax.sgml')
-rw-r--r-- | doc/src/sgml/syntax.sgml | 109 |
1 files changed, 85 insertions, 24 deletions
diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml index 4f50f431475..434a8941577 100644 --- a/doc/src/sgml/syntax.sgml +++ b/doc/src/sgml/syntax.sgml @@ -1555,7 +1555,15 @@ sqrt(2) </indexterm> <indexterm zone="syntax-aggregates"> - <primary>filter</primary> + <primary>ordered-set aggregate</primary> + </indexterm> + + <indexterm zone="syntax-aggregates"> + <primary>WITHIN GROUP</primary> + </indexterm> + + <indexterm zone="syntax-aggregates"> + <primary>FILTER</primary> </indexterm> <para> @@ -1570,6 +1578,7 @@ sqrt(2) <replaceable>aggregate_name</replaceable> (ALL <replaceable>expression</replaceable> [ , ... ] [ <replaceable>order_by_clause</replaceable> ] ) [ FILTER ( WHERE <replaceable>filter_clause</replaceable> ) ] <replaceable>aggregate_name</replaceable> (DISTINCT <replaceable>expression</replaceable> [ , ... ] [ <replaceable>order_by_clause</replaceable> ] ) [ FILTER ( WHERE <replaceable>filter_clause</replaceable> ) ] <replaceable>aggregate_name</replaceable> ( * ) [ FILTER ( WHERE <replaceable>filter_clause</replaceable> ) ] +<replaceable>aggregate_name</replaceable> ( [ <replaceable>expression</replaceable> [ , ... ] ] ) WITHIN GROUP ( <replaceable>order_by_clause</replaceable> ) [ FILTER ( WHERE <replaceable>filter_clause</replaceable> ) ] </synopsis> where <replaceable>aggregate_name</replaceable> is a previously @@ -1589,9 +1598,11 @@ sqrt(2) The third form invokes the aggregate once for each distinct value of the expression (or distinct set of values, for multiple expressions) found in the input rows. - The last form invokes the aggregate once for each input row; since no + The fourth form invokes the aggregate once for each input row; since no particular input value is specified, it is generally only useful for the <function>count(*)</function> aggregate function. + The last form is used with <firstterm>ordered-set</> aggregate + functions, which are described below. </para> <para> @@ -1611,23 +1622,6 @@ sqrt(2) </para> <para> - If <literal>FILTER</literal> is specified, then only the input - rows for which the <replaceable>filter_clause</replaceable> - evaluates to true are fed to the aggregate function; other rows - are discarded. For example: -<programlisting> -SELECT - count(*) AS unfiltered, - count(*) FILTER (WHERE i < 5) AS filtered -FROM generate_series(1,10) AS s(i); - unfiltered | filtered -------------+---------- - 10 | 4 -(1 row) -</programlisting> - </para> - - <para> Ordinarily, the input rows are fed to the aggregate function in an unspecified order. In many cases this does not matter; for example, <function>min</> produces the same result no matter what order it @@ -1677,6 +1671,71 @@ SELECT string_agg(a ORDER BY a, ',') FROM table; -- incorrect </note> <para> + Placing <literal>ORDER BY</> within the aggregate's regular argument + list, as described so far, is used when ordering the input rows for + a <quote>normal</> aggregate for which ordering is optional. There is a + subclass of aggregate functions called <firstterm>ordered-set + aggregates</> for which an <replaceable>order_by_clause</replaceable> + is <emphasis>required</>, usually because the aggregate's computation is + only sensible in terms of a specific ordering of its input rows. + Typical examples of ordered-set aggregates include rank and percentile + calculations. For an ordered-set aggregate, + the <replaceable>order_by_clause</replaceable> is written + inside <literal>WITHIN GROUP (...)</>, as shown in the final syntax + alternative above. The expressions in + the <replaceable>order_by_clause</replaceable> are evaluated once per + input row just like normal aggregate arguments, sorted as per + the <replaceable>order_by_clause</replaceable>'s requirements, and fed + to the aggregate function as input arguments. (This is unlike the case + for a non-<literal>WITHIN GROUP</> <replaceable>order_by_clause</>, + which is not treated as argument(s) to the aggregate function.) The + argument expressions preceding <literal>WITHIN GROUP</>, if any, are + called <firstterm>direct arguments</> to distinguish them from + the <firstterm>aggregated arguments</> listed in + the <replaceable>order_by_clause</replaceable>. Unlike normal aggregate + arguments, direct arguments are evaluated only once per aggregate call, + not once per input row. This means that they can contain variables only + if those variables are grouped by <literal>GROUP BY</>; this restriction + is the same as if the direct arguments were not inside an aggregate + expression at all. Direct arguments are typically used for things like + percentile fractions, which only make sense as a single value per + aggregation calculation. The direct argument list can be empty; in this + case, write just <literal>()</> not <literal>(*)</>. + (<productname>PostgreSQL</> will actually accept either spelling, but + only the first way conforms to the SQL standard.) + An example of an ordered-set aggregate call is: + +<programlisting> +SELECT percentile_disc(0.5) WITHIN GROUP (ORDER BY income) FROM households; + percentile_disc +----------------- + 50489 +</programlisting> + + which obtains the 50th percentile, or median, value of + the <structfield>income</> column from table <structname>households</>. + Here, <literal>0.5</> is a direct argument; it would make no sense + for the percentile fraction to be a value varying across rows. + </para> + + <para> + If <literal>FILTER</literal> is specified, then only the input + rows for which the <replaceable>filter_clause</replaceable> + evaluates to true are fed to the aggregate function; other rows + are discarded. For example: +<programlisting> +SELECT + count(*) AS unfiltered, + count(*) FILTER (WHERE i < 5) AS filtered +FROM generate_series(1,10) AS s(i); + unfiltered | filtered +------------+---------- + 10 | 4 +(1 row) +</programlisting> + </para> + + <para> The predefined aggregate functions are described in <xref linkend="functions-aggregate">. Other aggregate functions can be added by the user. @@ -1695,7 +1754,8 @@ SELECT string_agg(a ORDER BY a, ',') FROM table; -- incorrect <xref linkend="sql-syntax-scalar-subqueries"> and <xref linkend="functions-subquery">), the aggregate is normally evaluated over the rows of the subquery. But an exception occurs - if the aggregate's arguments contain only outer-level variables: + if the aggregate's arguments (and <replaceable>filter_clause</replaceable> + if any) contain only outer-level variables: the aggregate then belongs to the nearest such outer level, and is evaluated over the rows of that query. The aggregate expression as a whole is then an outer reference for the subquery it appears in, @@ -1856,15 +1916,16 @@ UNBOUNDED FOLLOWING If <literal>FILTER</literal> is specified, then only the input rows for which the <replaceable>filter_clause</replaceable> evaluates to true are fed to the window function; other rows - are discarded. Only aggregate window functions accept + are discarded. Only window functions that are aggregates accept a <literal>FILTER</literal> clause. </para> <para> The built-in window functions are described in <xref linkend="functions-window-table">. Other window functions can be added by - the user. Also, any built-in or user-defined aggregate function can be - used as a window function. + the user. Also, any built-in or user-defined normal aggregate function + can be used as a window function. Ordered-set aggregates presently + cannot be used as window functions, however. </para> <para> @@ -1885,7 +1946,7 @@ UNBOUNDED FOLLOWING <para> More information about window functions can be found in <xref linkend="tutorial-window">, - <xref linkend="functions-window">, + <xref linkend="functions-window">, and <xref linkend="queries-window">. </para> </sect2> |