aboutsummaryrefslogtreecommitdiff
path: root/doc/src/sgml/query.sgml
diff options
context:
space:
mode:
Diffstat (limited to 'doc/src/sgml/query.sgml')
-rw-r--r--doc/src/sgml/query.sgml73
1 files changed, 62 insertions, 11 deletions
diff --git a/doc/src/sgml/query.sgml b/doc/src/sgml/query.sgml
index a1830eb0297..961bc0d9dc7 100644
--- a/doc/src/sgml/query.sgml
+++ b/doc/src/sgml/query.sgml
@@ -361,39 +361,90 @@ DELETE FROM classname;
Like most other query languages,
<ProductName>PostgreSQL</ProductName> supports
aggregate functions.
- The current implementation of
- <ProductName>Postgres</ProductName> aggregate functions have some limitations.
- Specifically, while there are aggregates to compute
- such functions as the <Function>count</Function>, <Function>sum</Function>,
+ An aggregate function computes a single result from multiple input rows.
+ For example, there are aggregates to compute the
+ <Function>count</Function>, <Function>sum</Function>,
<Function>avg</Function> (average), <Function>max</Function> (maximum) and
- <Function>min</Function> (minimum) over a set of instances, aggregates can only
- appear in the target list of a query and not directly in the
- qualification (the where clause). As an example,
+ <Function>min</Function> (minimum) over a set of instances.
+ </para>
+
+ <Para>
+ It is important to understand the interaction between aggregates and
+ SQL's <Command>where</Command> and <Command>having</Command> clauses.
+ The fundamental difference between <Command>where</Command> and
+ <Command>having</Command> is this: <Command>where</Command> selects
+ input rows before groups and aggregates are computed (thus, it controls
+ which rows go into the aggregate computation), whereas
+ <Command>having</Command> selects group rows after groups and
+ aggregates are computed. Thus, the
+ <Command>where</Command> clause may not contain aggregate functions;
+ it makes no sense to try to use an aggregate to determine which rows
+ will be inputs to the aggregates. On the other hand,
+ <Command>having</Command> clauses always contain aggregate functions.
+ (Strictly speaking, you are allowed to write a <Command>having</Command>
+ clause that doesn't use aggregates, but it's wasteful; the same condition
+ could be used more efficiently at the <Command>where</Command> stage.)
+ </para>
+
+ <Para>
+ As an example, we can find the highest low-temperature reading anywhere
+ with
<ProgramListing>
SELECT max(temp_lo) FROM weather;
</ProgramListing>
- is allowed, while
+ If we want to know which city (or cities) that reading occurred in,
+ we might try
<ProgramListing>
SELECT city FROM weather WHERE temp_lo = max(temp_lo);
</ProgramListing>
- is not. However, as is often the case the query can be restated to accomplish
- the intended result; here by using a <FirstTerm>subselect</FirstTerm>:
+ but this will not work since the aggregate max() can't be used in
+ <Command>where</Command>. However, as is often the case the query can be
+ restated to accomplish the intended result; here by using a
+ <FirstTerm>subselect</FirstTerm>:
<ProgramListing>
SELECT city FROM weather WHERE temp_lo = (SELECT max(temp_lo) FROM weather);
</ProgramListing>
+ This is OK because the sub-select is an independent computation that
+ computes its own aggregate separately from what's happening in the outer
+ select.
</Para>
<Para>
- Aggregates may also have <FirstTerm>group by</FirstTerm> clauses:
+ Aggregates are also very useful in combination with
+ <FirstTerm>group by</FirstTerm> clauses. For example, we can get the
+ maximum low temperature observed in each city with
<ProgramListing>
SELECT city, max(temp_lo)
FROM weather
GROUP BY city;
</ProgramListing>
+ which gives us one output row per city. We can filter these grouped
+ rows using <Command>having</Command>:
+ <ProgramListing>
+SELECT city, max(temp_lo)
+ FROM weather
+ GROUP BY city
+ HAVING min(temp_lo) < 0;
+ </ProgramListing>
+ which gives us the same results for only the cities that have some
+ below-zero readings. Finally, if we only care about cities whose
+ names begin with 'P', we might do
+ <ProgramListing>
+SELECT city, max(temp_lo)
+ FROM weather
+ WHERE city like 'P%'
+ GROUP BY city
+ HAVING min(temp_lo) < 0;
+ </ProgramListing>
+ Note that we can apply the city-name restriction in
+ <Command>where</Command>, since it needs no aggregate. This is
+ more efficient than adding the restriction to <Command>having</Command>,
+ because we avoid doing the grouping and aggregate calculations
+ for all rows that fail the <Command>where</Command> check.
</Para>
</sect1>
</Chapter>