aboutsummaryrefslogtreecommitdiff
path: root/doc/src/sgml/intagg.sgml
blob: 29e74ce146cc9bca4b20e74fd36e97ca60c99c46 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
<!-- doc/src/sgml/intagg.sgml -->

<sect1 id="intagg" xreflabel="intagg">
 <title>intagg &mdash; integer aggregator and enumerator</title>

 <indexterm zone="intagg">
  <primary>intagg</primary>
 </indexterm>

 <para>
  The <filename>intagg</filename> module provides an integer aggregator and an
  enumerator.  <filename>intagg</filename> is now obsolete, because there
  are built-in functions that provide a superset of its capabilities.
  However, the module is still provided as a compatibility wrapper around
  the built-in functions.
 </para>

 <sect2 id="intagg-functions">
  <title>Functions</title>

 <indexterm>
  <primary>int_array_aggregate</primary>
 </indexterm>

 <indexterm>
  <primary>array_agg</primary>
 </indexterm>

 <para>
  The aggregator is an aggregate function
  <function>int_array_aggregate(integer)</function>
  that produces an integer array
  containing exactly the integers it is fed.
  This is a wrapper around <function>array_agg</function>,
  which does the same thing for any array type.
 </para>

 <indexterm>
  <primary>int_array_enum</primary>
 </indexterm>

 <para>
  The enumerator is a function
  <function>int_array_enum(integer[])</function>
  that returns <type>setof integer</type>.  It is essentially the reverse
  operation of the aggregator: given an array of integers, expand it
  into a set of rows.  This is a wrapper around <function>unnest</function>,
  which does the same thing for any array type.
 </para>

 </sect2>

 <sect2 id="intagg-samples">
  <title>Sample Uses</title>

  <para>
   Many database systems have the notion of a many to many table. Such a table
   usually sits between two indexed tables, for example:

<programlisting>
CREATE TABLE left_table  (id INT PRIMARY KEY, ...);
CREATE TABLE right_table (id INT PRIMARY KEY, ...);
CREATE TABLE many_to_many(id_left  INT REFERENCES left_table,
                          id_right INT REFERENCES right_table);
</programlisting>

  It is typically used like this:

<programlisting>
SELECT right_table.*
FROM right_table JOIN many_to_many ON (right_table.id = many_to_many.id_right)
WHERE many_to_many.id_left = <replaceable>item</replaceable>;
</programlisting>

  This will return all the items in the right hand table for an entry
  in the left hand table. This is a very common construct in SQL.
 </para>

 <para>
  Now, this methodology can be cumbersome with a very large number of
  entries in the <structname>many_to_many</structname> table.  Often,
  a join like this would result in an index scan
  and a fetch for each right hand entry in the table for a particular
  left hand entry. If you have a very dynamic system, there is not much you
  can do. However, if you have some data which is fairly static, you can
  create a summary table with the aggregator.

<programlisting>
CREATE TABLE summary AS
  SELECT id_left, int_array_aggregate(id_right) AS rights
  FROM many_to_many
  GROUP BY id_left;
</programlisting>

  This will create a table with one row per left item, and an array
  of right items. Now this is pretty useless without some way of using
  the array; that's why there is an array enumerator.  You can do

<programlisting>
SELECT id_left, int_array_enum(rights) FROM summary WHERE id_left = <replaceable>item</replaceable>;
</programlisting>

  The above query using <function>int_array_enum</function> produces the same results
  as

<programlisting>
SELECT id_left, id_right FROM many_to_many WHERE id_left = <replaceable>item</replaceable>;
</programlisting>

  The difference is that the query against the summary table has to get
  only one row from the table, whereas the direct query against
  <structname>many_to_many</structname> must index scan and fetch a row for each entry.
 </para>

 <para>
  On one system, an <command>EXPLAIN</command> showed a query with a cost of 8488 was
  reduced to a cost of 329.  The original query was a join involving the
  <structname>many_to_many</structname> table, which was replaced by:

<programlisting>
SELECT id_right, count(id_right) FROM
  ( SELECT id_left, int_array_enum(rights) AS id_right
    FROM summary
    JOIN (SELECT id FROM left_table
          WHERE id = <replaceable>item</replaceable>) AS lefts
    ON (summary.id_left = lefts.id)
  ) AS list
  GROUP BY id_right
  ORDER BY count DESC;
</programlisting>
 </para>

 </sect2>

</sect1>