Allow aggregates to provide estimates of their transition state data size.

Formerly the planner had a hard-wired rule of thumb for guessing the amount of space consumed by an aggregate function's transition state data. This estimate is critical to deciding whether it's OK to use hash aggregation, and in many situations the built-in estimate isn't very good. This patch adds a column to pg_aggregate wherein a per-aggregate estimate can be provided, overriding the planner's default, and infrastructure for setting the column via CREATE AGGREGATE.

It may be that additional smarts will be required in future, perhaps even a per-aggregate estimation function. But this is already a step forward.

This is extracted from a larger patch to improve the performance of numeric and int8 aggregates. I (tgl) thought it was worth reviewing and committing this infrastructure separately. In this commit, all built-in aggregates are given aggtransspace = 0, so no behavior should change.

Hadi Moshayedi, reviewed by Pavel Stehule and Tomas Vondra
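The message's central point is that the transition-state size estimate decides whether hash aggregation is considered at all. A small Python model can make that concrete. It is a sketch under stated assumptions, not the planner's code. `Aggregate`, `state_size`, `hash_agg_fits`, `DEFAULT_STATE_BYTES`, and `PER_GROUP_OVERHEAD` are hypothetical names, and the byte figures are placeholders, not PostgreSQL's real defaults. The model follows only two rules stated in the documentation this patch adds. First, an omitted or zero SSPACE falls back to a default estimate. Second, hashing is considered only when the estimated hash table fits in work_mem.

```python
"""Hypothetical model of how a per-aggregate state-size estimate
(SSPACE / pg_aggregate.aggtransspace) could feed a hash-aggregation
memory check. The default sizes and per-group overhead below are
placeholders, not PostgreSQL's actual formula."""
from dataclasses import dataclass

# Hypothetical stand-in for the planner's built-in per-type guess.
DEFAULT_STATE_BYTES = {"int8": 8, "numeric": 32}
FALLBACK_STATE_BYTES = 32

# Hypothetical fixed per-group cost (grouping key, hash entry header).
PER_GROUP_OVERHEAD = 64


@dataclass
class Aggregate:
    name: str
    stype: str
    sspace: int = 0  # 0 means "no estimate given": use the default


def state_size(agg: Aggregate) -> int:
    """Prefer the aggregate's own estimate if positive, else the default."""
    if agg.sspace > 0:
        return agg.sspace
    return DEFAULT_STATE_BYTES.get(agg.stype, FALLBACK_STATE_BYTES)


def hash_agg_fits(aggs, num_groups: int, work_mem_kb: int) -> bool:
    """Consider hashing only if the estimated table fits in work_mem."""
    entry_bytes = PER_GROUP_OVERHEAD + sum(state_size(a) for a in aggs)
    return entry_bytes * num_groups <= work_mem_kb * 1024


if __name__ == "__main__":
    default_est = Aggregate("my_sum", "numeric")              # 32-byte default
    explicit_est = Aggregate("my_sum", "numeric", sspace=4096)
    # 10,000 groups with work_mem = 4MB (4096 kB)
    print(hash_agg_fits([default_est], 10_000, 4096))   # True
    print(hash_agg_fits([explicit_est], 10_000, 4096))  # False
```

With 10,000 groups and a 4 MB work_mem, the default-sized state passes the check (96 bytes per group, about 0.96 MB in total). An SSPACE of 4096 bytes fails it (4160 bytes per group, about 41.6 MB). This is the "large values of this parameter discourage use of hash aggregation" effect that the new state_data_size documentation describes.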
author: Tom Lane <tgl@sss.pgh.pa.us> 2013-11-16 16:03:40 -0500
committer: Tom Lane <tgl@sss.pgh.pa.us> 2013-11-16 16:03:40 -0500
commit: 6cb86143e8e1e855255edc706bce71c6ebfd9a6c
tree: 2ed7cf0b5fe28b8ba858ae3e384534cdb7f31aa3 /doc/src
parent: 55c3d86a2a374f9d8fd88fd947601c1f49a4da08
download: postgresql-6cb86143e8e1e855255edc706bce71c6ebfd9a6c.tar.gz postgresql-6cb86143e8e1e855255edc706bce71c6ebfd9a6c.zip
2 files changed, 25 insertions, 0 deletions
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 9388df5ac27..acc261ca516 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -373,6 +373,13 @@
       <entry>Data type of the aggregate function's internal transition (state) data</entry>
      </row>
      <row>
+      <entry><structfield>aggtransspace</structfield></entry>
+      <entry><type>int4</type></entry>
+      <entry></entry>
+      <entry>Approximate average size (in bytes) of the transition state
+       data, or zero to use a default estimate</entry>
+     </row>
+     <row>
       <entry><structfield>agginitval</structfield></entry>
       <entry><type>text</type></entry>
       <entry></entry>
diff --git a/doc/src/sgml/ref/create_aggregate.sgml b/doc/src/sgml/ref/create_aggregate.sgml
index 2b35fa4d522..17819dd1a8e 100644
--- a/doc/src/sgml/ref/create_aggregate.sgml
+++ b/doc/src/sgml/ref/create_aggregate.sgml
@@ -24,6 +24,7 @@ PostgreSQL documentation
 CREATE AGGREGATE <replaceable class="parameter">name</replaceable> ( [ <replaceable class="parameter">argmode</replaceable> ] [ <replaceable class="parameter">arg_name</replaceable> ] <replaceable class="parameter">arg_data_type</replaceable> [ , ... ] ) (
     SFUNC = <replaceable class="PARAMETER">sfunc</replaceable>,
     STYPE = <replaceable class="PARAMETER">state_data_type</replaceable>
+    [ , SSPACE = <replaceable class="PARAMETER">state_data_size</replaceable> ]
     [ , FINALFUNC = <replaceable class="PARAMETER">ffunc</replaceable> ]
     [ , INITCOND = <replaceable class="PARAMETER">initial_condition</replaceable> ]
     [ , SORTOP = <replaceable class="PARAMETER">sort_operator</replaceable> ]
@@ -35,6 +36,7 @@ CREATE AGGREGATE <replaceable class="PARAMETER">name</replaceable> (
     BASETYPE = <replaceable class="PARAMETER">base_type</replaceable>,
     SFUNC = <replaceable class="PARAMETER">sfunc</replaceable>,
     STYPE = <replaceable class="PARAMETER">state_data_type</replaceable>
+    [ , SSPACE = <replaceable class="PARAMETER">state_data_size</replaceable> ]
     [ , FINALFUNC = <replaceable class="PARAMETER">ffunc</replaceable> ]
     [ , INITCOND = <replaceable class="PARAMETER">initial_condition</replaceable> ]
     [ , SORTOP = <replaceable class="PARAMETER">sort_operator</replaceable> ]
@@ -265,6 +267,22 @@ SELECT col FROM tab ORDER BY col USING sortop LIMIT 1;
    </varlistentry>
 
    <varlistentry>
+    <term><replaceable class="PARAMETER">state_data_size</replaceable></term>
+    <listitem>
+     <para>
+      The approximate average size (in bytes) of the aggregate's state value.
+      If this parameter is omitted or is zero, a default estimate is used
+      based on the <replaceable>state_data_type</>.
+      The planner uses this value to estimate the memory required for a
+      grouped aggregate query.  The planner will consider using hash
+      aggregation for such a query only if the hash table is estimated to fit
+      in <xref linkend="guc-work-mem">; therefore, large values of this
+      parameter discourage use of hash aggregation.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
     <term><replaceable class="PARAMETER">ffunc</replaceable></term>
     <listitem>
      <para>