Add some documentation for constraint exclusion and basic partitioning.

From Simon Riggs; cleanup and editorialization by Neil Conway.
author: Neil Conway <neilc@samurai.com> 2005-11-01 23:19:05 +0000
committer: Neil Conway <neilc@samurai.com> 2005-11-01 23:19:05 +0000
commit: eddcd492fe68c5870640b2334fd0597a11025de1 (patch)
tree: d06eff8d678b5924b245d14d8127c43a47117606
parent: b524cb36ac2526049fe7978cadc28fca647f9b70 (diff)
download: postgresql-eddcd492fe68c5870640b2334fd0597a11025de1.tar.gz
postgresql-eddcd492fe68c5870640b2334fd0597a11025de1.zip
2 files changed, 718 insertions, 101 deletions
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index a72fbc97e4c..ca02d87fdf2 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1,5 +1,5 @@
 <!--
-$PostgreSQL: pgsql/doc/src/sgml/config.sgml,v 1.33 2005/10/26 12:55:07 momjian Exp $
+$PostgreSQL: pgsql/doc/src/sgml/config.sgml,v 1.34 2005/11/01 23:19:05 neilc Exp $
 -->
 <chapter Id="runtime-config">
   <title>Server Configuration</title>
@@ -1974,11 +1974,11 @@ archive_command = 'copy "%p" /mnt/server/archivedir/"%f"'  # Windows
        </para>
 
        <para>
-        When this parameter is <literal>on</>, the planner compares query
-        conditions with table CHECK constraints, and omits scanning tables
-        where the conditions contradict the constraints.  (Presently
-        this is done only for child tables of inheritance scans.)  For
-        example:
+        When this parameter is <literal>on</>, the planner compares
+        query conditions with table <literal>CHECK</> constraints, and
+        omits scanning tables where the conditions contradict the
+        constraints.  (Presently this is done only for child tables of
+        inheritance scans.)  For example:
 
 <programlisting>
 CREATE TABLE parent(key integer, ...);
@@ -1988,23 +1988,30 @@ CREATE TABLE child2000(check (key between 2000 and 2999)) INHERITS(parent);
 SELECT * FROM parent WHERE key = 2400;
 </programlisting>
 
-        With constraint exclusion enabled, this SELECT will not scan
-        <structname>child1000</> at all.  This can improve performance when
-        inheritance is used to build partitioned tables.
+        With constraint exclusion enabled, this <command>SELECT</>
+        will not scan <structname>child1000</> at all.  This can
+        improve performance when inheritance is used to build
+        partitioned tables.
        </para>
 
        <para>
-        Currently, <varname>constraint_exclusion</> defaults to
-        <literal>off</>, because it risks incorrect results if
-        query plans are cached --- if a table constraint is changed or dropped,
-        the previously generated plan might now be wrong, and there is no
-        built-in mechanism to force re-planning.  (This deficiency will
-        probably be addressed in a future
-        <productname>PostgreSQL</productname> release.)  Another reason
-        for keeping it off is that the constraint checks are relatively
+        Currently, <varname>constraint_exclusion</> is disabled by
+        default because it risks incorrect results if query plans are
+        cached &mdash; if a table constraint is changed or dropped,
+        the previously generated plan might now be wrong, and there is
+        no built-in mechanism to force re-planning.  (This deficiency
+        will probably be addressed in a future
+        <productname>PostgreSQL</> release.)  Another reason for
+        keeping it off is that the constraint checks are relatively
         expensive, and in many circumstances will yield no savings.
-        It is recommended to turn this on only if you are actually using
-        partitioned tables designed to take advantage of the feature.
+        It is recommended to turn this on only if you are actually
+        using partitioned tables designed to take advantage of the
+        feature.
+       </para>
+
+       <para>
+        Refer to <xref linkend="ce-partitioning"> for more information
+        on using constraint exclusion and partitioning.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index e43630f25b1..0b2ca56a121 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/ddl.sgml,v 1.45 2005/10/23 19:29:49 tgl Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/ddl.sgml,v 1.46 2005/11/01 23:19:05 neilc Exp $ -->
 
 <chapter id="ddl">
  <title>Data Definition</title>
@@ -398,6 +398,13 @@ CREATE TABLE products (
     ensure that a column does not contain null values, the not-null
     constraint described in the next section can be used.
    </para>
+
+    <para>
+     Check constraints can also be used to enhance performance with
+     very large tables, when used in conjunction with the <xref
+     linkend="guc-constraint-exclusion"> parameter.  This is discussed
+     in more detail in <xref linkend="ce-partitioning">.
+   </para>
   </sect2>
 
   <sect2>
@@ -1040,19 +1047,39 @@ CREATE TABLE order_items (
  <sect1 id="ddl-inherit">
   <title>Inheritance</title>
 
-  <remark>This section needs to be rethought.  Some of the
-  information should go into the following chapters.</remark>
+   <indexterm>
+    <primary>not-null constraint</primary>
+   </indexterm>
+
+   <indexterm>
+    <primary>constraint</primary>
+    <secondary>NOT NULL</secondary>
+   </indexterm>
+
+  <para>
+   <productname>PostgreSQL</productname> implements table inheritance
+   which can be a useful tool for database designers.  The SQL:2003
+   standard optionally defines type inheritance which differs in many
+   respects from the features described here.
+  </para>
 
   <para>
-   Let's create two tables. The capitals  table  contains
-   state  capitals  which  are also cities. Naturally, the
-   capitals table should inherit from cities.
+   Let's start with an example: suppose we are trying to build a data
+   model for cities.  Each state has many cities, but only one
+   capital. We want to be able to quickly retrieve the capital city
+   for any particular state. This can be done by creating two tables,
+   one for state capitals and one for cities that are not
+   capitals. However, what happens when we want to ask for data about
+   a city, regardless of whether it is a capital or not? The
+   inheritance feature can help to resolve this problem. We define the
+   <literal>capitals</literal> table so that it inherits from
+   <literal>cities</literal>:
 
 <programlisting>
 CREATE TABLE cities (
     name            text,
     population      float,
-    altitude        int     -- (in ft)
+    altitude        int     -- in feet
 );
 
 CREATE TABLE capitals (
@@ -1060,24 +1087,19 @@ CREATE TABLE capitals (
 ) INHERITS (cities);
 </programlisting>
 
-   In this case, a row of capitals <firstterm>inherits</firstterm> all
-   attributes (name, population, and altitude) from its parent, cities.  State
-   capitals have an extra attribute, state, that shows their state.  In
-   <productname>PostgreSQL</productname>, a table can inherit from zero or
-   more other tables, and a query can reference either all rows of a table or
-   all rows of a table plus all of its descendants.
-
-   <note>
-    <para>
-     The inheritance hierarchy is actually a directed acyclic graph.
-    </para>
-   </note>
+   In this case, a row of <literal>capitals</> <firstterm>inherits</>
+   all the columns of its parent table, <literal>cities</>. State
+   capitals have an extra attribute, <literal>state</>, that shows
+   their state.
   </para>
 
   <para>
-    For example, the  following  query finds the  names  of  all  cities,
-    including  state capitals, that are located at an altitude 
-    over 500ft:
+   In <productname>PostgreSQL</productname>, a table can inherit from
+   zero or more other tables, and a query can reference either all
+   rows of a table or all rows of a table plus all of its descendants.
+   For example, the following query finds the names of all cities,
+   including state capitals, that are located at an altitude over
+   500ft:
 
 <programlisting>
 SELECT name, altitude
@@ -1097,9 +1119,8 @@ SELECT name, altitude
   </para>
 
   <para>
-    On the other hand, the  following  query  finds
-    all  the cities that are not state capitals and
-    are situated at an altitude over 500ft:
+   On the other hand, the following query finds all the cities that
+   are not state capitals and are situated at an altitude over 500ft:
 
 <programlisting>
 SELECT name, altitude
@@ -1110,60 +1131,55 @@ SELECT name, altitude
 -----------+----------
  Las Vegas |     2174
  Mariposa  |     1953
-</programlisting>         
+</programlisting>
   </para>
 
   <para>
-   Here the <quote>ONLY</quote> before cities indicates that the query should
-   be  run over only cities and not tables below cities in the
-   inheritance hierarchy.  Many of the  commands  that  we
-   have  already discussed -- <command>SELECT</command>,
-   <command>UPDATE</command> and <command>DELETE</command> --
-   support this <quote>ONLY</quote> notation.
+   Here the <literal>ONLY</literal> keyword indicates that the query
+   should apply only to <literal>cities</literal>, and not any tables
+   below <literal>cities</literal> in the inheritance hierarchy.  Many
+   of the commands that we have already discussed &mdash;
+   <command>SELECT</command>, <command>UPDATE</command> and
+   <command>DELETE</command> &mdash; support the
+   <literal>ONLY</literal> keyword.
   </para>
 
-
-<note>
-  <title>Inheritance and Permissions</title>
-  <para>
-  Because permissions are not inherited automatically a user attempting to access
-  a parent table must either have at least the same permission for the child table
-  or must use the <quote>ONLY</quote> notation. If creating a new inheritance 
-  relationship in an existing system be careful that this does not create problems.
-  </para>
-</note>
-  
   <note>
-   <title>Deprecated</title> 
+   <title>Inheritance and Permissions</title>
    <para>
-     In previous versions of <productname>PostgreSQL</productname>, the
-     default behavior was not to include child tables in queries. This was
-     found to be error prone and is also in violation of the SQL:2003
-     standard. Under the old syntax, to get the sub-tables you append
-     <literal>*</literal> to the table name.
-     For example
-<programlisting>
-SELECT * from cities*;
-</programlisting>
-     You can still explicitly specify scanning child tables by appending
-     <literal>*</literal>, as well as explicitly specify not scanning child tables by
-     writing <quote>ONLY</quote>.  But beginning in version 7.1, the default
-     behavior for an undecorated table name is to scan its child tables
-     too, whereas before the default was not to do so.  To get the old
-     default behavior, set the configuration option
-     <literal>SQL_Inheritance</literal> to off, e.g.,
-<programlisting>
-SET SQL_Inheritance TO OFF;
-</programlisting>
-     or add a line in your <filename>postgresql.conf</filename> file.
+    Because permissions are not inherited automatically, a user
+    attempting to access a parent table must either have at least the
+    same permission for the child table or must use the
+    <quote>ONLY</quote> notation. If creating a new inheritance
+    relationship in an existing system be careful that this does not
+    create problems.
    </para>
   </note>
 
   <para>
-  In some cases you may wish to know which table a particular row
-  originated from. There is a system column called
-  <structfield>tableoid</structfield> in each table which can tell you the
-  originating table:
+   Inheritance does not automatically propogate data from
+   <command>INSERT</command> or <command>COPY</command> commands to
+   other tables in the inheritance hierarchy. In our example, the
+   following <command>INSERT</command> statement will fail:
+<programlisting>
+INSERT INTO cities
+(name, population, altitude, state)
+VALUES ('New York', NULL, NULL, 'NY');
+</programlisting>
+   We might hope that the data would be somehow routed to the
+   <literal>capitals</literal> table, though this does not happen. If
+   the child has no locally defined columns, then it is possible to
+   route data from the parent to the child using a rule, see <xref
+   linkend="rules-update">.  This is not possible with the above
+   <command>INSERT</> statement because the <literal>state</> column
+   does not exist on both parent and child tables.
+  </para>
+
+  <para>
+   In some cases you may wish to know which table a particular row
+   originated from. There is a system column called
+   <structfield>tableoid</structfield> in each table which can tell you the
+   originating table:
 
 <programlisting>
 SELECT c.tableoid, c.name, c.altitude
@@ -1200,21 +1216,64 @@ WHERE c.altitude &gt; 500 and c.tableoid = p.oid;
  cities   | Mariposa  |     1953
  capitals | Madison   |      845
 </programlisting>
-   
   </para>
 
   <para>
-   A table can inherit from more than one parent table, in which case it has
-   the union of the columns defined by the parent tables (plus any columns
-   declared specifically for the child table).
+   As shown above, a child table may locally define columns as well as
+   inheriting them from their parents.  However, a locally defined
+   column cannot override the datatype of an inherited column of the
+   same name.  A table can inherit from a table that has itself
+   inherited from other tables. A table can also inherit from more
+   than one parent table, in which case it inherits the union of the
+   columns defined by the parent tables.  Inherited columns with
+   duplicate names and datatypes will be merged so that only a single
+   column is stored.
+  </para>
+
+  <para>
+   Table inheritance can currently only be defined using the <xref
+   linkend="sql-createtable" endterm="sql-createtable-title">
+   statement.  The related statement <literal>CREATE TABLE ... AS
+   SELECT</literal> does not allow inheritance to be specified. There
+   is no way to add an inheritance link to make an existing table into
+   a child table. Similarly, there is no way to remove an inheritance
+   link from a child table once it has been defined, other than using
+   <literal>DROP TABLE</literal>.  A parent table cannot be dropped
+   while any of its children remain. If you wish to remove a table and
+   all of its descendants, then you can do so using the
+   <literal>CASCADE</literal> option of the <xref
+   linkend="sql-droptable" endterm="sql-droptable-title"> statement.
+  </para>
+
+  <para>
+   Check constraints can be defined on tables within an inheritance
+   hierarchy. All check constraints on a parent table are
+   automatically inherited by all of their children. It is currently
+   possible to inherit mutually exclusive check constraints, but that
+   definition quickly shows itself since all attempted row inserts
+   will be rejected.
+  </para>
+
+  <para>
+   <xref linkend="sql-altertable" endterm="sql-altertable-title"> will
+   propogate any changes in data definition on columns or check
+   constraints down the inheritance hierarchy.  Again, dropping
+   columns or constraints on parent tables is only possible when using
+   the <literal>CASCADE</literal> option. <command>ALTER
+   TABLE</command> follows the same rules for duplicate column merging
+   and rejection that apply during <command>CREATE TABLE</command>.
   </para>
 
   <para>
-   A serious limitation of the inheritance feature is that indexes (including
-   unique constraints) and foreign key constraints only apply to single
-   tables, not to their inheritance children.  This is true on both the
-   referencing and referenced sides of a foreign key constraint.  Thus,
-   in the terms of the above example:
+   Both parent and child tables can have primary and foreign keys, so
+   that they can take part normally on both the referencing and
+   referenced sides of a foreign key constraint. Indexes may be
+   defined on any of these columns whether or not they are inherited.
+   However, a serious current limitation of the inheritance feature is
+   that indexes (including unique constraints) and foreign key
+   constraints only apply to single tables and do not also index their
+   inheritance children.  This is true on both sides of a foreign key
+   constraint.  Thus, in the terms of the above example:
 
    <itemizedlist>
     <listitem>
@@ -1236,9 +1295,11 @@ WHERE c.altitude &gt; 500 and c.tableoid = p.oid;
       Similarly, if we were to specify that
       <structname>cities</>.<structfield>name</> <literal>REFERENCES</> some
       other table, this constraint would not automatically propagate to
-      <structname>capitals</>.  In this case you could work around it by
-      manually adding the same <literal>REFERENCES</> constraint to
-      <structname>capitals</>.
+      <structname>capitals</>.  However, it is possible to set up a
+      foreign key such as <structname>capitals</>.<structfield>name</>
+      <literal>REFERENCES</> <structname>states</>.<structfield>name</>.
+      So it is possible to workaround this restriction by manually adding
+      foreign keys to each child table.
      </para>
     </listitem>
 
@@ -1254,7 +1315,556 @@ WHERE c.altitude &gt; 500 and c.tableoid = p.oid;
    These deficiencies will probably be fixed in some future release,
    but in the meantime considerable care is needed in deciding whether
    inheritance is useful for your problem.
+
   </para>
+
+  <note>
+   <title>Deprecated</title>
+   <para>
+     In previous versions of <productname>PostgreSQL</productname>, the
+     default behavior was not to include child tables in queries. This was
+     found to be error prone and is also in violation of the SQL:2003
+     standard. Under the old syntax, to get the sub-tables you append
+     <literal>*</literal> to the table name. For example:
+<programlisting>
+SELECT * from cities*;
+</programlisting>
+     You can still explicitly specify scanning child tables by
+     appending <literal>*</literal>, as well as explicitly specify not
+     scanning child tables by writing <quote>ONLY</quote>.  But
+     beginning in version 7.1, the default behavior for an undecorated
+     table name is to scan its child tables too, whereas before the
+     default was not to do so.  To get the old default behavior,
+     disable the <xref linkend="guc-sql-inheritance"> configuration
+     option.
+   </para>
+  </note>
+
+  </sect1>
+
+  <sect1 id="ce-partitioning">
+   <title>Constraint Exclusion and Partitioning</title>
+
+   <indexterm>
+    <primary>partitioning</primary>
+   </indexterm>
+
+   <indexterm>
+    <primary>constraint exclusion</primary>
+   </indexterm>
+
+   <para>
+    <productname>PostgreSQL</productname> supports basic table
+    partitioning. This section describes why and how you can implement
+    this as part of your database design.
+   </para>
+
+   <sect2 id="ce-partitioning-overview">
+     <title>Overview</title>
+
+   <para>
+    Currently, partitioning is implemented in conjunction with table
+    inheritance only, though using fully SQL:2003 compliant syntax.
+    Table inheritance allows tables to be split into partitions, and
+    constraint exclusion allows partitions to be selectively combined
+    as needed to satisfy a particular <command>SELECT</command>
+    statement. You should be familiar with inheritance (see <xref
+    linkend="ddl-inherit">) before attempting to implement
+    partitioning.
+   </para>
+
+   <para>
+    Partitioning can provide several benefits:
+   <itemizedlist>
+    <listitem>
+     <para>
+      Query performance can be improved dramatically for certain kinds
+      of queries without the need to maintain costly indexes.
+     </para>
+    </listitem>
+
+    <listitem>
+     <para>
+      Insert performance can be improved by breaking down a large
+      index into multiple pieces. When an index no longer fits easily
+      in memory, both read and write operations on the index take
+      progressively more disk accesses.
+     </para>
+    </listitem>
+
+    <listitem>
+     <para>
+      Bulk deletes may be avoided altogether by simply removing one of the
+      partitions, if that requirement is planned into the partitioning design.
+     </para>
+    </listitem>
+
+    <listitem>
+     <para>
+      Seldom-used data can be migrated to cheaper and slower storage media.
+     </para>
+    </listitem>
+   </itemizedlist>
+
+    The benefits will normally be worthwhile only when a data table would
+    otherwise be very large. That is for you to judge, though would not
+    usually be lower than the size of physical RAM on the database server.
+   </para>
+
+   <para>
+    In <productname>PostgreSQL</productname> &version;, the following
+    partitioning types are supported:
+
+    <itemizedlist>
+     <listitem>
+      <para>
+       "Range Partitioning" where the table is partitioned along a
+       "range" defined by a single column or set of columns, with no
+       overlap between partitions. Examples might be a date range or a
+       range of identifiers for particular business objects.
+      </para>
+     </listitem>
+
+     <listitem>
+      <para>
+       "List Partitioning" where the table is partitioned by
+       explicitly listing which values relate to each partition.
+      </para>
+     </listitem>
+    </itemizedlist>
+
+    Hash partitioning is not currently supported.
+   </para>
+   </sect2>
+
+   <sect2 id="ce-partitioning-implementation">
+     <title>Implementing Partitioning</title>
+
+    <para>
+     Partitioning a table is a straightforward process.  There
+     are a wide range of options for you to consider, so judging exactly
+     when and how to implement partitioning is a more complex topic. We
+     will address that complexity primarily through the examples in this
+     section.
+    </para>
+
+    <para>
+     To use partitioning, do the following:
+     <orderedlist spacing=compact>
+      <listitem>
+       <para>
+        Create the <quote>master</quote> table, from which all of the
+        partitions will inherit.
+       </para>
+       <para>
+        This table will contain no data.  Do not define any
+        constraints or keys on this table, unless you intend them to
+        be applied equally to all partitions.
+       </para>
+      </listitem>
+
+      <listitem>
+       <para>
+        Create several <quote>child</quote> tables that inherit from
+        the master table.
+       </para>
+
+       <para>
+        We will refer to the child tables as partitions, though they
+        are in every way just normal <productname>PostgreSQL</>
+        tables.
+       </para>
+      </listitem>
+
+      <listitem>
+       <para>
+        Add table constraints to define the allowed values in each partition.
+       </para>
+       <para>
+        Only clauses of the form [COLUMN] [OPERATOR] [CONSTANT(s)] will be used
+        for constraint exclusion. Simple examples would be:
+<programlisting>
+CHECK ( x = 1 )
+CHECK ( county IN ('Oxfordshire','Buckinghamshire','Warwickshire'))
+CHECK ( outletID BETWEEN 1 AND 99 )
+</programlisting>
+
+        These can be linked together with boolean operators AND and OR to
+        form complex constraints. Note that there is no difference in syntax
+        between Range and List Partitioning mechanisms; those terms are
+        descriptive only. Ensure that the set of values in each child table
+        do not overlap.
+       </para>
+      </listitem>
+
+      <listitem>
+       <para>
+        Add any other indexes you want to the partitions, bearing in
+        mind that it is always more efficient to add indexes after
+        data has been bulk loaded.
+       </para>
+      </listitem>
+
+      <listitem>
+       <para>
+        Optionally, define a rule or trigger to redirect modifications
+        of the master table to the appropriate partition.
+       </para>
+      </listitem>
+
+     </orderedlist>
+    </para>
+
+    <para>
+     For example, suppose we are constructing a database for a large
+     ice cream company. The company measures peak temperatures every
+     day as well as ice cream sales in each region. They have two
+     tables:
+
+<programlisting>
+CREATE TABLE cities (
+    id              int not null,
+    name            text not null,
+    altitude        int            -- in feet
+);
+
+CREATE TABLE measurement (
+    city_id         int not null,
+    logdate         date not null,
+    peaktemp        int,
+    unitsales       int
+);
+</programlisting>
+
+     To reduce the amount of old data that needs to be stored, we
+     decide to only keep the most recent 3 years worth of data. At the
+     beginning of each month we remove the oldest month's data.
+    </para>
+
+    <para>
+     Most queries just access the last week, month or quarter's data,
+     since we need to keep track of sales. As a result we have a large table,
+     yet only the most frequent 10% is accessed. Most of these queries
+     are online reports for various levels of management. These queries access
+     much of the table, so it is difficult to build enough indexes and at
+     the same time allow us to keep loading all of the data fast enough.
+     Yet, the reports are online so we need to respond quickly.
+    </para>
+
+    <para>
+     In this situation we can use partitioning to help us meet all of our
+     different requirements for the measurements table. Following the
+     steps outlined above, partitioning can be enabled as follows:
+    </para>
+
+    <para>
+     <orderedlist spacing=compact>
+      <listitem>
+       <para>
+        The measurement table is our master table.
+       </para>
+      </listitem>
+
+      <listitem>
+       <para>
+        Next we create one partition for each month using inheritance:
+
+<programlisting>
+CREATE TABLE measurement_yy04mm02 ( ) INHERITS (measurement);
+CREATE TABLE measurement_yy04mm03 ( ) INHERITS (measurement);
+...
+CREATE TABLE measurement_yy05mm11 ( ) INHERITS (measurement);
+CREATE TABLE measurement_yy05mm12 ( ) INHERITS (measurement);
+CREATE TABLE measurement_yy06mm01 ( ) INHERITS (measurement);
+</programlisting>
+
+        Each of the partitions are complete tables in their own right,
+        but they inherit their definition from the measurement table.
+       </para>
+
+       <para>
+        This solves one of our problems: deleting old data. Each
+        month, all we need to do is perform a <command>DROP
+        TABLE</command> on the oldest table and create a new table to
+        insert into.
+       </para>
+      </listitem>
+
+      <listitem>
+       <para>
+        We now add non-overlapping table constraints, so that our
+        table creation script becomes:
+
+ <programlisting>
+CREATE TABLE measurement_yy04mm02 (
+    CHECK ( logdate >= DATE '2004-02-01' AND logdate < DATE '2004-03-01' )
+                                    ) INHERITS (measurement);
+CREATE TABLE measurement_yy04mm03 (
+    CHECK ( logdate >= DATE '2004-03-01' AND logdate < DATE '2004-04-01' )
+                                    ) INHERITS (measurement);
+...
+CREATE TABLE measurement_yy05mm11 (
+    CHECK ( logdate >= DATE '2005-11-01' AND logdate < DATE '2005-12-01' )
+                                    ) INHERITS (measurement);
+CREATE TABLE measurement_yy05mm12 (
+    CHECK ( logdate >= DATE '2005-12-01' AND logdate < DATE '2006-01-01' )
+                                    ) INHERITS (measurement);
+CREATE TABLE measurement_yy06mm01 (
+    CHECK ( logdate >= DATE '2006-01-01' AND logdate < DATE '2006-02-01' )
+                                    ) INHERITS (measurement);
+</programlisting>
+       </para>
+      </listitem>
+
+      <listitem>
+       <para>
+        We choose not to add further indexes at this time.
+       </para>
+      </listitem>
+
+      <listitem>
+       <para>
+        Data will be added each day to the latest partition. This
+        allows us to set up a very simple rule to insert data. We must
+        redefine this each month so that it always points to the
+        current partition.
+
+<programlisting>
+CREATE OR REPLACE RULE measurement_current_partition AS
+ON INSERT
+TO measurement
+DO INSTEAD
+    INSERT INTO measurement_yy06mm01 VALUES ( NEW.city_id,
+                                              NEW.logdate,
+                                              NEW.peaktemp,
+                                              NEW.unitsales );
+</programlisting>
+
+        We might want to insert data and have the server automatically
+        locate the partition into which the row should be added. We
+        could do this with a more complex set of rules as shown below.
+
+<programlisting>
+CREATE RULE measurement_insert_yy04mm02 AS
+ON INSERT
+TO measurement WHERE
+    ( logdate >= DATE '2004-02-01' AND logdate < DATE '2004-03-01' )
+DO INSTEAD
+    INSERT INTO measurement_yy04mm02 VALUES ( NEW.city_id,
+                                              NEW.logdate,
+                                              NEW.peaktemp,
+                                              NEW.unitsales );
+...
+CREATE RULE measurement_insert_yy05mm12 AS
+ON INSERT
+TO measurement WHERE
+    ( logdate >= DATE '2005-12-01' AND logdate < DATE '2006-01-01' )
+DO INSTEAD
+    INSERT INTO measurement_yy05mm12 VALUES ( NEW.city_id,
+                                              NEW.logdate,
+                                              NEW.peaktemp,
+                                              NEW.unitsales );
+CREATE RULE measurement_insert_yy06mm01 AS
+ON INSERT
+TO measurement WHERE
+    ( logdate >= DATE '2006-01-01' AND logdate < DATE '2006-02-01' )
+DO INSTEAD
+    INSERT INTO measurement_yy06mm01 VALUES ( NEW.city_id,
+                                              NEW.logdate,
+                                              NEW.peaktemp,
+                                              NEW.unitsales );
+</programlisting>
+
+        Note that the <literal>WHERE</literal> clause in each rule
+        exactly matches those used for the <literal>CHECK</literal>
+        constraints on each partition.
+       </para>
+      </listitem>
+     </orderedlist>
+    </para>
+
+    <para>
+     As we can see, a complex partitioning scheme could require a
+     substantial amount of DDL. In the above example we would be
+     creating a new partition each month, so it may be wise to write a
+     script that generates the required DDL automatically.
+    </para>
+
+   <para>
+    The following caveats apply:
+   <itemizedlist>
+    <listitem>
+     <para>
+      There is currently no way to specify that all of the
+      <literal>CHECK</literal> constraints are mutually
+      exclusive. Care is required by the database designer.
+     </para>
+    </listitem>
+
+    <listitem>
+     <para>
+      There is currently no way to specify that rows may not be
+      inserted into the master table. A <literal>CHECK</literal>
+      constraint on the master table will be inherited by all child
+      tables, so that cannot not be used for this purpose.
+     </para>
+    </listitem>
+
+    <listitem>
+     <para>
+      For some datatypes you must explicitly coerce the constant values
+      into the datatype of the column. The following constraint will
+      work if x is an INTEGER datatype, but not if x is BIGINT datatype.
+<programlisting>
+CHECK ( x = 1 )
+</programlisting>
+      For BIGINT we must use a constraint like:
+<programlisting>
+CHECK ( x = 1::bigint )
+</programlisting>
+      The issue is not restricted to BIGINT datatypes but can occur whenever
+      the default datatype of the constant does not match the datatype of
+      the column to which it is being compared.
+     </para>
+    </listitem>
+
+    <listitem>
+     <para>
+      Partitioning can also be arranged using a <literal>UNION
+      ALL</literal> view:
+
+<programlisting>
+CREATE VIEW measurement AS
+          SELECT * FROM measurement_yy04mm02
+UNION ALL SELECT * FROM measurement_yy04mm03
+...
+UNION ALL SELECT * FROM measurement_yy05mm11
+UNION ALL SELECT * FROM measurement_yy05mm12
+UNION ALL SELECT * FROM measurement_yy06mm01;
+</programlisting>
+
+      However, constraint exclusion is currently not supported for
+      partitioned tables defined in this manner.
+     </para>
+    </listitem>
+   </itemizedlist>
+   </para>
+   </sect2>
+
+   <sect2 id="constraint-exclusion-queries">
+    <title>Constraint Exclusion in Queries</title>
+
+   <para>
+    Partitioning can be used to improve query performance when used in
+    conjunction with constraint exclusion. As an example:
+
+<programlisting>
+SET constraint_exclusion=true;
+SELECT count(*) FROM measurement WHERE logdate >= DATE '2006-01-01';
+</programlisting>
+
+    Without constraint exclusion, the above query would scan each of
+    the partitions of the measurement table. With constraint
+    exclusion, the planner will examine each of the constraints and
+    try to prove that each of the partitions needs to be involved in
+    the query. If the planner is able to refute that for any
+    partition, it excludes the partition from the query plan.
+   </para>
+
+   <para>
+    You can use the <command>EXPLAIN</> command to show the difference
+    between a plan with <varname>constraint_exclusion</> on and a plan
+    with it off.
+
+<programlisting>
+SET constraint_exclusion=false;
+EXPLAIN SELECT count(*) FROM measurement WHERE logdate >= DATE '2006-01-01';
+
+                                          QUERY PLAN
+-----------------------------------------------------------------------------------------------
+ Aggregate  (cost=158.66..158.68 rows=1 width=0)
+   ->  Append  (cost=0.00..151.88 rows=2715 width=0)
+         ->  Seq Scan on measurement  (cost=0.00..30.38 rows=543 width=0)
+               Filter: (logdate >= '2006-01-01'::date)
+         ->  Seq Scan on measurement_yy04mm02 measurement  (cost=0.00..30.38 rows=543 width=0)
+               Filter: (logdate >= '2006-01-01'::date)
+         ->  Seq Scan on measurement_yy04mm03 measurement  (cost=0.00..30.38 rows=543 width=0)
+               Filter: (logdate >= '2006-01-01'::date)
+...
+         ->  Seq Scan on measurement_yy05mm12 measurement  (cost=0.00..30.38 rows=543 width=0)
+               Filter: (logdate >= '2006-01-01'::date)
+         ->  Seq Scan on measurement_yy06mm01 measurement  (cost=0.00..30.38 rows=543 width=0)
+               Filter: (logdate >= '2006-01-01'::date)
+</programlisting>
+
+    Now when we enable constraint exclusion, we get a significantly
+    reduced plan but the same result set:
+
+<programlisting>
+SET constraint_exclusion=true;
+EXPLAIN SELECT count(*) FROM measurement WHERE logdate >= DATE '2006-01-01';
+                                          QUERY PLAN
+-----------------------------------------------------------------------------------------------
+ Aggregate  (cost=63.47..63.48 rows=1 width=0)
+   ->  Append  (cost=0.00..60.75 rows=1086 width=0)
+         ->  Seq Scan on measurement  (cost=0.00..30.38 rows=543 width=0)
+               Filter: (logdate >= '2006-01-01'::date)
+         ->  Seq Scan on measurement_yy06mm01 measurement  (cost=0.00..30.38 rows=543 width=0)
+               Filter: (logdate >= '2006-01-01'::date)
+</programlisting>
+
+    Don't forget that you still need to run <command>ANALYZE</command>
+    on each partition individually. A command like this
+<programlisting>
+ANALYZE measurement;
+</programlisting>
+
+    only affects the master table.
+   </para>
+
+   <para>
+    No indexes are required to use constraint exclusion. The
+    partitions should be defined with appropriate <literal>CHECK</>
+    constraints. These are then compared with the predicates of the
+    <command>SELECT</> query to determine which partitions must be
+    scanned.
+   </para>
+
+   <para>
+    The following caveats apply to this release:
+   <itemizedlist>
+    <listitem>
+     <para>
+      Constraint exclusion only works when the query directly matches
+      a constant. A constant bound to a parameterised query will not
+      work in the same way since the plan is fixed and would need to
+      vary with each execution.  Also, stable constants such as
+      <literal>CURRENT_DATE</literal> may not be used, since these are
+      constant only for during the execution of a single query.  Join
+      conditions will not allow constraint exclusion to work either.
+     </para>
+    </listitem>
+
+    <listitem>
+     <para>
+      UPDATEs and DELETEs against the master table do not perform
+      constraint exclusion.
+     </para>
+    </listitem>
+
+    <listitem>
+     <para>
+      All constraints on all partitions of the master table are considered for
+      constraint exclusion, so large numbers of partitions are likely to
+      increase query planning time considerably.
+     </para>
+    </listitem>
+
+   </itemizedlist>
+   </para>
+
+   </sect2>
+
  </sect1>
 
  <sect1 id="ddl-alter">
@@ -1530,7 +2140,7 @@ ALTER TABLE products RENAME TO items;
    </para>
   </sect2>
  </sect1>
- 
+
  <sect1 id="ddl-priv">
   <title>Privileges</title>
 
@@ -1953,7 +2563,7 @@ SELECT 3 OPERATOR(pg_catalog.+) 4;
     schema.  To allow that, the <literal>CREATE</literal> privilege on
     the schema needs to be granted.  Note that by default, everyone
     has <literal>CREATE</literal> and <literal>USAGE</literal> privileges on
-    the schema 
+    the schema
     <literal>public</literal>.  This allows all users that are able to
     connect to a given database to create objects in its
     <literal>public</literal> schema.  If you do
author	Neil Conway <neilc@samurai.com>	2005-11-01 23:19:05 +0000
committer	Neil Conway <neilc@samurai.com>	2005-11-01 23:19:05 +0000
commit	eddcd492fe68c5870640b2334fd0597a11025de1 (patch)
tree	d06eff8d678b5924b245d14d8127c43a47117606
parent	b524cb36ac2526049fe7978cadc28fca647f9b70 (diff)
download	postgresql-eddcd492fe68c5870640b2334fd0597a11025de1.tar.gz postgresql-eddcd492fe68c5870640b2334fd0597a11025de1.zip