diff options
author | Tom Lane <tgl@sss.pgh.pa.us> | 2019-04-05 19:20:30 -0400 |
---|---|---|
committer | Tom Lane <tgl@sss.pgh.pa.us> | 2019-04-05 19:20:43 -0400 |
commit | 959d00e9dbe4cfcf4a63bb655ac2c29a5e579246 (patch) | |
tree | 4c260f752780d317d1f2ebab8aae553dc4dc8236 /src/backend/partitioning/partbounds.c | |
parent | 9f06d79ef831ffa333f908f6d3debdb654292414 (diff) | |
download | postgresql-959d00e9dbe4cfcf4a63bb655ac2c29a5e579246.tar.gz postgresql-959d00e9dbe4cfcf4a63bb655ac2c29a5e579246.zip |
Use Append rather than MergeAppend for scanning ordered partitions.
If we need ordered output from a scan of a partitioned table, but
the ordering matches the partition ordering, then we don't need to
use a MergeAppend to combine the pre-ordered per-partition scan
results: a plain Append will produce the same results. This
both saves useless comparison work inside the MergeAppend proper,
and allows us to start returning tuples after istarting up just
the first child node not all of them.
However, all is not peaches and cream, because if some of the
child nodes have high startup costs then there will be big
discontinuities in the tuples-returned-versus-elapsed-time curve.
The planner's cost model cannot handle that (yet, anyway).
If we model the Append's startup cost as being just the first
child's startup cost, we may drastically underestimate the cost
of fetching slightly more tuples than are available from the first
child. Since we've had bad experiences with over-optimistic choices
of "fast start" plans for ORDER BY LIMIT queries, that seems scary.
As a klugy workaround, set the startup cost estimate for an ordered
Append to be the sum of its children's startup costs (as MergeAppend
would). This doesn't really describe reality, but it's less likely
to cause a bad plan choice than an underestimated startup cost would.
In practice, the cases where we really care about this optimization
will have child plans that are IndexScans with zero startup cost,
so that the overly conservative estimate is still just zero.
David Rowley, reviewed by Julien Rouhaud and Antonin Houska
Discussion: https://postgr.es/m/CAKJS1f-hAqhPLRk_RaSFTgYxd=Tz5hA7kQ2h4-DhJufQk8TGuw@mail.gmail.com
Diffstat (limited to 'src/backend/partitioning/partbounds.c')
-rw-r--r-- | src/backend/partitioning/partbounds.c | 64 |
1 files changed, 64 insertions, 0 deletions
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c index bdd0d238542..c8770ccfee0 100644 --- a/src/backend/partitioning/partbounds.c +++ b/src/backend/partitioning/partbounds.c @@ -862,6 +862,70 @@ partition_bounds_copy(PartitionBoundInfo src, } /* + * partitions_are_ordered + * Determine whether the partitions described by 'boundinfo' are ordered, + * that is partitions appearing earlier in the PartitionDesc sequence + * contain partition keys strictly less than those appearing later. + * Also, if NULL values are possible, they must come in the last + * partition defined in the PartitionDesc. + * + * If out of order, or there is insufficient info to know the order, + * then we return false. + */ +bool +partitions_are_ordered(PartitionBoundInfo boundinfo, int nparts) +{ + Assert(boundinfo != NULL); + + switch (boundinfo->strategy) + { + case PARTITION_STRATEGY_RANGE: + + /* + * RANGE-type partitioning guarantees that the partitions can be + * scanned in the order that they're defined in the PartitionDesc + * to provide sequential, non-overlapping ranges of tuples. + * However, if a DEFAULT partition exists then it doesn't work, as + * that could contain tuples from either below or above the + * defined range, or tuples belonging to gaps between partitions. + */ + if (!partition_bound_has_default(boundinfo)) + return true; + break; + + case PARTITION_STRATEGY_LIST: + + /* + * LIST partitioning can also guarantee ordering, but only if the + * partitions don't accept interleaved values. We could likely + * check for this by looping over the PartitionBound's indexes + * array to check that the indexes are in order. For now, let's + * just keep it simple and just accept LIST partitioning when + * there's no DEFAULT partition, exactly one value per partition, + * and optionally a NULL partition that does not accept any other + * values. Such a NULL partition will come last in the + * PartitionDesc, and the other partitions will be properly + * ordered. This is a cheap test to make as it does not require + * any per-partition processing. Maybe we'd like to handle more + * complex cases in the future. + */ + if (partition_bound_has_default(boundinfo)) + return false; + + if (boundinfo->ndatums + partition_bound_accepts_nulls(boundinfo) + == nparts) + return true; + break; + + default: + /* HASH, or some other strategy */ + break; + } + + return false; +} + +/* * check_new_partition_bound * * Checks if the new partition's bound overlaps any of the existing partitions |