aboutsummaryrefslogtreecommitdiff
path: root/src/backend/optimizer/path/pathkeys.c
Commit message (Collapse)AuthorAge
* Speedup child EquivalenceMember lookup in plannerDavid Rowley2025-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When planning queries to partitioned tables, we clone all EquivalenceMembers belonging to the partitioned table into em_is_child EquivalenceMembers for each non-pruned partition. For partitioned tables with large numbers of partitions, this meant the ec_members list could become large and code searching that list would become slow. Effectively, the more partitions which were present, the more searches needed to be performed for operations such as find_ec_member_matching_expr() during create_plan() and the more partitions present, the longer these searches would take, i.e., a quadratic slowdown. To fix this, here we adjust how we store EquivalenceMembers for em_is_child members. Instead of storing these directly in ec_members, these are now stored in a new array of Lists in the EquivalenceClass, which is indexed by the relid. When we want to find EquivalenceMembers belonging to a certain child relation, we can narrow the search to the array element for that relation. To make EquivalenceMember lookup easier and to reduce the amount of code change, this commit provides a pair of functions to allow iteration over the EquivalenceMembers of an EC which also handles finding the child members, if required. Callers that never need to look at child members can remain using the foreach loop over ec_members, which will now often be faster due to only parent-level members being stored there. The actual performance increases here are highly dependent on the number of partitions and the query being planned. Performance increases can be visible with as few as 8 partitions, but the speedup is marginal for such low numbers of partitions. The speedups become much more visible with a few dozen to hundreds of partitions. With some tested queries using 56 partitions, the planner was around 3x faster than before. For use cases with thousands of partitions, these are likely to become significantly faster. Some testing has shown planner speedups of 60x or more with 8192 partitions. Author: Yuya Watari <watari.yuya@gmail.com> Co-authored-by: David Rowley <dgrowleyml@gmail.com> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Andrey Lepikhov <a.lepikhov@postgrespro.ru> Reviewed-by: Alena Rybakina <lena.ribackina@yandex.ru> Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com> Reviewed-by: Amit Langote <amitlangote09@gmail.com> Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Tested-by: Thom Brown <thom@linux.com> Tested-by: newtglobal postgresql_contributors <postgresql_contributors@newtglobalcorp.com> Discussion: https://postgr.es/m/CAJ2pMkZNCgoUKSE%2B_5LthD%2BKbXKvq6h2hQN8Esxpxd%2Bcxmgomg%40mail.gmail.com
* Relax ordering-related hardcoded btree requirements in planningPeter Eisentraut2025-04-06
| | | | | | | | | | | | | | | | | | | | There were several places in ordering-related planning where a requirement for btree was hardcoded but an amcanorder index could suffice. This fixes that. We just need to do the necessary mapping between strategy numbers and compare types and adjust some related APIs so that this works independent of btree strategy numbers. For instance, non-btree amcanorder indexes can now be used to support sorting and merge joins. Also, predtest.c works independent of btree strategy numbers now. To avoid performance regressions, some details on btree and other built-in index types are still hardcoded as shortcuts, but other index types now have access to the same features by providing the required flags and callbacks. Author: Mark Dilger <mark.dilger@enterprisedb.com> Co-authored-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
* Convert PathKey to use CompareTypePeter Eisentraut2025-04-04
| | | | | | | | | | | | | | | Change the PathKey struct to use CompareType to record the sort direction instead of hardcoding btree strategy numbers. The CompareType is then converted to the index-type-specific strategy when the plan is created. This reduces the number of places btree strategy numbers are hardcoded, and it's a self-contained subset of a larger effort to allow non-btree indexes to behave like btrees. Author: Mark Dilger <mark.dilger@enterprisedb.com> Co-authored-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
* Update copyright for 2025Bruce Momjian2025-01-01
| | | | Backpatch-through: 13
* Reordering DISTINCT keys to match input path's pathkeysRichard Guo2024-11-26
| | | | | | | | | | | | | | | | | | | | | | | | The ordering of DISTINCT items is semantically insignificant, so we can reorder them as needed. In fact, in the parser, we absorb the sorting semantics of the sortClause as much as possible into the distinctClause, ensuring that one clause is a prefix of the other. This can help avoid a possible need to re-sort. In this commit, we attempt to adjust the DISTINCT keys to match the input path's pathkeys. This can likewise help avoid re-sorting, or allow us to use incremental-sort to save efforts. For DISTINCT ON expressions, the parser already ensures that they match the initial ORDER BY expressions. When reordering the DISTINCT keys, we must ensure that the resulting pathkey list matches the initial distinctClause pathkeys. This introduces a new GUC, enable_distinct_reordering, which allows the optimization to be disabled if needed. Author: Richard Guo Reviewed-by: Andrei Lepikhov Discussion: https://postgr.es/m/CAMbWs48dR26cCcX0f=8bja2JKQPcU64136kHk=xekHT9xschiQ@mail.gmail.com
* Track sort direction in SortGroupClausePeter Eisentraut2024-10-14
| | | | | | | | | | | | | | Functions make_pathkey_from_sortop() and transformWindowDefinitions(), which receive a SortGroupClause, were determining the sort order (ascending vs. descending) by comparing that structure's operator strategy to BTLessStrategyNumber, but could just as easily have gotten it from the SortGroupClause object, if it had such a field, so add one. This reduces the number of places that hardcode the assumption that the strategy refers specifically to a btree strategy, rather than some other index AM's operators. Author: Mark Dilger <mark.dilger@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
* Mark expressions nullable by grouping setsRichard Guo2024-09-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When generating window_pathkeys, distinct_pathkeys, or sort_pathkeys, we failed to realize that the grouping/ordering expressions might be nullable by grouping sets. As a result, we may incorrectly deem that the PathKeys are redundant by EquivalenceClass processing and thus remove them from the pathkeys list. That would lead to wrong results in some cases. To fix this issue, we mark the grouping expressions nullable by grouping sets if that is the case. If the grouping expression is a Var or PlaceHolderVar or constructed from those, we can just add the RT index of the RTE_GROUP RTE to the existing nullingrels field(s); otherwise we have to add a PlaceHolderVar to carry on the nullingrel bit. However, we have to manually remove this nullingrel bit from expressions in various cases where these expressions are logically below the grouping step, such as when we generate groupClause pathkeys for grouping sets, or when we generate PathTarget for initial input to grouping nodes. Furthermore, in set_upper_references, the targetlist and quals of an Agg node should have nullingrels that include the effects of the grouping step, ie they will have nullingrels equal to the input Vars/PHVs' nullingrels plus the nullingrel bit that references the grouping RTE. In order to perform exact nullingrels matches, we also need to manually remove this nullingrel bit. Bump catversion because this changes the querytree produced by the parser. Thanks to Tom Lane for the idea to invent a new kind of RTE. Per reports from Geoff Winkless, Tobias Wendorff, Richard Guo from various threads. Author: Richard Guo Reviewed-by: Ashutosh Bapat, Sutou Kouhei Discussion: https://postgr.es/m/CAMbWs4_dp7e7oTwaiZeBX8+P1rXw4ThkZxh1QG81rhu9Z47VsQ@mail.gmail.com
* Support "Right Semi Join" plan shapesRichard Guo2024-07-05
| | | | | | | | | | | | | | | | | | | | | | | Hash joins can support semijoin with the LHS input on the right, using the existing logic for inner join, combined with the assurance that only the first match for each inner tuple is considered, which can be achieved by leveraging the HEAP_TUPLE_HAS_MATCH flag. This can be very useful in some cases since we may now have the option to hash the smaller table instead of the larger. Merge join could likely support "Right Semi Join" too. However, the benefit of swapping inputs tends to be small here, so we do not address that in this patch. Note that this patch also modifies a test query in join.sql to ensure it continues testing as intended. With this patch the original query would result in a right-semi-join rather than semi-join, compromising its original purpose of testing the fix for neqjoinsel's behavior for semi-joins. Author: Richard Guo Reviewed-by: wenhui qiu, Alena Rybakina, Japin Li Discussion: https://postgr.es/m/CAMbWs4_X1mN=ic+SxcyymUqFx9bB8pqSLTGJ-F=MHy4PW3eRXw@mail.gmail.com
* Restore preprocess_groupclause()Alexander Korotkov2024-06-06
| | | | | | | | | | | | | | | | | | | | | | | | 0452b461bc made optimizer explore alternative orderings of group-by pathkeys. It eliminated preprocess_groupclause(), which was intended to match items between GROUP BY and ORDER BY. Instead, get_useful_group_keys_orderings() function generates orderings of GROUP BY elements at the time of grouping paths generation. The get_useful_group_keys_orderings() function takes into account 3 orderings of GROUP BY pathkeys and clauses: original order as written in GROUP BY, matching ORDER BY clauses as much as possible, and matching the input path as much as possible. Given that even before 0452b461b, preprocess_groupclause() could change the original order of GROUP BY clauses we don't need to consider it apart from ordering matching ORDER BY clauses. This commit restores preprocess_groupclause() to provide an ordering of GROUP BY elements matching ORDER BY before generation of paths. The new version of preprocess_groupclause() takes into account an incremental sort. The get_useful_group_keys_orderings() function now takes into 2 orderings of GROUP BY elements: the order generated preprocess_groupclause() and the order matching the input path as much as possible. Discussion: https://postgr.es/m/CAPpHfdvyWLMGwvxaf%3D7KAp-z-4mxbSH8ti2f6mNOQv5metZFzg%40mail.gmail.com Author: Alexander Korotkov Reviewed-by: Andrei Lepikhov, Pavel Borisov
* Rename PathKeyInfo to GroupByOrderingAlexander Korotkov2024-06-06
| | | | | | | | | | | | | 0452b461bc made optimizer explore alternative orderings of group-by pathkeys. The PathKeyInfo data structure was used to store the particular ordering of group-by pathkeys and corresponding clauses. It turns out that PathKeyInfo is not the best name for that purpose. This commit renames this data structure to GroupByOrdering, and revises its comment. Discussion: https://postgr.es/m/db0fc3a4-966c-4cec-a136-94024d39212d%40postgrespro.ru Reported-by: Tom Lane Author: Andrei Lepikhov Reviewed-by: Alexander Korotkov, Pavel Borisov
* Add invariants check to get_useful_group_keys_orderings()Alexander Korotkov2024-06-06
| | | | | | | | | | This commit introduces invariants checking of generated orderings in get_useful_group_keys_orderings() for assert-enabled builds. Discussion: https://postgr.es/m/a663f0f6-cbf6-49aa-af2e-234dc6768a07%40postgrespro.ru Reported-by: Tom Lane Author: Andrei Lepikhov Reviewed-by: Alexander Korotkov, Pavel Borisov
* Fix asymmetry in setting EquivalenceClass.ec_sortrefAlexander Korotkov2024-06-06
| | | | | | | | | | | | | | | | | | 0452b461bc made get_eclass_for_sort_expr() always set EquivalenceClass.ec_sortref if it's not done yet. This leads to an asymmetric situation when whoever first looks for the EquivalenceClass sets the ec_sortref. It is also counterintuitive that get_eclass_for_sort_expr() performs modification of data structures. This commit makes make_pathkeys_for_sortclauses_extended() responsible for setting EquivalenceClass.ec_sortref. Now we set the EquivalenceClass.ec_sortref's needed to explore alternative GROUP BY ordering specifically during building pathkeys by the list of grouping clauses. Discussion: https://postgr.es/m/17037754-f187-4138-8285-0e2bfebd0dea%40postgrespro.ru Reported-by: Tom Lane Author: Andrei Lepikhov Reviewed-by: Alexander Korotkov, Pavel Borisov
* Re-allow planner to use Merge Append to efficiently implement UNION.Robert Haas2024-05-21
| | | | | | | | | | | This reverts commit 7204f35919b7e021e8d1bc9f2d76fd6bfcdd2070, thus restoring 66c0185a3 (Allow planner to use Merge Append to efficiently implement UNION) as well as the follow-on commits d5d2205c8, 3b1a7eb28, 7487044d6. Per further discussion on pgsql-release, we wish to ship beta1 with this feature, and patch the bug that was found just before wrap, rather than shipping beta1 with the feature reverted.
* Revert commit 66c0185a3 and follow-on patches.Tom Lane2024-05-20
| | | | | | | | | | | | | | | | | | | This reverts 66c0185a3 (Allow planner to use Merge Append to efficiently implement UNION) as well as the follow-on commits d5d2205c8, 3b1a7eb28, 7487044d6. In addition to those, 07746a8ef had to be removed then re-applied in a different place, because 66c0185a3 moved the relevant code. The reason for this last-minute thrashing is that depesz found a case in which the patched code creates a completely wrong plan that silently gives incorrect query results. It's unclear what the cause is or how many cases are affected, but with beta1 wrap staring us in the face, there's no time for closer investigation. After we figure that out, we can decide whether to un-revert this for beta2 or hold it for v18. Discussion: https://postgr.es/m/Zktzf926vslR35Fv@depesz.com (also some private discussion among pgsql-release)
* Fix typos and duplicate wordsDaniel Gustafsson2024-04-18
| | | | | | | | | | | | This fixes various typos, duplicated words, and tiny bits of whitespace mainly in code comments but also in docs. Author: Daniel Gustafsson <daniel@yesql.se> Author: Heikki Linnakangas <hlinnaka@iki.fi> Author: Alexander Lakhin <exclusion@gmail.com> Author: David Rowley <dgrowleyml@gmail.com> Author: Nazir Bilal Yavuz <byavuz81@gmail.com> Discussion: https://postgr.es/m/3F577953-A29E-4722-98AD-2DA9EFF2CBB8@yesql.se
* Allow planner to use Merge Append to efficiently implement UNIONDavid Rowley2024-03-25
| | | | | | | | | | | | | | | | | | | | | Until now, UNION queries have often been suboptimal as the planner has only ever considered using an Append node and making the results unique by either using a Hash Aggregate, or by Sorting the entire Append result and running it through the Unique operator. Both of these methods always require reading all rows from the union subqueries. Here we adjust the union planner so that it can request that each subquery produce results in target list order so that these can be Merge Appended together and made unique with a Unique node. This can improve performance significantly as the union child can make use of the likes of btree indexes and/or Merge Joins to provide the top-level UNION with presorted input. This is especially good if the top-level UNION contains a LIMIT node that limits the output rows to a small subset of the unioned rows as cheap startup plans can be used. Author: David Rowley Reviewed-by: Richard Guo, Andy Fan Discussion: https://postgr.es/m/CAApHDvpb_63XQodmxKUF8vb9M7CxyUyT4sWvEgqeQU-GB7QFoQ@mail.gmail.com
* Remove unused #include's from backend .c filesPeter Eisentraut2024-03-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | as determined by include-what-you-use (IWYU) While IWYU also suggests to *add* a bunch of #include's (which is its main purpose), this patch does not do that. In some cases, a more specific #include replaces another less specific one. Some manual adjustments of the automatic result: - IWYU currently doesn't know about includes that provide global variable declarations (like -Wmissing-variable-declarations), so those includes are being kept manually. - All includes for port(ability) headers are being kept for now, to play it safe. - No changes of catalog/pg_foo.h to catalog/pg_foo_d.h, to keep the patch from exploding in size. Note that this patch touches just *.c files, so nothing declared in header files changes in hidden ways. As a small example, in src/backend/access/transam/rmgr.c, some IWYU pragma annotations are added to handle a special case there. Discussion: https://www.postgresql.org/message-id/flat/af837490-6b2f-46df-ba05-37ea6a6653fc%40eisentraut.org
* Simplify PathKey checking codeDavid Rowley2024-02-15
| | | | | | | | | | | | | | | | | | | | | | pathkeys_useful_for_ordering() contained some needless checks to return 0 when either root->query_pathkeys or pathkeys lists were empty. This is already handled by pathkeys_count_contained_in(), so let's have it do the work instead of having redundant checks. Similarly, in pathkeys_useful_for_grouping(), checking pathkeys is an empty list just before looping over it isn't required. Technically, neither is the list empty check for group_pathkeys, but I felt a bit more work would have to be done to get the equivalent behavior if we'd left it up to the foreach loop to call list_member_ptr(). This was noticed by Andy while he was reviewing a patch to improve the UNION planner. Since that patch adds another function similar to pathkeys_useful_for_ordering() and since I wasn't planning to copy these redundant checks over to the new function, let's adjust the existing code so that both functions will be consistent. Author: Andy Fan Discussion: https://postgr.es/m/87o7cti48f.fsf@163.com
* Fix usage of aggregate pathkeys in group_keys_reorder_by_pathkeys()Alexander Korotkov2024-02-09
| | | | | | | | | | | | | | | | group_keys_reorder_by_pathkeys() function searched for matching pathkeys within root->group_pathkeys. That could lead to picking an aggregate pathkey and using its pathkey->pk_eclass->ec_sortref as an argument of get_sortgroupref_clause_noerr(). Given that ec_sortref of an aggregate pathkey references aggregate targetlist not query targetlist, this leads to incorrect query optimization. Fix this by looking for matching pathkeys only within the first num_groupby_pathkeys pathkeys. Reported-by: David G. Johnston Discussion: https://postgr.es/m/CAKFQuwY3Ek%3DcLThgd8FdaSc5JRDVt0FaV00gMcWra%2BTAR4gGUw%40mail.gmail.com Author: Andrei Lepikhov, Alexander Korotkov
* Explore alternative orderings of group-by pathkeys during optimization.Alexander Korotkov2024-01-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | When evaluating a query with a multi-column GROUP BY clause, we can minimize sort operations or avoid them if we synchronize the order of GROUP BY clauses with the ORDER BY sort clause or sort order, which comes from the underlying query tree. Grouping does not imply any ordering, so we can compare the keys in arbitrary order, and a Hash Agg leverages this. But for Group Agg, we simply compared keys in the order specified in the query. This commit explores alternative ordering of the keys, trying to find a cheaper one. The ordering of group keys may interact with other parts of the query, some of which may not be known while planning the grouping. For example, there may be an explicit ORDER BY clause or some other ordering-dependent operation higher up in the query, and using the same ordering may allow using either incremental sort or even eliminating the sort entirely. The patch always keeps the ordering specified in the query, assuming the user might have additional insights. This introduces a new GUC enable_group_by_reordering so that the optimization may be disabled if needed. Discussion: https://postgr.es/m/7c79e6a5-8597-74e8-0671-1c39d124c9d6%40sigaev.ru Author: Andrei Lepikhov, Teodor Sigaev Reviewed-by: Tomas Vondra, Claudio Freire, Gavin Flower, Dmitry Dolgov Reviewed-by: Robert Haas, Pavel Borisov, David Rowley, Zhihong Yu Reviewed-by: Tom Lane, Alexander Korotkov, Richard Guo, Alena Rybakina
* Update copyright for 2024Bruce Momjian2024-01-03
| | | | | | | | Reported-by: Michael Paquier Discussion: https://postgr.es/m/ZZKTDPxBBMt3C0J9@paquier.xyz Backpatch-through: 12
* Reorder tests in get_cheapest_path_for_pathkeys().Robert Haas2023-09-07
| | | | | | | | | | | Checking parallel safety should be even cheaper than cost comparison, so do that first. Also make some minor, related comment improvements. Richard Guo, reviewed by Aleksander Alekseev, Andy Fan, and me. Discussion: http://postgr.es/m/CAMbWs4-KE2wf4QPj_Sr5mX4QFtBNNKGmxK=+e=KZEGUjdG33=g@mail.gmail.com
* Support "Right Anti Join" plan shapes.Tom Lane2023-04-05
| | | | | | | | | | | | | Merge and hash joins can support antijoin with the non-nullable input on the right, using very simple combinations of their existing logic for right join and anti join. This gives the planner more freedom about how to order the join. It's particularly useful for hash join, since we may now have the option to hash the smaller table instead of the larger. Richard Guo, reviewed by Ronan Dunklau and myself Discussion: https://postgr.es/m/CAMbWs48xh9hMzXzSy3VaPzGAz+fkxXXTUbCLohX1_L8THFRm2Q@mail.gmail.com
* Do assorted mop-up in the planner.Tom Lane2023-01-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove RestrictInfo.nullable_relids, along with a good deal of infrastructure that calculated it. One use-case for it was in join_clause_is_movable_to, but we can now replace that usage with a check to see if the clause's relids include any outer join that can null the target relation. The other use-case was in join_clause_is_movable_into, but that test can just be dropped entirely now that the clause's relids include outer joins. Furthermore, join_clause_is_movable_into should now be accurate enough that it will accept anything returned by generate_join_implied_equalities, so we can restore the Assert that was diked out in commit 95f4e59c3. Remove the outerjoin_delayed mechanism. We needed this before to prevent quals from getting evaluated below outer joins that should null some of their vars. Now that we consider varnullingrels while placing quals, that's taken care of automatically, so throw the whole thing away. Teach remove_useless_result_rtes to also remove useless FromExprs. Having done that, the delay_upper_joins flag serves no purpose any more and we can remove it, largely reverting 11086f2f2. Use constant TRUE for "dummy" clauses when throwing back outer joins. This improves on a hack I introduced in commit 6a6522529. If we have a left-join clause l.x = r.y, and a WHERE clause l.x = constant, we generate r.y = constant and then don't really have a need for the join clause. But we must throw the join clause back anyway after marking it redundant, so that the join search heuristics won't think this is a clauseless join and avoid it. That was a kluge introduced under time pressure, and after looking at it I thought of a better way: let's just introduce constant-TRUE "join clauses" instead, and get rid of them at the end. This improves the generated plans for such cases by not having to test a redundant join clause. We can also get rid of the ugly hack used to mark such clauses as redundant for selectivity estimation. Patch by me; thanks to Richard Guo for review. Discussion: https://postgr.es/m/830269.1656693747@sss.pgh.pa.us
* Remove redundant grouping and DISTINCT columns.Tom Lane2023-01-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Avoid explicitly grouping by columns that we know are redundant for sorting, for example we need group by only one of x and y in SELECT ... WHERE x = y GROUP BY x, y This comes up more often than you might think, as shown by the changes in the regression tests. It's nearly free to detect too, since we are just piggybacking on the existing logic that detects redundant pathkeys. (In some of the existing plans that change, it's visible that a sort step preceding the grouping step already didn't bother to sort by the redundant column, making the old plan a bit silly-looking.) To do this, build processed_groupClause and processed_distinctClause lists that omit any provably-redundant sort items, and consult those not the originals where relevant. This means that within the planner, one should usually consult root->processed_groupClause or root->processed_distinctClause if one wants to know which columns are to be grouped on; but to check whether grouping or distinct-ing is happening at all, check non-NIL-ness of parse->groupClause or parse->distinctClause. This is comparable to longstanding rules about handling the HAVING clause, so I don't think it'll be a huge maintenance problem. nodeAgg.c also needs minor mods, because it's now possible to generate AGG_PLAIN and AGG_SORTED Agg nodes with zero grouping columns. Patch by me; thanks to Richard Guo and David Rowley for review. Discussion: https://postgr.es/m/185315.1672179489@sss.pgh.pa.us
* Update copyright for 2023Bruce Momjian2023-01-02
| | | | Backpatch-through: 11
* Revert "Optimize order of GROUP BY keys".Tom Lane2022-10-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit db0d67db2401eb6238ccc04c6407a4fd4f985832 and several follow-on fixes. The idea of making a cost-based choice of the order of the sorting columns is not fundamentally unsound, but it requires cost information and data statistics that we don't really have. For example, relying on procost to distinguish the relative costs of different sort comparators is pretty pointless so long as most such comparator functions are labeled with cost 1.0. Moreover, estimating the number of comparisons done by Quicksort requires more than just an estimate of the number of distinct values in the input: you also need some idea of the sizes of the larger groups, if you want an estimate that's good to better than a factor of three or so. That's data that's often unknown or not very reliable. Worse, to arrive at estimates of the number of calls made to the lower-order-column comparison functions, the code needs to make estimates of the numbers of distinct values of multiple columns, which are necessarily even less trustworthy than per-column stats. Even if all the inputs are perfectly reliable, the cost algorithm as-implemented cannot offer useful information about how to order sorting columns beyond the point at which the average group size is estimated to drop to 1. Close inspection of the code added by db0d67db2 shows that there are also multiple small bugs. These could have been fixed, but there's not much point if we don't trust the estimates to be accurate in-principle. Finally, the changes in cost_sort's behavior made for very large changes (often a factor of 2 or so) in the cost estimates for all sorting operations, not only those for multi-column GROUP BY. That naturally changes plan choices in many situations, and there's precious little evidence to show that the changes are for the better. Given the above doubts about whether the new estimates are really trustworthy, it's hard to summon much confidence that these changes are better on the average. Since we're hard up against the release deadline for v15, let's revert these changes for now. We can always try again later. Note: in v15, I left T_PathKeyInfo in place in nodes.h even though it's unreferenced. Removing it would be an ABI break, and it seems a bit late in the release cycle for that. Discussion: https://postgr.es/m/TYAPR01MB586665EB5FB2C3807E893941F5579@TYAPR01MB5866.jpnprd01.prod.outlook.com
* Fix misleading comment for get_cheapest_group_keys_orderDavid Rowley2022-09-20
| | | | | | | | | | | | | | | | | The header comment for get_cheapest_group_keys_order() claimed that the output arguments were set to a newly allocated list which may be freed by the calling function, however, this was not always true as the function would simply leave these arguments untouched in some cases. This tripped me up when working on 1349d2790 as I mistakenly assumed I could perform a list_concat with the output parameters. That turned out bad due to list_concat modifying the original input lists. In passing, make it more clear that the number of distinct values is important to reduce tiebreaks during sorts. Also, explain what the n_preordered parameter means. Backpatch-through: 15, where get_cheapest_group_keys_order was introduced.
* Fix planner to consider matches to boolean columns in extension indexes.Tom Lane2022-09-02
| | | | | | | | | | | | | | | | | | | | | | | | | | The planner has to special-case indexes on boolean columns, because what we need for an indexscan on such a column is a qual of the shape of "boolvar = pseudoconstant". For plain bool constants, previous simplification will have reduced this to "boolvar" or "NOT boolvar", and we have to reverse that if we want to make an indexqual. There is existing code to do so, but it only fires when the index's opfamily is BOOL_BTREE_FAM_OID or BOOL_HASH_FAM_OID. Thus extension AMs, or extension opclasses such as contrib/btree_gin, are out in the cold. The reason for hard-wiring the set of relevant opfamilies was mostly to avoid a catalog lookup in a hot code path. We can improve matters while not taking much of a performance hit by relying on the hard-wired set when the opfamily OID is visibly built-in, and only checking the catalogs when dealing with an extension opfamily. While here, rename IsBooleanOpfamily to IsBuiltinBooleanOpfamily to remind future users of that macro of its limitations. At some point we might want to make indxpath.c's improved version of the test globally accessible, but it's not presently needed elsewhere. Zongliang Quan and Tom Lane Discussion: https://postgr.es/m/f293b91d-1d46-d386-b6bb-4b06ff5c667b@yeah.net
* Improve performance of ORDER BY / DISTINCT aggregatesDavid Rowley2022-08-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ORDER BY / DISTINCT aggreagtes have, since implemented in Postgres, been executed by always performing a sort in nodeAgg.c to sort the tuples in the current group into the correct order before calling the transition function on the sorted tuples. This was not great as often there might be an index that could have provided pre-sorted input and allowed the transition functions to be called as the rows come in, rather than having to store them in a tuplestore in order to sort them once all the tuples for the group have arrived. Here we change the planner so it requests a path with a sort order which supports the most amount of ORDER BY / DISTINCT aggregate functions and add new code to the executor to allow it to support the processing of ORDER BY / DISTINCT aggregates where the tuples are already sorted in the correct order. Since there can be many ORDER BY / DISTINCT aggregates in any given query level, it's very possible that we can't find an order that suits all of these aggregates. The sort order that the planner chooses is simply the one that suits the most aggregate functions. We take the most strictly sorted variation of each order and see how many aggregate functions can use that, then we try again with the order of the remaining aggregates to see if another order would suit more aggregate functions. For example: SELECT agg(a ORDER BY a),agg2(a ORDER BY a,b) ... would request the sort order to be {a, b} because {a} is a subset of the sort order of {a,b}, but; SELECT agg(a ORDER BY a),agg2(a ORDER BY c) ... would just pick a plan ordered by {a} (we give precedence to aggregates which are earlier in the targetlist). SELECT agg(a ORDER BY a),agg2(a ORDER BY b),agg3(a ORDER BY b) ... would choose to order by {b} since two aggregates suit that vs just one that requires input ordered by {a}. Author: David Rowley Reviewed-by: Ronan Dunklau, James Coleman, Ranier Vilela, Richard Guo, Tom Lane Discussion: https://postgr.es/m/CAApHDvpHzfo92%3DR4W0%2BxVua3BUYCKMckWAmo-2t_KiXN-wYH%3Dw%40mail.gmail.com
* Relax overly strict rules in select_outer_pathkeys_for_merge()David Rowley2022-08-02
| | | | | | | | | | | | | | | | | | The select_outer_pathkeys_for_merge function made an attempt to build the merge join pathkeys in the same order as query_pathkeys. This was done as it may have led to no sort being required for an ORDER BY or GROUP BY clause in the upper planner. However, this restriction seems overly strict as it required that we match the query_pathkeys entirely or we don't bother putting the merge join pathkeys in that order. Here we relax this rule so that we use a prefix of the query_pathkeys providing that prefix matches all of the join quals. This may provide the upper planner with partially sorted input which will allow the use of incremental sorts instead of full sorts. Author: David Rowley Reviewed-by: Richard Guo Discussion: https://postgr.es/m/CAApHDvrtZu0PHVfDPFM4Yx3jNR2Wuwosv+T2zqa7LrhhBr2rRg@mail.gmail.com
* Use list_copy_head() instead of list_truncate(list_copy(...), ...)David Rowley2022-07-13
| | | | | | | | | | | | | | | | Truncating off the end of a freshly copied List is not a very efficient way of copying the first N elements of a List. In many of the cases that are updated here, the pattern was only being used to remove the final element of a List. That's about the best case for it, but there were many instances where the truncate trimming the List down much further. 4cc832f94 added list_copy_head(), so let's use it in cases where it's useful. Author: David Rowley Discussion: https://postgr.es/m/1986787.1657666922%40sss.pgh.pa.us
* Tidy up code in get_cheapest_group_keys_order()David Rowley2022-07-13
| | | | | | | | | | | | | | | | | | | | | | | There are a few things that we could do a little better within get_cheapest_group_keys_order(): 1. We should be using list_free() rather than pfree() on a List. 2. We should use for_each_from() instead of manually coding a for loop to skip the first n elements of a List 3. list_truncate(list_copy(...), n) is not a great way to copy the first n elements of a list. Let's invent list_copy_head() for that. That way we don't need to copy the entire list just to truncate it directly afterwards. 4. We can simplify finding the cheapest cost by setting the cheapest cost variable to DBL_MAX. That allows us to skip special-casing the initial iteration of the loop. Author: David Rowley Discussion: https://postgr.es/m/CAApHDvrGyL3ft8waEkncG9y5HDMu5TFFJB1paoTC8zi9YK97Nw@mail.gmail.com Backpatch-through: 15, where get_cheapest_group_keys_order was added.
* Pre-beta mechanical code beautification.Tom Lane2022-05-12
| | | | | Run pgindent, pgperltidy, and reformat-dat-files. I manually fixed a couple of comments that pgindent uglified.
* Fix various typos and spelling mistakes in code commentsDavid Rowley2022-04-11
| | | | | Author: Justin Pryzby Discussion: https://postgr.es/m/20220411020336.GB26620@telsasoft.com
* Optimize order of GROUP BY keysTomas Vondra2022-03-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When evaluating a query with a multi-column GROUP BY clause using sort, the cost may be heavily dependent on the order in which the keys are compared when building the groups. Grouping does not imply any ordering, so we're allowed to compare the keys in arbitrary order, and a Hash Agg leverages this. But for Group Agg, we simply compared keys in the order as specified in the query. This commit explores alternative ordering of the keys, trying to find a cheaper one. In principle, we might generate grouping paths for all permutations of the keys, and leave the rest to the optimizer. But that might get very expensive, so we try to pick only a couple interesting orderings based on both local and global information. When planning the grouping path, we explore statistics (number of distinct values, cost of the comparison function) for the keys and reorder them to minimize comparison costs. Intuitively, it may be better to perform more expensive comparisons (for complex data types etc.) last, because maybe the cheaper comparisons will be enough. Similarly, the higher the cardinality of a key, the lower the probability we’ll need to compare more keys. The patch generates and costs various orderings, picking the cheapest ones. The ordering of group keys may interact with other parts of the query, some of which may not be known while planning the grouping. E.g. there may be an explicit ORDER BY clause, or some other ordering-dependent operation, higher up in the query, and using the same ordering may allow using either incremental sort or even eliminate the sort entirely. The patch generates orderings and picks those minimizing the comparison cost (for various pathkeys), and then adds orderings that might be useful for operations higher up in the plan (ORDER BY, etc.). Finally, it always keeps the ordering specified in the query, on the assumption the user might have additional insights. This introduces a new GUC enable_group_by_reordering, so that the optimization may be disabled if needed. The original patch was proposed by Teodor Sigaev, and later improved and reworked by Dmitry Dolgov. Reviews by a number of people, including me, Andrey Lepikhov, Claudio Freire, Ibrar Ahmed and Zhihong Yu. Author: Dmitry Dolgov, Teodor Sigaev, Tomas Vondra Reviewed-by: Tomas Vondra, Andrey Lepikhov, Claudio Freire, Ibrar Ahmed, Zhihong Yu Discussion: https://postgr.es/m/7c79e6a5-8597-74e8-0671-1c39d124c9d6%40sigaev.ru Discussion: https://postgr.es/m/CA%2Bq6zcW_4o2NC0zutLkOJPsFt80megSpX_dVRo6GK9PC-Jx_Ag%40mail.gmail.com
* Update copyright for 2022Bruce Momjian2022-01-07
| | | | Backpatch-through: 10
* Allow ordered partition scans in more casesDavid Rowley2021-08-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 959d00e9d added the ability to make use of an Append node instead of a MergeAppend when we wanted to perform a scan of a partitioned table and the required sort order was the same as the partitioned keys and the partitioned table was defined in such a way that earlier partitions were guaranteed to only contain lower-order values than later partitions. However, previously we didn't allow these ordered partition scans for LIST partitioned table when there were any partitions that allowed multiple Datums. This was a very cheap check to make and we could likely have done a little better by checking if there were interleaved partitions, but at the time we didn't have visibility about which partitions were pruned, so we still may have disallowed cases where all interleaved partitions were pruned. Since 475dbd0b7, we now have knowledge of pruned partitions, we can do a much better job inside partitions_are_ordered(). Here we pass which partitions survived partition pruning into partitions_are_ordered() and, for LIST partitioning, have it check to see if any live partitions exist that are also in the new "interleaved_parts" field defined in PartitionBoundInfo. For RANGE partitioning we can relax the code which caused the partitions to be unordered if a DEFAULT partition existed. Since we now know which partitions were pruned, partitions_are_ordered() now returns true when the DEFAULT partition was pruned. Reviewed-by: Amit Langote, Zhihong Yu Discussion: https://postgr.es/m/CAApHDvrdoN_sXU52i=QDXe2k3WAo=EVry29r2+Tq2WYcn2xhEA@mail.gmail.com
* Fix pull_varnos' miscomputation of relids set for a PlaceHolderVar.Tom Lane2021-01-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, pull_varnos() took the relids of a PlaceHolderVar as being equal to the relids in its contents, but that fails to account for the possibility that we have to postpone evaluation of the PHV due to outer joins. This could result in a malformed plan. The known cases end up triggering the "failed to assign all NestLoopParams to plan nodes" sanity check in createplan.c, but other symptoms may be possible. The right value to use is the join level we actually intend to evaluate the PHV at. We can get that from the ph_eval_at field of the associated PlaceHolderInfo. However, there are some places that call pull_varnos() before the PlaceHolderInfos have been created; in that case, fall back to the conservative assumption that the PHV will be evaluated at its syntactic level. (In principle this might result in missing some legal optimization, but I'm not aware of any cases where it's an issue in practice.) Things are also a bit ticklish for calls occurring during deconstruct_jointree(), but AFAICS the ph_eval_at fields should have reached their final values by the time we need them. The main problem in making this work is that pull_varnos() has no way to get at the PlaceHolderInfos. We can fix that easily, if a bit tediously, in HEAD by passing it the planner "root" pointer. In the back branches that'd cause an unacceptable API/ABI break for extensions, so leave the existing entry points alone and add new ones with the additional parameter. (If an old entry point is called and encounters a PHV, it'll fall back to using the syntactic level, again possibly missing some valid optimization.) Back-patch to v12. The computation is surely also wrong before that, but it appears that we cannot reach a bad plan thanks to join order restrictions imposed on the subquery that the PlaceHolderVar came from. The error only became reachable when commit 4be058fe9 allowed trivial subqueries to be collapsed out completely, eliminating their join order restrictions. Per report from Stephan Springl. Discussion: https://postgr.es/m/171041.1610849523@sss.pgh.pa.us
* Update copyright for 2021Bruce Momjian2021-01-02
| | | | Backpatch-through: 9.5
* Initial pgindent and pgperltidy run for v13.Tom Lane2020-05-14
| | | | | | | | | | | Includes some manual cleanup of places that pgindent messed up, most of which weren't per project style anyway. Notably, it seems some people didn't absorb the style rules of commit c9d297751, because there were a bunch of new occurrences of function calls with a newline just after the left paren, all with faulty expectations about how the rest of the call would get indented.
* Implement Incremental SortTomas Vondra2020-04-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Incremental Sort is an optimized variant of multikey sort for cases when the input is already sorted by a prefix of the requested sort keys. For example when the relation is already sorted by (key1, key2) and we need to sort it by (key1, key2, key3) we can simply split the input rows into groups having equal values in (key1, key2), and only sort/compare the remaining column key3. This has a number of benefits: - Reduced memory consumption, because only a single group (determined by values in the sorted prefix) needs to be kept in memory. This may also eliminate the need to spill to disk. - Lower startup cost, because Incremental Sort produce results after each prefix group, which is beneficial for plans where startup cost matters (like for example queries with LIMIT clause). We consider both Sort and Incremental Sort, and decide based on costing. The implemented algorithm operates in two different modes: - Fetching a minimum number of tuples without check of equality on the prefix keys, and sorting on all columns when safe. - Fetching all tuples for a single prefix group and then sorting by comparing only the remaining (non-prefix) keys. We always start in the first mode, and employ a heuristic to switch into the second mode if we believe it's beneficial - the goal is to minimize the number of unnecessary comparions while keeping memory consumption below work_mem. This is a very old patch series. The idea was originally proposed by Alexander Korotkov back in 2013, and then revived in 2017. In 2018 the patch was taken over by James Coleman, who wrote and rewrote most of the current code. There were many reviewers/contributors since 2013 - I've done my best to pick the most active ones, and listed them in this commit message. Author: James Coleman, Alexander Korotkov Reviewed-by: Tomas Vondra, Andreas Karlsson, Marti Raudsepp, Peter Geoghegan, Robert Haas, Thomas Munro, Antonin Houska, Andres Freund, Alexander Kuzmenkov Discussion: https://postgr.es/m/CAPpHfdscOX5an71nHd8WSUH6GNOCf=V7wgDaTXdDd9=goN-gfA@mail.gmail.com Discussion: https://postgr.es/m/CAPpHfds1waRZ=NOmueYq0sx1ZSCnt+5QJvizT8ndT2=etZEeAQ@mail.gmail.com
* Update copyrights for 2020Bruce Momjian2020-01-01
| | | | Backpatch-through: update all files in master, backpatch legal files through 9.4
* Speed up finding EquivalenceClasses for a given set of relsDavid Rowley2019-07-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously in order to determine which ECs a relation had members in, we had to loop over all ECs stored in PlannerInfo's eq_classes and check if ec_relids mentioned the relation. For the most part, this was fine, as generally, unless queries were fairly complex, the overhead of performing the lookup would have not been that significant. However, when queries contained large numbers of joins and ECs, the overhead to find the set of classes matching a given set of relations could become a significant portion of the overall planning effort. Here we allow a much more efficient method to access the ECs which match a given relation or set of relations. A new Bitmapset field in RelOptInfo now exists to store the indexes into PlannerInfo's eq_classes list which each relation is mentioned in. This allows very fast lookups to find all ECs belonging to a single relation. When we need to lookup ECs belonging to a given pair of relations, we can simply bitwise-AND the Bitmapsets from each relation and use the result to perform the lookup. We also take the opportunity to write a new implementation of generate_join_implied_equalities which makes use of the new indexes. generate_join_implied_equalities_for_ecs must remain as is as it can be given a custom list of ECs, which we can't easily determine the indexes of. This was originally intended to fix the performance penalty of looking up foreign keys matching a join condition which was introduced by 100340e2d. However, we're speeding up much more than just that here. Author: David Rowley, Tom Lane Reviewed-by: Tom Lane, Tomas Vondra Discussion: https://postgr.es/m/6970.1545327857@sss.pgh.pa.us
* Represent Lists as expansible arrays, not chains of cons-cells.Tom Lane2019-07-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Originally, Postgres Lists were a more or less exact reimplementation of Lisp lists, which consist of chains of separately-allocated cons cells, each having a value and a next-cell link. We'd hacked that once before (commit d0b4399d8) to add a separate List header, but the data was still in cons cells. That makes some operations -- notably list_nth() -- O(N), and it's bulky because of the next-cell pointers and per-cell palloc overhead, and it's very cache-unfriendly if the cons cells end up scattered around rather than being adjacent. In this rewrite, we still have List headers, but the data is in a resizable array of values, with no next-cell links. Now we need at most two palloc's per List, and often only one, since we can allocate some values in the same palloc call as the List header. (Of course, extending an existing List may require repalloc's to enlarge the array. But this involves just O(log N) allocations not O(N).) Of course this is not without downsides. The key difficulty is that addition or deletion of a list entry may now cause other entries to move, which it did not before. For example, that breaks foreach() and sister macros, which historically used a pointer to the current cons-cell as loop state. We can repair those macros transparently by making their actual loop state be an integer list index; the exposed "ListCell *" pointer is no longer state carried across loop iterations, but is just a derived value. (In practice, modern compilers can optimize things back to having just one loop state value, at least for simple cases with inline loop bodies.) In principle, this is a semantics change for cases where the loop body inserts or deletes list entries ahead of the current loop index; but I found no such cases in the Postgres code. The change is not at all transparent for code that doesn't use foreach() but chases lists "by hand" using lnext(). The largest share of such code in the backend is in loops that were maintaining "prev" and "next" variables in addition to the current-cell pointer, in order to delete list cells efficiently using list_delete_cell(). However, we no longer need a previous-cell pointer to delete a list cell efficiently. Keeping a next-cell pointer doesn't work, as explained above, but we can improve matters by changing such code to use a regular foreach() loop and then using the new macro foreach_delete_current() to delete the current cell. (This macro knows how to update the associated foreach loop's state so that no cells will be missed in the traversal.) There remains a nontrivial risk of code assuming that a ListCell * pointer will remain good over an operation that could now move the list contents. To help catch such errors, list.c can be compiled with a new define symbol DEBUG_LIST_MEMORY_USAGE that forcibly moves list contents whenever that could possibly happen. This makes list operations significantly more expensive so it's not normally turned on (though it is on by default if USE_VALGRIND is on). There are two notable API differences from the previous code: * lnext() now requires the List's header pointer in addition to the current cell's address. * list_delete_cell() no longer requires a previous-cell argument. These changes are somewhat unfortunate, but on the other hand code using either function needs inspection to see if it is assuming anything it shouldn't, so it's not all bad. Programmers should be aware of these significant performance changes: * list_nth() and related functions are now O(1); so there's no major access-speed difference between a list and an array. * Inserting or deleting a list element now takes time proportional to the distance to the end of the list, due to moving the array elements. (However, it typically *doesn't* require palloc or pfree, so except in long lists it's probably still faster than before.) Notably, lcons() used to be about the same cost as lappend(), but that's no longer true if the list is long. Code that uses lcons() and list_delete_first() to maintain a stack might usefully be rewritten to push and pop at the end of the list rather than the beginning. * There are now list_insert_nth...() and list_delete_nth...() functions that add or remove a list cell identified by index. These have the data-movement penalty explained above, but there's no search penalty. * list_concat() and variants now copy the second list's data into storage belonging to the first list, so there is no longer any sharing of cells between the input lists. The second argument is now declared "const List *" to reflect that it isn't changed. This patch just does the minimum needed to get the new implementation in place and fix bugs exposed by the regression tests. As suggested by the foregoing, there's a fair amount of followup work remaining to do. Also, the ENABLE_LIST_COMPAT macros are finally removed in this commit. Code using those should have been gone a dozen years ago. Patch by me; thanks to David Rowley, Jesper Pedersen, and others for review. Discussion: https://postgr.es/m/11587.1550975080@sss.pgh.pa.us
* Phase 2 pgindent run for v12.Tom Lane2019-05-22
| | | | | | | | | Switch to 2.1 version of pg_bsd_indent. This formats multiline function declarations "correctly", that is with additional lines of parameter declarations indented to match where the first line's left parenthesis is. Discussion: https://postgr.es/m/CAEepm=0P3FeTXRcU5B2W3jv3PgRVZ-kGUXLGfd42FFhUROO3ug@mail.gmail.com
* Repair issues with faulty generation of merge-append plans.Tom Lane2019-05-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | create_merge_append_plan failed to honor the CP_EXACT_TLIST flag: it would generate the expected targetlist but then it felt free to add resjunk sort targets to it. This demonstrably leads to assertion failures in v11 and HEAD, and it's probably just accidental that we don't see the same in older branches. I've not looked into whether there would be any real-world consequences in non-assert builds. In HEAD, create_append_plan has sprouted the same problem, so fix that too (although we do not have any test cases that seem able to reach that bug). This is an oversight in commit 3fc6e2d7f which invented the CP_EXACT_TLIST flag, so back-patch to 9.6 where that came in. convert_subquery_pathkeys would create pathkeys for subquery output values if they match any EquivalenceClass known in the outer query and are available in the subquery's syntactic targetlist. However, the second part of that condition is wrong, because such values might not appear in the subquery relation's reltarget list, which would mean that they couldn't be accessed above the level of the subquery scan. We must check that they appear in the reltarget list, instead. This can lead to dropping knowledge about the subquery's sort ordering, but I believe it's okay, because any sort key that the outer query actually has any interest in would appear in the reltarget list. This second issue is of very long standing, but right now there's no evidence that it causes observable problems before 9.6, so I refrained from back-patching further than that. We can revisit that choice if somebody finds a way to make it cause problems in older branches. (Developing useful test cases for these issues is really problematic; fixing convert_subquery_pathkeys removes the only known way to exhibit the create_merge_append_plan bug, and neither of the test cases added by this patch causes a problem in all branches, even when considering the issues separately.) The second issue explains bug #15795 from Suresh Kumar R ("could not find pathkey item to sort" with nested DISTINCT queries). I stumbled across the first issue while investigating that. Discussion: https://postgr.es/m/15795-fadb56c8e44ee73c@postgresql.org
* Use Append rather than MergeAppend for scanning ordered partitions.Tom Lane2019-04-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we need ordered output from a scan of a partitioned table, but the ordering matches the partition ordering, then we don't need to use a MergeAppend to combine the pre-ordered per-partition scan results: a plain Append will produce the same results. This both saves useless comparison work inside the MergeAppend proper, and allows us to start returning tuples after istarting up just the first child node not all of them. However, all is not peaches and cream, because if some of the child nodes have high startup costs then there will be big discontinuities in the tuples-returned-versus-elapsed-time curve. The planner's cost model cannot handle that (yet, anyway). If we model the Append's startup cost as being just the first child's startup cost, we may drastically underestimate the cost of fetching slightly more tuples than are available from the first child. Since we've had bad experiences with over-optimistic choices of "fast start" plans for ORDER BY LIMIT queries, that seems scary. As a klugy workaround, set the startup cost estimate for an ordered Append to be the sum of its children's startup costs (as MergeAppend would). This doesn't really describe reality, but it's less likely to cause a bad plan choice than an underestimated startup cost would. In practice, the cases where we really care about this optimization will have child plans that are IndexScans with zero startup cost, so that the overly conservative estimate is still just zero. David Rowley, reviewed by Julien Rouhaud and Antonin Houska Discussion: https://postgr.es/m/CAKJS1f-hAqhPLRk_RaSFTgYxd=Tz5hA7kQ2h4-DhJufQk8TGuw@mail.gmail.com
* Refactor planner's header files.Tom Lane2019-01-29
| | | | | | | | | | | | | | | | | | | | | | | | Create a new header optimizer/optimizer.h, which exposes just the planner functions that can be used "at arm's length", without need to access Paths or the other planner-internal data structures defined in nodes/relation.h. This is intended to provide the whole planner API seen by most of the rest of the system; although FDWs still need to use additional stuff, and more thought is also needed about just what selfuncs.c should rely on. The main point of doing this now is to limit the amount of new #include baggage that will be needed by "planner support functions", which I expect to introduce later, and which will be in relevant datatype modules rather than anywhere near the planner. This commit just moves relevant declarations into optimizer.h from other header files (a couple of which go away because everything got moved), and adjusts #include lists to match. There's further cleanup that could be done if we want to decide that some stuff being exposed by optimizer.h doesn't belong in the planner at all, but I'll leave that for another day. Discussion: https://postgr.es/m/11460.1548706639@sss.pgh.pa.us
* Update copyright for 2019Bruce Momjian2019-01-02
| | | | Backpatch-through: certain files through 9.4