aboutsummaryrefslogtreecommitdiff
path: root/src/backend/optimizer/path/allpaths.c
Commit message (Collapse)AuthorAge
* Fix tuple_fraction calculation in generate_orderedappend_paths()Alexander Korotkov2025-05-18
| | | | | | | | | | | | | | | | | | | 6b94e7a6da adjusted generate_orderedappend_paths() to consider fractional paths. However, it didn't manage to interpret the tuple_fraction value correctly. According to the header comment of grouping_planner(), the tuple_fraction >= 1 specifies the absolute number of expected tuples. That number must be divided by the expected total number of tuples to get the actual fraction. Even though this is a bug fix, we don't backpatch it. The risks of the side effects of plan changes on stable branches are too high. Reported-by: Andrei Lepikhov <lepihov@gmail.com> Discussion: https://postgr.es/m/3ca271fa-ca5c-458c-8934-eb148622b270%40gmail.com Author: Andrei Lepikhov <lepihov@gmail.com> Reviewed-by: Junwang Zhao <zhjwpku@gmail.com> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
* Relax ordering-related hardcoded btree requirements in planningPeter Eisentraut2025-04-06
| | | | | | | | | | | | | | | | | | | | There were several places in ordering-related planning where a requirement for btree was hardcoded but an amcanorder index could suffice. This fixes that. We just need to do the necessary mapping between strategy numbers and compare types and adjust some related APIs so that this works independent of btree strategy numbers. For instance, non-btree amcanorder indexes can now be used to support sorting and merge joins. Also, predtest.c works independent of btree strategy numbers now. To avoid performance regressions, some details on btree and other built-in index types are still hardcoded as shortcuts, but other index types now have access to the same features by providing the required flags and callbacks. Author: Mark Dilger <mark.dilger@enterprisedb.com> Co-authored-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
* Teach Append to consider tuple_fraction when accumulating subpaths.Alexander Korotkov2025-03-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | This change is dedicated to more active usage of IndexScan and parameterized NestLoop paths in partitioned cases under an Append node, as it already works with plain tables. As newly added regression tests demonstrate, it should provide more smartness to the partitionwise technique. With an indication of how many tuples are needed, it may be more meaningful to use the 'fractional branch' subpaths of the Append path list, which are more optimal for this specific number of tuples. Planning on a higher level, if the optimizer needs all the tuples, it will choose non-fractional paths. In the case when, during execution, Append needs to return fewer tuples than declared by tuple_fraction, it would not be harmful to use the 'intermediate' variant of paths. However, it will earn a considerable profit if a sensible set of tuples is selected. The change of the existing regression test demonstrates the positive outcome of this feature: instead of scanning the whole table, the optimizer prefers to use a parameterized scan, being aware of the only single tuple the join has to produce to perform the query. Discussion: https://www.postgresql.org/message-id/flat/CAN-LCVPxnWB39CUBTgOQ9O7Dd8DrA_tpT1EY3LNVnUuvAX1NjA%40mail.gmail.com Author: Nikita Malakhov <hukutoc@gmail.com> Author: Andrei Lepikhov <lepihov@gmail.com> Reviewed-by: Andy Fan <zhihuifan1213@163.com> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
* Adjust tuples estimate for appendrelsRichard Guo2025-02-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | In set_append_rel_size(), we currently set rel->tuples to rel->rows for an appendrel. Generally, rel->tuples is the raw number of tuples in the relation and rel->rows is the estimated number of tuples after the relation's restriction clauses have been applied. Although an appendrel itself doesn't directly enforce any quals today, its child relations may. Therefore, setting rel->tuples equal to rel->rows for an appendrel isn't always appropriate. Doing so can lead to issues in cost estimates in some cases. For instance, when estimating the number of distinct values from an appendrel, we would not be able to adjust the estimate based on the restriction selectivity. This patch addresses this by setting an appendrel's tuples to the total number of tuples accumulated from each live child, which better aligns with reality. This is arguably a bug, but nobody has complained about that until now, so no back-patch. Author: Richard Guo <guofenglinux@gmail.com> Reviewed-by: Tender Wang <tndrwang@gmail.com> Reviewed-by: Alena Rybakina <a.rybakina@postgrespro.ru> Discussion: https://postgr.es/m/CAMbWs4_TG_+kVn6fjG-5GYzzukrNK57=g9eUo4gsrUG26OFawg@mail.gmail.com
* Add OLD/NEW support to RETURNING in DML queries.Dean Rasheed2025-01-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This allows the RETURNING list of INSERT/UPDATE/DELETE/MERGE queries to explicitly return old and new values by using the special aliases "old" and "new", which are automatically added to the query (if not already defined) while parsing its RETURNING list, allowing things like: RETURNING old.colname, new.colname, ... RETURNING old.*, new.* Additionally, a new syntax is supported, allowing the names "old" and "new" to be changed to user-supplied alias names, e.g.: RETURNING WITH (OLD AS o, NEW AS n) o.colname, n.colname, ... This is useful when the names "old" and "new" are already defined, such as inside trigger functions, allowing backwards compatibility to be maintained -- the interpretation of any existing queries that happen to already refer to relations called "old" or "new", or use those as aliases for other relations, is not changed. For an INSERT, old values will generally be NULL, and for a DELETE, new values will generally be NULL, but that may change for an INSERT with an ON CONFLICT ... DO UPDATE clause, or if a query rewrite rule changes the command type. Therefore, we put no restrictions on the use of old and new in any DML queries. Dean Rasheed, reviewed by Jian He and Jeff Davis. Discussion: https://postgr.es/m/CAEZATCWx0J0-v=Qjc6gXzR=KtsdvAE7Ow=D=mu50AgOe+pvisQ@mail.gmail.com
* Update copyright for 2025Bruce Momjian2025-01-01
| | | | Backpatch-through: 13
* Introduce an RTE for the grouping stepRichard Guo2024-09-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If there are subqueries in the grouping expressions, each of these subqueries in the targetlist and HAVING clause is expanded into distinct SubPlan nodes. As a result, only one of these SubPlan nodes would be converted to reference to the grouping key column output by the Agg node; others would have to get evaluated afresh. This is not efficient, and with grouping sets this can cause wrong results issues in cases where they should go to NULL because they are from the wrong grouping set. Furthermore, during re-evaluation, these SubPlan nodes might use nulled column values from grouping sets, which is not correct. This issue is not limited to subqueries. For other types of expressions that are part of grouping items, if they are transformed into another form during preprocessing, they may fail to match lower target items. This can also lead to wrong results with grouping sets. To fix this issue, we introduce a new kind of RTE representing the output of the grouping step, with columns that are the Vars or expressions being grouped on. In the parser, we replace the grouping expressions in the targetlist and HAVING clause with Vars referencing this new RTE, so that the output of the parser directly expresses the semantic requirement that the grouping expressions be gotten from the grouping output rather than computed some other way. In the planner, we first preprocess all the columns of this new RTE and then replace any Vars in the targetlist and HAVING clause that reference this new RTE with the underlying grouping expressions, so that we will have only one instance of a SubPlan node for each subquery contained in the grouping expressions. Bump catversion because this changes the querytree produced by the parser. Thanks to Tom Lane for the idea to invent a new kind of RTE. Per reports from Geoff Winkless, Tobias Wendorff, Richard Guo from various threads. Author: Richard Guo Reviewed-by: Ashutosh Bapat, Sutou Kouhei Discussion: https://postgr.es/m/CAMbWs4_dp7e7oTwaiZeBX8+P1rXw4ThkZxh1QG81rhu9Z47VsQ@mail.gmail.com
* Fix rowcount estimate for gather (merge) pathsRichard Guo2024-07-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the case of a parallel plan, when computing the number of tuples processed per worker, we divide the total number of tuples by the parallel_divisor obtained from get_parallel_divisor(), which accounts for the leader's contribution in addition to the number of workers. Accordingly, when estimating the number of tuples for gather (merge) nodes, we should multiply the number of tuples per worker by the same parallel_divisor to reverse the division. However, currently we use parallel_workers rather than parallel_divisor for the multiplication. This could result in an underestimation of the number of tuples for gather (merge) nodes, especially when there are fewer than four workers. This patch fixes this issue by using the same parallel_divisor for the multiplication. There is one ensuing plan change in the regression tests, but it looks reasonable and does not compromise its original purpose of testing parallel-aware hash join. In passing, this patch removes an unnecessary assignment for path.rows in create_gather_merge_path, and fixes an uninitialized-variable issue in generate_useful_gather_paths. No backpatch as this could result in plan changes. Author: Anthonin Bonnefoy Reviewed-by: Rafia Sabih, Richard Guo Discussion: https://postgr.es/m/CAO6_Xqr9+51NxgO=XospEkUeAg-p=EjAWmtpdcZwjRgGKJ53iA@mail.gmail.com
* Remove grotty use of disable_cost for TID scan plans.Robert Haas2024-07-22
| | | | | | | | | | | | | | | | | | | Previously, the code charged disable_cost for CurrentOfExpr, and then subtracted disable_cost from the cost of a TID path that used CurrentOfExpr as the TID qual, effectively disabling all paths except that one. Now, we instead suppress generation of the disabled paths entirely, and generate only the one that the executor will actually understand. With this approach, we do not need to rely on disable_cost being large enough to prevent the wrong path from being chosen, and we save some CPU cycle by avoiding generating paths that we can't actually use. In my opinion, the code is also easier to understand like this. Patch by me. Review by Heikki Linnakangas. Discussion: http://postgr.es/m/591b3596-2ea0-4b8e-99c6-fad0ef2801f5@iki.fi
* Re-allow planner to use Merge Append to efficiently implement UNION.Robert Haas2024-05-21
| | | | | | | | | | | This reverts commit 7204f35919b7e021e8d1bc9f2d76fd6bfcdd2070, thus restoring 66c0185a3 (Allow planner to use Merge Append to efficiently implement UNION) as well as the follow-on commits d5d2205c8, 3b1a7eb28, 7487044d6. Per further discussion on pgsql-release, we wish to ship beta1 with this feature, and patch the bug that was found just before wrap, rather than shipping beta1 with the feature reverted.
* Revert commit 66c0185a3 and follow-on patches.Tom Lane2024-05-20
| | | | | | | | | | | | | | | | | | | This reverts 66c0185a3 (Allow planner to use Merge Append to efficiently implement UNION) as well as the follow-on commits d5d2205c8, 3b1a7eb28, 7487044d6. In addition to those, 07746a8ef had to be removed then re-applied in a different place, because 66c0185a3 moved the relevant code. The reason for this last-minute thrashing is that depesz found a case in which the patched code creates a completely wrong plan that silently gives incorrect query results. It's unclear what the cause is or how many cases are affected, but with beta1 wrap staring us in the face, there's no time for closer investigation. After we figure that out, we can decide whether to un-revert this for beta2 or hold it for v18. Discussion: https://postgr.es/m/Zktzf926vslR35Fv@depesz.com (also some private discussion among pgsql-release)
* Fix query pullup issue with WindowClause runConditionDavid Rowley2024-05-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 94985c210 added code to detect when WindowFuncs were monotonic and allowed additional quals to be "pushed down" into the subquery to be used as WindowClause runConditions in order to short-circuit execution in nodeWindowAgg.c. The Node representation of runConditions wasn't well selected and because we do qual pushdown before planning the subquery, the planning of the subquery could perform subquery pull-up of nested subqueries. For WindowFuncs with args, the arguments could be changed after pushing the qual down to the subquery. This was made more difficult by the fact that the code duplicated the WindowFunc inside an OpExpr to include in the WindowClauses runCondition field. This could result in duplication of subqueries and a pull-up of such a subquery could result in another initplan parameter being issued for the 2nd version of the subplan. This could result in errors such as: ERROR: WindowFunc not found in subplan target lists To fix this, we change the node representation of these run conditions and instead of storing an OpExpr containing the WindowFunc in a list inside WindowClause, we now store a new node type named WindowFuncRunCondition within a new field in the WindowFunc. These get transformed into OpExprs later in planning once subquery pull-up has been performed. This problem did exist in v15 and v16, but that was fixed by 9d36b883b and e5d20bbd. Cat version bump due to new node type and modifying WindowFunc struct. Bug: #18305 Reported-by: Zuming Jiang Discussion: https://postgr.es/m/18305-33c49b4c830b37b3%40postgresql.org
* Fix assert failure when planning setop subqueries with CTEsDavid Rowley2024-04-02
| | | | | | | | | | | | | | | | | | | | | 66c0185a3 adjusted the UNION planner to request that union child queries produce Paths correctly ordered to implement the UNION by way of MergeAppend followed by Unique. The code there made a bad assumption that if the root->parent_root->parse had setOperations set that the query must be the child subquery of a set operation. That's not true when it comes to planning a non-inlined CTE which is parented by a set operation. This causes issues as the CTE's targetlist has no requirement to match up to the SetOperationStmt's groupClauses Fix this by adding a new parameter to both subquery_planner() and grouping_planner() to explicitly pass the SetOperationStmt only when planning set operation child subqueries. Thank you to Tom Lane for helping to rationalize the decision on the best function signature for subquery_planner(). Reported-by: Alexander Lakhin Discussion: https://postgr.es/m/242fc7c6-a8aa-2daf-ac4c-0a231e2619c1@gmail.com
* Update comment in set_dummy_rel_pathlist().Tom Lane2024-03-28
| | | | | | | | | | This comment claimed that set_dummy_rel_pathlist() has callers other than (possibly indirectly) set_rel_size(). It doesn't, so revise the argument to not rely on that. Noted by Richard Guo. Discussion: https://postgr.es/m/CAMbWs4-KFEU_fDuJPNCOkUu3rwvZvKBEytkd9VrM4kH4-2h1CQ@mail.gmail.com
* Remove some redundant set_cheapest() calls.Tom Lane2024-03-26
| | | | | | | | | | | | | | Commit e2fa76d80 centralized the responsibility for doing set_cheapest() for a baserel, but these functions added later seemingly didn't get the memo. There's no apparent reason why we need the cheapest path for these relation types to be available any sooner than it is for other base relation types, so delete the duplicate calls. Doesn't save much since there's only one path in these cases, but it might improve clarity. Richard Guo Discussion: https://postgr.es/m/CAMbWs4-KFEU_fDuJPNCOkUu3rwvZvKBEytkd9VrM4kH4-2h1CQ@mail.gmail.com
* Propagate pathkeys from CTEs up to the outer query.Tom Lane2024-03-26
| | | | | | | | | | | | | | | | | | | | | | | If we know the sort order of a CTE's output, and it is relevant to the outer query, label the CTE's outer-query access path using those pathkeys. This may enable optimizations such as avoiding a sort in the outer query. The code for hoisting pathkeys into the outer query already exists for regular RTE_SUBQUERY subqueries, but it wasn't getting used for CTEs, possibly out of concern for maintaining an optimization fence between the CTE and the outer query. However, on the same arguments used for commit f7816aec2, there seems no harm in letting the outer query know what the inner query decided to do. In support of this, we now remember the best Path as well as Plan for each subquery for the rest of the planner run. There may be future applications for having that at hand, and it surely costs little to build one more List. Richard Guo (minor mods by me) Discussion: https://postgr.es/m/CAMbWs49xYd3f8CrE8-WW3--dV1zH_sDSDn-vs2DzHj81Wcnsew@mail.gmail.com
* Remove unused #include's from backend .c filesPeter Eisentraut2024-03-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | as determined by include-what-you-use (IWYU) While IWYU also suggests to *add* a bunch of #include's (which is its main purpose), this patch does not do that. In some cases, a more specific #include replaces another less specific one. Some manual adjustments of the automatic result: - IWYU currently doesn't know about includes that provide global variable declarations (like -Wmissing-variable-declarations), so those includes are being kept manually. - All includes for port(ability) headers are being kept for now, to play it safe. - No changes of catalog/pg_foo.h to catalog/pg_foo_d.h, to keep the patch from exploding in size. Note that this patch touches just *.c files, so nothing declared in header files changes in hidden ways. As a small example, in src/backend/access/transam/rmgr.c, some IWYU pragma annotations are added to handle a special case there. Discussion: https://www.postgresql.org/message-id/flat/af837490-6b2f-46df-ba05-37ea6a6653fc%40eisentraut.org
* Adjust reltarget assignment for UPPERREL_PARTIAL_DISTINCT relDavid Rowley2024-02-07
| | | | | | | | | | | | | | | | | | | | | | A comment in grouping_planner() claimed that the PlannerInfo upper_targets array was not used in core code. However, the code that generated the paths for the UPPERREL_PARTIAL_DISTINCT rel made that comment untrue. Here we adjust the create_distinct_paths() function signature to pass down the PathTarget the same as is done for create_grouping_paths(), thus making the aforementioned comment true again. In passing adjust the order of the upper_targets[] assignments. These seem to be following the reverse enum order apart from UPPERREL_PARTIAL_DISTINCT. Also, update the header comment for generate_gather_paths() to mention the function is also used to create gather paths for partial distinct paths. Author: Richard Guo, David Rowley Discussion: https://postgr.es/m/CAMbWs48u9VoVOouJsys1qOaC9WVGVmBa+wT1dx8KvxF5GPzezA@mail.gmail.com
* Update copyright for 2024Bruce Momjian2024-01-03
| | | | | | | | Reported-by: Michael Paquier Discussion: https://postgr.es/m/ZZKTDPxBBMt3C0J9@paquier.xyz Backpatch-through: 12
* Add trailing commas to enum definitionsPeter Eisentraut2023-10-26
| | | | | | | | | | | | | | | | | | | | Since C99, there can be a trailing comma after the last value in an enum definition. A lot of new code has been introducing this style on the fly. Some new patches are now taking an inconsistent approach to this. Some add the last comma on the fly if they add a new last value, some are trying to preserve the existing style in each place, some are even dropping the last comma if there was one. We could nudge this all in a consistent direction if we just add the trailing commas everywhere once. I omitted a few places where there was a fixed "last" value that will always stay last. I also skipped the header files of libpq and ecpg, in case people want to use those with older compilers. There were also a small number of cases where the enum type wasn't used anywhere (but the enum values were), which ended up confusing pgindent a bit, so I left those alone. Discussion: https://www.postgresql.org/message-id/flat/386f8c45-c8ac-4681-8add-e3b0852c1620%40eisentraut.org
* Replace has_multiple_baserels() with a bitmap test on all_baserels.Tom Lane2023-10-10
| | | | | | | | | | | | Since we added the PlannerInfo.all_baserels set, it's not really necessary to grovel over the rangetable to count baserels in the current query. So let's drop has_multiple_baserels() in favor of a bms_membership() test. This might be microscopically faster, but the main point is to remove some unnecessary code. Richard Guo Discussion: https://postgr.es/m/CAMbWs4_8RcSbbfs1ASZLrMuL0c0EQgXWcoLTQD8swBRY_pQQiA@mail.gmail.com
* Fix possible crash in add_paths_to_append_rel()David Rowley2023-10-10
| | | | | | | | | | | | | | While working on a8a968a82, I failed to consider that cheapest_startup_path can be NULL when there is no non-parameterized path in the pathlist. This is well documented in set_cheapest(), I just failed to notice. Here we adjust the code to just check if the RelOptInfo has a cheapest_startup_path set before adding it to the startup_subpaths list. Reported-by: Richard Guo Author: Richard Guo Discussion: https://postgr.es/m/CAMbWs49w3t03V69XhdCuw+GDwivny4uQUxrkVp6Gejaspt0wMQ@mail.gmail.com
* Remove debug_print_rel and replace usages with pprintDavid Rowley2023-10-09
| | | | | | | | | | | | | | | | | | | | Going by c4a1933b4, b33ef397a and 05893712c (to name just a few), it seems that maintaining debug_print_rel() is often forgotten. In the case of c4a1933b4, it was several years before anyone noticed that a path type was not handled by debug_print_rel(). (debug_print_rel() is only compiled when building with OPTIMIZER_DEBUG). After a quick survey on the pgsql-hackers mailing list, nobody came forward to admit that they use OPTIMIZER_DEBUG. So to prevent any future maintenance neglect, let's just remove debug_print_rel() and have OPTIMIZER_DEBUG make use of pprint() instead (as suggested by Tom Lane). If anyone wants to come forward to claim they make use of OPTIMIZER_DEBUG in a way that they need debug_print_rel() then they have around 10 months remaining in the v17 cycle where we could revert this. If nobody comes forward in that time, then we can likely safely declare debug_print_rel() as not worth keeping. Discussion: https://postgr.es/m/CAApHDvoCdjo8Cu2zEZF4-AxWG-90S+pYXAnoDDa9J3xH-OrczQ@mail.gmail.com
* Consider cheap startup paths in add_paths_to_append_relDavid Rowley2023-10-05
| | | | | | | | | | | | 6b94e7a6d did this for ordered append paths to allow fast startup MergeAppends, however, nothing was done for the Append case. Here we adjust add_paths_to_append_rel() to have it build an AppendPath containing the cheapest startup paths from each of the child relations when the append rel has "consider_startup" set. Author: Andy Fan, David Rowley Discussion: https://www.postgresql.org/message-id/CAKU4AWrXSkUV=Pt-gRxQT7EbfUeNssprGyNsB=5mJibFZ6S3ww@mail.gmail.com
* C comment: add optimizer function referenceBruce Momjian2023-09-29
| | | | | | | | Reported-by: James Coleman Discussion: https://postgr.es/m/CAAaqYe9F6uoMhAr+8rMLwvGzaKaSknPA0Wi3Ehtv8pbSYmJq-Q@mail.gmail.com Backpatch-through: master
* Add missing TidRangePath handling in print_path()David Rowley2023-09-29
| | | | | | | | | | | | | Tid Range scans were added back in bb437f995. That commit forgot to add handling for TidRangePaths in print_path(). Only people building with OPTIMIZER_DEBUG might have noticed this, which likely is the reason it's taken 4 years for anyone to notice. Author: Andrey Lepikhov Reported-by: Andrey Lepikhov Discussion: https://postgr.es/m/379082d6-1b6a-4cd6-9ecf-7157d8c08635@postgrespro.ru Backpatch-through: 14, where bb437f995 was introduced
* Fix some typos and some incorrectly duplicated wordsDavid Rowley2023-04-18
| | | | | | Author: Justin Pryzby Reviewed-by: David Rowley Discussion: https://postgr.es/m/ZD3D1QxoccnN8A1V@telsasoft.com
* Fix incorrect logic for determining safe WindowAgg run conditionsDavid Rowley2023-03-17
| | | | | | | | | | | | | | | The logic added in 9d9c02ccd to determine when a qual can be used as a WindowClause run condition failed to correctly check for subqueries in the qual. This was being done correctly for normal subquery qual pushdowns, it's just that 9d9c02ccd failed to follow the lead on that. This also fixes various other cases where transforming the qual into a WindowClause run condition in the subquery should have been disallowed. Bug: #17826 Reported-by: Anban Company Discussion: https://postgr.es/m/17826-7d8750952f19a5f5@postgresql.org Backpatch-through: 15, where 9d9c02ccd was introduced.
* Support PlaceHolderVars in MERGE actions.Tom Lane2023-03-15
| | | | | | | | | | | | | preprocess_targetlist thought PHVs couldn't appear here. It was mistaken, as per report from Önder Kalacı. Surveying other pull_var_clause calls, I noted no similar errors, but I did notice that qual_is_pushdown_safe's assertion about !contain_window_function was pointless, because the following pull_var_clause call would complain about them anyway. In HEAD only, remove the redundant Assert and improve the commentary. Discussion: https://postgr.es/m/CACawEhUuum-gC_2S3sXLTcsk7bUSPSHOD+g1ZpfKaDK-KKPPWA@mail.gmail.com
* Work around implementation restriction in adjust_appendrel_attrs.Tom Lane2023-03-12
| | | | | | | | | | | | | | | | | | | | | | | adjust_appendrel_attrs can't transfer nullingrel labeling to a non-Var translation expression (mainly because it's too late to wrap such an expression in a PlaceHolderVar). I'd supposed in commit 2489d76c4 that that restriction was unreachable because we'd not attempt to push problematic clauses down to an appendrel child relation. I forgot that set_append_rel_size blindly converts all the parent rel's joininfo clauses to child clauses, and that list could well contain clauses from above a nulling outer join. We might eventually have to devise a direct fix for this implementation restriction, but for now it seems enough to filter out troublesome clauses while constructing the child's joininfo list. Such clauses are certainly not useful while constructing paths for the child rel; they'll have to be applied later when we join the completed appendrel to something else. So we don't need them here, and omitting them from the list should save a few cycles while processing the child rel. Per bug #17832 from Marko Tiikkaja. Discussion: https://postgr.es/m/17832-d0a8106cdf1b722e@postgresql.org
* Optimize generate_orderedappend_pathsDavid Rowley2023-02-20
| | | | | | | | | | | | | | | | | | In generate_orderedappend_paths(), when match_partition_order_desc was true, we would lcons() items to various lists in a loop over each live partition. When the number of live partitions was large, the lcons() could show up in profiles due to it having to perform memmove() to make way for the new list item. Here we adjust things so that we just perform the loop over the live partitions backwards when match_partition_order_desc is true. This allows us to simplify the logic in the loop. Now, as far as the guts of the loop knows, there's no difference between match_partition_order and match_partition_order_desc. We can just set match_partition_order to true so that we build the correct list of paths for the asc and desc case. Per idea from Andres Freund. Discussion: https://postgr.es/m/20230217002351.nyt4y5tdzg6hugdt@awork3.anarazel.de
* Do assorted mop-up in the planner.Tom Lane2023-01-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove RestrictInfo.nullable_relids, along with a good deal of infrastructure that calculated it. One use-case for it was in join_clause_is_movable_to, but we can now replace that usage with a check to see if the clause's relids include any outer join that can null the target relation. The other use-case was in join_clause_is_movable_into, but that test can just be dropped entirely now that the clause's relids include outer joins. Furthermore, join_clause_is_movable_into should now be accurate enough that it will accept anything returned by generate_join_implied_equalities, so we can restore the Assert that was diked out in commit 95f4e59c3. Remove the outerjoin_delayed mechanism. We needed this before to prevent quals from getting evaluated below outer joins that should null some of their vars. Now that we consider varnullingrels while placing quals, that's taken care of automatically, so throw the whole thing away. Teach remove_useless_result_rtes to also remove useless FromExprs. Having done that, the delay_upper_joins flag serves no purpose any more and we can remove it, largely reverting 11086f2f2. Use constant TRUE for "dummy" clauses when throwing back outer joins. This improves on a hack I introduced in commit 6a6522529. If we have a left-join clause l.x = r.y, and a WHERE clause l.x = constant, we generate r.y = constant and then don't really have a need for the join clause. But we must throw the join clause back anyway after marking it redundant, so that the join search heuristics won't think this is a clauseless join and avoid it. That was a kluge introduced under time pressure, and after looking at it I thought of a better way: let's just introduce constant-TRUE "join clauses" instead, and get rid of them at the end. This improves the generated plans for such cases by not having to test a redundant join clause. We can also get rid of the ugly hack used to mark such clauses as redundant for selectivity estimation. Patch by me; thanks to Richard Guo for review. Discussion: https://postgr.es/m/830269.1656693747@sss.pgh.pa.us
* Make Vars be outer-join-aware.Tom Lane2023-01-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Traditionally we used the same Var struct to represent the value of a table column everywhere in parse and plan trees. This choice predates our support for SQL outer joins, and it's really a pretty bad idea with outer joins, because the Var's value can depend on where it is in the tree: it might go to NULL above an outer join. So expression nodes that are equal() per equalfuncs.c might not represent the same value, which is a huge correctness hazard for the planner. To improve this, decorate Var nodes with a bitmapset showing which outer joins (identified by RTE indexes) may have nulled them at the point in the parse tree where the Var appears. This allows us to trust that equal() Vars represent the same value. A certain amount of klugery is still needed to cope with cases where we re-order two outer joins, but it's possible to make it work without sacrificing that core principle. PlaceHolderVars receive similar decoration for the same reason. In the planner, we include these outer join bitmapsets into the relids that an expression is considered to depend on, and in consequence also add outer-join relids to the relids of join RelOptInfos. This allows us to correctly perceive whether an expression can be calculated above or below a particular outer join. This change affects FDWs that want to plan foreign joins. They *must* follow suit when labeling foreign joins in order to match with the core planner, but for many purposes (if postgres_fdw is any guide) they'd prefer to consider only base relations within the join. To support both requirements, redefine ForeignScan.fs_relids as base+OJ relids, and add a new field fs_base_relids that's set up by the core planner. Large though it is, this commit just does the minimum necessary to install the new mechanisms and get check-world passing again. Follow-up patches will perform some cleanup. (The README additions and comments mention some stuff that will appear in the follow-up.) Patch by me; thanks to Richard Guo for review. Discussion: https://postgr.es/m/830269.1656693747@sss.pgh.pa.us
* Update copyright for 2023Bruce Momjian2023-01-02
| | | | Backpatch-through: 11
* Remove pessimistic cost penalization from Incremental SortDavid Rowley2022-12-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When incremental sorts were added in v13 a 1.5x pessimism factor was added to the cost modal. Seemingly this was done because the cost modal only has an estimate of the total number of input rows and the number of presorted groups. It assumes that the input rows will be evenly distributed throughout the presorted groups. The 1.5x pessimism factor was added to slightly reduce the likelihood of incremental sorts being used in the hope to avoid performance regressions where an incremental sort plan was picked and turned out slower due to a large skew in the number of rows in the presorted groups. An additional quirk with the path generation code meant that we could consider both a sort and an incremental sort on paths with presorted keys. This meant that with the pessimism factor, it was possible that we opted to perform a sort rather than an incremental sort when the given path had presorted keys. Here we remove the 1.5x pessimism factor to allow incremental sorts to have a fairer chance at being chosen against a full sort. Previously we would generally create a sort path on the cheapest input path (if that wasn't sorted already) and incremental sort paths on any path which had presorted keys. This meant that if the cheapest input path wasn't completely sorted but happened to have presorted keys, we would create a full sort path *and* an incremental sort path on that input path. Here we change this logic so that if there are presorted keys, we only create an incremental sort path, and create sort paths only when a full sort is required. Both the removal of the cost pessimism factor and the changes made to the path generation make it more likely that incremental sorts will now be chosen. That, of course, as with teaching the planner any new tricks, means an increased likelihood that the planner will perform an incremental sort when it's not the best method. Our standard escape hatch for these cases is an enable_* GUC. enable_incremental_sort already exists for this. This came out of a report by Pavel Luzanov where he mentioned that the master branch was choosing to perform a Seq Scan -> Sort -> Group Aggregate for his query with an ORDER BY aggregate function. The v15 plan for his query performed an Index Scan -> Group Aggregate, of course, the aggregate performed the final sort internally in nodeAgg.c for the aggregate's ORDER BY. The ideal plan would have been to use the index, which provided partially sorted input then use an incremental sort to provide the aggregate with the sorted input. This was not being chosen due to the pessimism in the incremental sort cost modal, so here we remove that and rationalize the path generation so that sort and incremental sort plans don't have to needlessly compete. We assume that it's senseless to ever use a full sort on a given input path where an incremental sort can be performed. Reported-by: Pavel Luzanov Reviewed-by: Richard Guo Discussion: https://postgr.es/m/9f61ddbf-2989-1536-b31e-6459370a6baa%40postgrespro.ru
* Fix 32-bit build dangling pointer issue in WindowAggDavid Rowley2022-12-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 9d9c02ccd added window "run conditions", which allows the evaluation of monotonic window functions to be skipped when the run condition is no longer true. Prior to this commit, once the run condition was no longer true and we stopped evaluating the window functions, we simply just left the ecxt_aggvalues[] and ecxt_aggnulls[] arrays alone to store whatever value was stored there the last time the window function was evaluated. Leaving a stale value in there isn't really a problem on 64-bit builds as all of the window functions which we recognize as monotonic all return int8, which is passed by value on 64-bit builds. However, on 32-bit builds, this was a problem as the value stored in the ecxt_values[] element would be a by-ref value and it would be pointing to some memory which would get reset once the tuple context is destroyed. Since the WindowAgg node will output these values in the resulting tupleslot, this could be problematic for the top-level WindowAgg node which must look at these values to filter out the rows that don't meet its filter condition. Here we fix this by just zeroing the ecxt_aggvalues[] and setting the ecxt_aggnulls[] array to true when the run condition first becomes false. This results in the WindowAgg's output having NULLs for the WindowFunc's columns rather than the stale or pointer pointing to possibly freed memory. These tuples with the NULLs can only make it as far as the top-level WindowAgg node before they're filtered out. To ensure that these tuples *are* always filtered out, we now insist that OpExprs making up the run condition are strict OpExprs. Currently, all the window functions which the planner recognizes as monotonic return INT8 and the operator which is used for the run condition must be a member of a btree opclass. In reality, these restrictions exclude nothing that's built-in to Postgres and are unlikely to exclude anyone's custom operators due to the requirement that the operator is part of a btree opclass. It would be unusual if those were not strict. Reported-by: Sergey Shinderuk, using valgrind Reviewed-by: Richard Guo, Sergey Shinderuk Discussion: https://postgr.es/m/29184c50-429a-ebd7-f1fb-0589c6723a35@postgrespro.ru Backpatch-through: 15, where 9d9c02ccd was added
* Fix generate_partitionwise_join_paths() to tolerate failure.Tom Lane2022-12-04
| | | | | | | | | | | | | | | | | | | | | | | | | We might fail to generate a partitionwise join, because reparameterize_path_by_child() does not support all path types. This should not be a hard failure condition: we should just fall back to a non-partitioned join. However, generate_partitionwise_join_paths did not consider this possibility and would emit the (misleading) error "could not devise a query plan for the given query" if we'd failed to make any paths for a child join. Fix it to give up on partitionwise joining if so. (The accepted technique for giving up appears to be to set rel->nparts = 0, which I find pretty bizarre, but there you have it.) I have not added a test case because there'd be little point: any omissions of this sort that we identify would soon get fixed by extending reparameterize_path_by_child(), so the test would stop proving anything. However, right now there is a known test case based on failure to cover MaterialPath, and with that I've found that this is broken in all supported versions. Hence, patch all the way back. Original report and patch by me; thanks to Richard Guo for identifying a test case that works against committed versions. Discussion: https://postgr.es/m/1854233.1669949723@sss.pgh.pa.us
* Fix confusion about havingQual vs hasHavingQual in planner.Tom Lane2022-10-18
| | | | | | | | | | | | | | | | | | | | | | | Preprocessing of the HAVING clause will reduce havingQual to NIL if the clause is constant-TRUE. This is one case where that convention is rather unfortunate, because "HAVING TRUE" is not at all the same as not having any HAVING clause at all. (Per the SQL spec, it still forces the query to be grouped.) The planner deals with this by having a boolean hasHavingQual that records whether havingQual was originally nonempty; places that just want to check whether HAVING was specified are supposed to consult that. I found three places that got that wrong. Fortunately, these could only affect cost estimates not correctness. It'd be hard even to demonstrate the errors; for example, the one in allpaths.c would only matter in a query that has HAVING TRUE but no GROUP BY and no aggregates, which would require a completely variable-free SELECT list, making the case probably of only academic interest. Hence, while these are worth fixing before someone copies the incorrect coding somewhere more critical, they don't seem worth back-patching. I didn't bother trying to devise regression tests, either. Discussion: https://postgr.es/m/2503888.1666042643@sss.pgh.pa.us
* Fix failure to set correct operator in window run conditionDavid Rowley2022-08-05
| | | | | | | | | | | | This was a simple omission in 9d9c02ccd where the code didn't correctly set the operator to use in the run condition OpExpr when the window function was both monotonically increasing and decreasing. Bug discovered by Julien Roze, although he did not report it. Reported-by: Phil Florent Discussion: https://postgr.es/m/PA4P191MB160009A09B9D0624359278CFBA9F9@PA4P191MB1600.EURP191.PROD.OUTLOOK.COM Backpatch-through: 15, where 9d9c02ccd was added
* Fix incorrect is-this-the-topmost-join tests in parallel planning.Tom Lane2022-07-30
| | | | | | | | | | | | | | | | | | | | Two callers of generate_useful_gather_paths were testing the wrong thing when deciding whether to call that function: they checked for being at the top of the current join subproblem, rather than being at the actual top join. This'd result in failing to construct parallel paths for a sub-join for which they might be useful. While set_rel_pathlist() isn't actively broken, it seems best to make its identical-in-intention test for this be like the other two. This has been wrong all along, but given the lack of field complaints I'm hesitant to back-patch into stable branches; we usually prefer to avoid non-bug-fix changes in plan choices in minor releases. It seems not too late for v15 though. Richard Guo, reviewed by Antonin Houska and Tom Lane Discussion: https://postgr.es/m/CAMbWs4-mH8Zf87-w+3P2J=nJB+5OyicO28ia9q_9o=Lamf_VHg@mail.gmail.com
* Remove fls(), use pg_leftmost_one_pos32() instead.Thomas Munro2022-07-22
| | | | | | | | | | | | | | Commit 4f658dc8 provided the traditional BSD fls() function in src/port/fls.c so it could be used in several places. Later we added a bunch of similar facilities in pg_bitutils.h, based on compiler builtins that map to hardware instructions. It's a bit confusing to have both 1-based and 0-based variants of this operation in use in different parts of the tree, and neither is blessed by a standard. Let's drop fls.c and the configure probe, and reuse the newer code. Reviewed-by: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CA%2BhUKG%2B7dSX1XF8yFGmYk-%3D48dbjH2kmzZj16XvhbrWP-9BzRg%40mail.gmail.com
* Estimate cost of elided SubqueryScan, Append, MergeAppend nodes better.Tom Lane2022-07-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | setrefs.c contains logic to discard no-op SubqueryScan nodes, that is, ones that have no qual to check and copy the input targetlist unchanged. (Formally it's not very nice to be applying such optimizations so late in the planner, but there are practical reasons for it; mostly that we can't unify relids between the subquery and the parent query until we flatten the rangetable during setrefs.c.) This behavior falsifies our previous cost estimates, since we would've charged cpu_tuple_cost per row just to pass data through the node. Most of the time that's little enough to not matter, but there are cases where this effect visibly changes the plan compared to what you would've gotten with no sub-select. To improve the situation, make the callers of cost_subqueryscan tell it whether they think the targetlist is trivial. cost_subqueryscan already has the qual list, so it can check the other half of the condition easily. It could make its own determination of tlist triviality too, but doing so would be repetitive (for callers that may call it several times) or unnecessarily expensive (for callers that can determine this more cheaply than a general test would do). This isn't a 100% solution, because createplan.c also does things that can falsify any earlier estimate of whether the tlist is trivial. However, it fixes nearly all cases in practice, if results for the regression tests are anything to go by. setrefs.c also contains logic to discard no-op Append and MergeAppend nodes. We did have knowledge of that behavior at costing time, but somebody failed to update it when a check on parallel-awareness was added to the setrefs.c logic. Fix that while we're here. These changes result in two minor changes in query plans shown in our regression tests. Neither is relevant to the purposes of its test case AFAICT. Patch by me; thanks to Richard Guo for review. Discussion: https://postgr.es/m/2581077.1651703520@sss.pgh.pa.us
* Use list_copy_head() instead of list_truncate(list_copy(...), ...)David Rowley2022-07-13
| | | | | | | | | | | | | | | | Truncating off the end of a freshly copied List is not a very efficient way of copying the first N elements of a List. In many of the cases that are updated here, the pattern was only being used to remove the final element of a List. That's about the best case for it, but there were many instances where the truncate trimming the List down much further. 4cc832f94 added list_copy_head(), so let's use it in cases where it's useful. Author: David Rowley Discussion: https://postgr.es/m/1986787.1657666922%40sss.pgh.pa.us
* Teach remove_unused_subquery_outputs about window run conditionsDavid Rowley2022-05-27
| | | | | | | | | | | | | | | | | | | | | | | | | 9d9c02ccd added code to allow the executor to take shortcuts when quals on monotonic window functions guaranteed that once the qual became false it could never become true again. When possible, baserestrictinfo quals are converted to become these quals, which we call run conditions. Unfortunately, in 9d9c02ccd, I forgot to update remove_unused_subquery_outputs to teach it about these run conditions. This could cause a WindowFunc column which was unused in the target list but referenced by an upper-level WHERE clause to be removed from the subquery when the qual in the WHERE clause was converted into a window run condition. Because of this, the entire WindowClause would be removed from the query resulting in additional rows making it into the resultset when they should have been filtered out by the WHERE clause. Here we fix this by recording which target list items in the subquery have run conditions. That gets passed along to remove_unused_subquery_outputs to tell it not to remove these items from the target list. Bug: #17495 Reported-by: Jeremy Evans Reviewed-by: Richard Guo Discussion: https://postgr.es/m/17495-7ffe2fa0b261b9fa@postgresql.org
* Pre-beta mechanical code beautification.Tom Lane2022-05-12
| | | | | Run pgindent, pgperltidy, and reformat-dat-files. I manually fixed a couple of comments that pgindent uglified.
* Remove inadequate assertion check in CTE inlining.Tom Lane2022-04-21
| | | | | | | | | | | | | | | | | | | | inline_cte() expected to find exactly as many references to the target CTE as its cterefcount indicates. While that should be accurate for the tree as emitted by the parser, there are some optimizations that occur upstream of here that could falsify it, notably removal of unused subquery output expressions. Trying to make the accounting 100% accurate seems expensive and doomed to future breakage. It's not really worth it, because all this code is protecting is downstream assumptions that every referenced CTE has a plan. Let's convert those assertions to regular test-and-elog just in case there's some actual problem, and then drop the failing assertion. Per report from Tomas Vondra (thanks also to Richard Guo for analysis). Back-patch to v12 where the faulty code came in. Discussion: https://postgr.es/m/29196a1e-ed47-c7ca-9be2-b1c636816183@enterprisedb.com
* Teach planner and executor about monotonic window funcsDavid Rowley2022-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Window functions such as row_number() always return a value higher than the previously returned value for tuples in any given window partition. Traditionally queries such as; SELECT * FROM ( SELECT *, row_number() over (order by c) rn FROM t ) t WHERE rn <= 10; were executed fairly inefficiently. Neither the query planner nor the executor knew that once rn made it to 11 that nothing further would match the outer query's WHERE clause. It would blindly continue until all tuples were exhausted from the subquery. Here we implement means to make the above execute more efficiently. This is done by way of adding a pg_proc.prosupport function to various of the built-in window functions and adding supporting code to allow the support function to inform the planner if the window function is monotonically increasing, monotonically decreasing, both or neither. The planner is then able to make use of that information and possibly allow the executor to short-circuit execution by way of adding a "run condition" to the WindowAgg to allow it to determine if some of its execution work can be skipped. This "run condition" is not like a normal filter. These run conditions are only built using quals comparing values to monotonic window functions. For monotonic increasing functions, quals making use of the btree operators for <, <= and = can be used (assuming the window function column is on the left). You can see here that once such a condition becomes false that a monotonic increasing function could never make it subsequently true again. For monotonically decreasing functions the >, >= and = btree operators for the given type can be used for run conditions. The best-case situation for this is when there is a single WindowAgg node without a PARTITION BY clause. Here when the run condition becomes false the WindowAgg node can simply return NULL. No more tuples will ever match the run condition. It's a little more complex when there is a PARTITION BY clause. In this case, we cannot return NULL as we must still process other partitions. To speed this case up we pull tuples from the outer plan to check if they're from the same partition and simply discard them if they are. When we find a tuple belonging to another partition we start processing as normal again until the run condition becomes false or we run out of tuples to process. When there are multiple WindowAgg nodes to evaluate then this complicates the situation. For intermediate WindowAggs we must ensure we always return all tuples to the calling node. Any filtering done could lead to incorrect results in WindowAgg nodes above. For all intermediate nodes, we can still save some work when the run condition becomes false. We've no need to evaluate the WindowFuncs anymore. Other WindowAgg nodes cannot reference the value of these and these tuples will not appear in the final result anyway. The savings here are small in comparison to what can be saved in the top-level WingowAgg, but still worthwhile. Intermediate WindowAgg nodes never filter out tuples, but here we change WindowAgg so that the top-level WindowAgg filters out tuples that don't match the intermediate WindowAgg node's run condition. Such filters appear in the "Filter" clause in EXPLAIN for the top-level WindowAgg node. Here we add prosupport functions to allow the above to work for; row_number(), rank(), dense_rank(), count(*) and count(expr). It appears technically possible to do the same for min() and max(), however, it seems unlikely to be useful enough, so that's not done here. Bump catversion Author: David Rowley Reviewed-by: Andy Fan, Zhihong Yu Discussion: https://postgr.es/m/CAApHDvqvp3At8++yF8ij06sdcoo1S_b2YoaT9D4Nf+MObzsrLQ@mail.gmail.com
* Consider fractional paths in generate_orderedappend_pathsTomas Vondra2022-01-12
| | | | | | | | | | | | | | | | | | | When building append paths, we've been looking only at startup and total costs for the paths. When building fractional paths that may eliminate the cheapest one, because it may be dominated by two separate paths (one for startup, one for total cost). This extends generate_orderedappend_paths() to also consider which paths have lowest fractional cost. Currently we only consider paths matching pathkeys - in the future this may be improved by also considering paths that are only partially sorted, with an incremental sort on top. Original report of an issue by Arne Roland, patch by me (based on a suggestion by Tom Lane). Reviewed-by: Arne Roland, Zhihong Yu Discussion: https://postgr.es/m/e8f9ec90-546d-e948-acce-0525f3e92773%40enterprisedb.com Discussion: https://postgr.es/m/1581042da8044e71ada2d6e3a51bf7bb%40index.de
* Update copyright for 2022Bruce Momjian2022-01-07
| | | | Backpatch-through: 10
* Allow ordered partition scans in more casesDavid Rowley2021-08-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 959d00e9d added the ability to make use of an Append node instead of a MergeAppend when we wanted to perform a scan of a partitioned table and the required sort order was the same as the partitioned keys and the partitioned table was defined in such a way that earlier partitions were guaranteed to only contain lower-order values than later partitions. However, previously we didn't allow these ordered partition scans for LIST partitioned table when there were any partitions that allowed multiple Datums. This was a very cheap check to make and we could likely have done a little better by checking if there were interleaved partitions, but at the time we didn't have visibility about which partitions were pruned, so we still may have disallowed cases where all interleaved partitions were pruned. Since 475dbd0b7, we now have knowledge of pruned partitions, we can do a much better job inside partitions_are_ordered(). Here we pass which partitions survived partition pruning into partitions_are_ordered() and, for LIST partitioning, have it check to see if any live partitions exist that are also in the new "interleaved_parts" field defined in PartitionBoundInfo. For RANGE partitioning we can relax the code which caused the partitions to be unordered if a DEFAULT partition existed. Since we now know which partitions were pruned, partitions_are_ordered() now returns true when the DEFAULT partition was pruned. Reviewed-by: Amit Langote, Zhihong Yu Discussion: https://postgr.es/m/CAApHDvrdoN_sXU52i=QDXe2k3WAo=EVry29r2+Tq2WYcn2xhEA@mail.gmail.com