path: root/src/backend/optimizer/util/relnode.c
Commit message (Author, Age)
* Update copyright for 2025 (Bruce Momjian, 2025-01-01)

  Backpatch-through: 13
* Remove dead code in get_param_path_clause_serials() (Richard Guo, 2024-11-26)

  The function get_param_path_clause_serials() is used to get the set of pushed-down clauses enforced within a parameterized Path. Since we don't currently support parameterized MergeAppend paths, and it doesn't look like that is going to change anytime soon (as explained in the comments for generate_orderedappend_paths), we don't need to consider MergeAppendPath in this function.

  This change won't make any measurable difference in performance; it's just for clarity's sake.

  Author: Richard Guo
  Reviewed-by: Andrei Lepikhov
  Discussion: https://postgr.es/m/CAMbWs4_Puie4DQ2ODvjQB_3CxYkUODnrJm8jn_ObMAcrjYNW7Q@mail.gmail.com
* Disallow partitionwise join when collations don't match (Amit Langote, 2024-11-08)

  If the collation of any join key column doesn't match the collation of the corresponding partition key, partitionwise joins can yield incorrect results. For example, rows that would match under the join key collation might be located in different partitions due to the partitioning collation. In such cases, a partitionwise join would yield different results from a non-partitionwise join, so disallow it.

  Reported-by: Tender Wang <tndrwang@gmail.com>
  Author: Jian He <jian.universality@gmail.com>
  Reviewed-by: Tender Wang <tndrwang@gmail.com>
  Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
  Discussion: https://postgr.es/m/CAHewXNno_HKiQ6PqyLYfuqDtwp7KKHZiH1J7Pqyz0nr+PS2Dwg@mail.gmail.com
  Backpatch-through: 12
* Fix partitionwise join with partially-redundant join clauses (Richard Guo, 2024-07-30)

  To determine if the two relations being joined can use partitionwise join, we need to verify the existence of equi-join conditions involving pairs of matching partition keys for all partition keys. Currently we do that by looking through the join's restriction clauses. However, it has been discovered that this approach is insufficient, because there might be partition keys known equal by a specific EC, but they do not form a join clause because it happens that other members of the EC than the partition keys are constrained to become a join clause.

  To address this issue, in addition to examining the join's restriction clauses, we also check if any partition keys are known equal by ECs, by leveraging function exprs_known_equal(). To accomplish this, we enhance exprs_known_equal() to check equality per the semantics of the opfamily, if provided.

  It could be argued that exprs_known_equal() could be called O(N^2) times, where N is the number of partition key expressions, resulting in noticeable performance costs if there are a lot of partition key expressions. But I think this is not a problem. The number of a joinrel's partition key expressions would only be equal to the join degree, since each base relation within the join contributes only one partition key expression. That is to say, it does not scale with the number of partitions. A benchmark with a query involving 5-way joins of partitioned tables, each with 3 partition keys and 1000 partitions, shows that the planning time is not significantly affected by this patch (within the margin of error), particularly when compared to the impact caused by partitionwise join.

  Thanks to Tom Lane for the idea of leveraging exprs_known_equal() to check if partition keys are known equal by ECs.

  Author: Richard Guo, Tom Lane
  Reviewed-by: Tom Lane, Ashutosh Bapat, Robert Haas
  Discussion: https://postgr.es/m/CAN_9JTzo_2F5dKLqXVtDX5V6dwqB0Xk+ihstpKEt3a1LT6X78A@mail.gmail.com
* Reduce memory used by partitionwise joins (Richard Guo, 2024-07-29)

  In try_partitionwise_join, we aim to break down the join between two partitioned relations into joins between matching partitions. To achieve this, we iterate through each pair of partitions from the two joining relations and create child-join relations for them. With potentially thousands of partitions, the local objects allocated in each iteration can accumulate significant memory usage. Therefore, we opt to eagerly free these local objects at the end of each iteration.

  In line with this approach, this patch frees the bitmap set that represents the relids of child-join relations at the end of each iteration. Additionally, it modifies build_child_join_rel() to reuse the AppendRelInfo structures generated within each iteration.

  Author: Ashutosh Bapat
  Reviewed-by: David Christensen, Richard Guo
  Discussion: https://postgr.es/m/CAExHW5s4EqY43oB=ne6B2=-xLgrs9ZGeTr1NXwkGFt2j-OmaQQ@mail.gmail.com
* Ensure generated join clauses for child rels have correct relids. (Tom Lane, 2024-04-16)

  When building a join clause derived from an EquivalenceClass, if the clause is to be used with an appendrel child relation then make sure its clause_relids include the relids of that child relation. Normally this would be true already because the EquivalenceMember would be a Var of that relation. However, if the appendrel represents a flattened UNION ALL construct then some child EquivalenceMembers could be constants with no relids. The resulting under-marked clause is problematic because it could mislead join_clause_is_movable_into about where the clause should be evaluated.

  We do not have an example showing incorrect plan generation, but there are existing cases in the regression tests that will fail the Asserts this patch adds to get_baserel_parampathinfo. A similarly wrong conclusion about a clause being considered by get_joinrel_parampathinfo would lead to wrong placement of the clause. (This also squares with the way that clause_relids is calculated for non-equijoin clauses in adjust_appendrel_attrs.)

  The other reason for wanting these new Asserts is that the previous blithe assumption that the results of generate_join_implied_equalities "necessarily satisfy join_clause_is_movable_into" turns out to be wrong pre-v16. If it's still wrong it'd be good to find out.

  Per bug #18429 from Benoît Ryder. The bug as filed was fixed by commit 2489d76c4, but these changes correlate with the fix we will need to apply in pre-v16 branches.

  Discussion: https://postgr.es/m/18429-8982d4a348cc86c6@postgresql.org
* Fix IS [NOT] NULL qual optimization for inheritance tables (David Rowley, 2024-04-12)

  b262ad440 added code to have the planner remove redundant IS NOT NULL quals and eliminate needless scans for IS NULL quals on tables where the qual's column has a NOT NULL constraint. That commit failed to consider that an inheritance parent table could have differing NOT NULL constraints between the parent and the child. This caused issues because, if we eliminated a qual on the parent, then when applying the quals to child tables in apply_child_basequals(), the qual might not have been added to the parent's baserestrictinfo.

  Here we fix this by not applying the optimization to remove redundant quals to RelOptInfos belonging to inheritance parents and applying the optimization again in apply_child_basequals(). Effectively, this means that the parent and child are considered independently as the parent has both an inh=true and inh=false RTE and we still apply the optimization to the RelOptInfo corresponding to the inh=false RTE.

  We're able to still apply the optimization in add_base_clause_to_rel() for partitioned tables as the NULLability of partitions must match that of their parent. And, if we ever expand restriction_is_always_false() and restriction_is_always_true() to handle partition constraints then we can apply the same logic as, even in multi-level partitioned tables, there's no way to route values to a partition when the qual does not match the partition qual of the partitioned table's parent partition. The same is true for CHECK constraints as those must also match between parent partitioned tables and their partitions.

  Author: Richard Guo, David Rowley
  Discussion: https://postgr.es/m/CAMbWs4930gQSZmjR7aANzEapdy61gCg6z8dT-kAEYD0sYWKPdQ@mail.gmail.com
* Make the order of the header file includes consistent (Peter Eisentraut, 2024-03-13)

  Similar to commit 7e735035f20.

  Author: Richard Guo <guofenglinux@gmail.com>
  Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
  Discussion: https://www.postgresql.org/message-id/flat/CAMbWs4-WhpCFMbXCjtJ%2BFzmjfPrp7Hw1pk4p%2BZpU95Kh3ofZ1A%40mail.gmail.com
* Add better handling of redundant IS [NOT] NULL quals (David Rowley, 2024-01-23)

  Until now PostgreSQL has not been very smart about optimizing away IS NOT NULL base quals on columns defined as NOT NULL. The evaluation of these needless quals adds overhead. Ordinarily, anyone who came complaining about that would likely just have been told to not include the qual in their query if it's not required. However, a recent bug report indicates this might not always be possible.

  Bug 17540 highlighted that when we optimize Min/Max aggregates the IS NOT NULL qual that the planner adds to make the rewritten plan ignore NULLs can cause issues with poor index choice. That particular case demonstrated that other quals, especially ones where no statistics are available to give the planner a chance at estimating an approximate selectivity, can result in poor index choice due to cheap startup paths being preferred with LIMIT 1.

  Here we take a generic approach to fixing this by having the planner check for NOT NULL columns and just have the planner remove these quals (when they're not needed) for all queries, not just when optimizing Min/Max aggregates.

  Additionally, here we also detect IS NULL quals on a NOT NULL column and transform that into a gating qual so that we don't have to perform the scan at all. This also works for join relations when the Var is not nullable by any outer join.

  This also helps with the self-join removal work as it must replace strict join quals with IS NOT NULL quals to ensure equivalence with the original query.

  Author: David Rowley, Richard Guo, Andy Fan
  Reviewed-by: Richard Guo, David Rowley
  Discussion: https://postgr.es/m/CAApHDvqg6XZDhYRPz0zgOcevSMo0d3vxA9DvHrZtKfqO30WTnw@mail.gmail.com
  Discussion: https://postgr.es/m/17540-7aa1855ad5ec18b4%40postgresql.org
* Allow examine_simple_variable() to work on INSERT RETURNING Vars. (Tom Lane, 2024-01-08)

  Since commit 599b33b94, this function assumed that every RTE_RELATION RangeTblEntry would have an associated RelOptInfo. But that's not so: we only build RelOptInfos for relations that are scanned by the query. In particular the target of an INSERT won't have one, so that Vars appearing in an INSERT ... RETURNING list will not have an associated RelOptInfo. This apparently wasn't a problem before commit f7816aec2 taught examine_simple_variable() to drill down into CTEs containing INSERT RETURNING, but it is now.

  To fix, add a fallback code path that gets the userid to use directly from the RTEPermissionInfo associated with the RTE. (Sadly, we must have two code paths, because not every RTE has a RTEPermissionInfo either.)

  Per report from Alexander Lakhin. No back-patch, since the case is apparently unreachable before f7816aec2.

  Discussion: https://postgr.es/m/608a4886-6c60-0f9e-97d5-591256bd4150@gmail.com
* Update copyright for 2024 (Bruce Momjian, 2024-01-03)

  Reported-by: Michael Paquier
  Discussion: https://postgr.es/m/ZZKTDPxBBMt3C0J9@paquier.xyz
  Backpatch-through: 12
* Prevent integer overflow when forming tuple width estimates. (Tom Lane, 2023-12-19)

  It's at least theoretically possible to overflow int32 when adding up column width estimates to make a row width estimate. (The bug example isn't terribly convincing as a real use-case, but perhaps wide joins would provide a more plausible route to trouble.) This'd lead to assertion failures or silly planner behavior. To forestall it, make the relevant functions compute their running sums in int64 arithmetic and then clamp to int32 range at the end. We can reasonably assume that MaxAllocSize is a hard limit on actual tuple width, so clamping to that is simply a correction for dubious input values, and there's no need to go as far as widening width variables to int64 everywhere.

  Per bug #18247 from RekGRpth. There've been no reports of this issue arising in practical cases, so I feel no need to back-patch.

  Richard Guo and Tom Lane

  Discussion: https://postgr.es/m/18247-11ac477f02954422@postgresql.org
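The sum-in-int64-then-clamp approach described in this commit can be illustrated with a small standalone sketch. This is not the PostgreSQL code: `MAX_TUPLE_WIDTH`, `clamp_width()` and `estimate_row_width()` are hypothetical names, and the cap is only assumed to mirror the value of MaxAllocSize.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cap for this sketch, chosen to mirror MaxAllocSize (0x3fffffff). */
#define MAX_TUPLE_WIDTH ((int64_t) 0x3fffffff)

/* Clamp a 64-bit running total to a value that fits a 32-bit width field. */
static int32_t
clamp_width(int64_t total)
{
    assert(total >= 0);         /* width estimates are never negative */
    if (total > MAX_TUPLE_WIDTH)
        return (int32_t) MAX_TUPLE_WIDTH;
    return (int32_t) total;
}

/* Add up per-column widths in int64 so the running sum cannot wrap. */
static int32_t
estimate_row_width(const int32_t *col_widths, size_t ncols)
{
    int64_t total = 0;

    for (size_t i = 0; i < ncols; i++)
        total += col_widths[i];
    return clamp_width(total);
}
```

Only the accumulator is widened; the stored width stays 32-bit, which matches the commit's point that width variables need not become int64 everywhere.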
* Robustify find_base_rel and find_base_rel_ignore_join (David Rowley, 2023-09-29)

  Improve find_base_rel() and find_base_rel_ignore_join() so that they raise an ERROR if they ever receive a negative relid value in non-cassert builds. If either of these functions had ever received a negative relid then they'd have attempted to access memory that does not belong to simple_rel_array.

  Because no evidence has been presented of actual cases where bugs have caused this to happen, here we take a lightweight approach to checking for negative values and simply cast both values to uint32 before performing the comparison. This will cause any negative relids to be seen as greater than simple_rel_array_size which will ERROR rather than attempt to access a negative simple_rel_array element. Obviously, the run-time error is better than a crash, so it makes sense to protect against this, especially when it can be done without adding any additional run-time overhead.

  There is a slight change here if the functions are ever called with a relid of 0. This will pass the bounds check, but that array entry should be NULL (along with the corresponding simple_rte_array entry), so won't pass the "if (rel)" condition and still fall through and raise an ERROR.

  Author: Ranier Vilela
  Reviewed-by: Ashutosh Bapat, David Rowley
  Discussion: https://postgr.es/m/CAEudQArQSghBu2gLojg4o_tnHj_x2HcS%3D%2BwewL3NJS8z0VnK%2Bg%40mail.gmail.com
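The unsigned-cast bounds check can be shown in isolation. The sketch below is an illustration only: `Rel` and `lookup_rel()` are hypothetical stand-ins, and returning NULL takes the place of raising an ERROR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-relation planner struct. */
typedef struct Rel
{
    int         id;
} Rel;

/*
 * Look up array[relid], returning NULL for any invalid relid.
 * Casting both sides to uint32_t turns a negative relid into a very
 * large value, so a single comparison rejects negative and too-large
 * indexes alike.
 */
static Rel *
lookup_rel(Rel **array, int array_size, int relid)
{
    assert(array_size >= 0);
    if ((uint32_t) relid < (uint32_t) array_size)
    {
        Rel        *rel = array[relid];

        if (rel)
            return rel;
    }
    return NULL;                /* the caller would report the error */
}
```

As the commit notes, index 0 passes the bounds check but is still rejected, because slot 0 is expected to hold NULL.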
* Fix calculation of relid sets for partitionwise child joins. (Tom Lane, 2023-07-21)

  Applying add_outer_joins_to_relids() to a child join doesn't actually work, even if we've built a SpecialJoinInfo specialized to the child, because that function will also compare the join's relids to elements of the main join_info_list, which only deal in regular relids not child relids. This mistake escaped detection by the existing partitionwise join tests because they didn't test any cases where add_outer_joins_to_relids() needs to add additional OJ relids (that is, any cases where join reordering per identity 3 is possible).

  Instead, let's apply adjust_child_relids() to the relids of the parent join. This requires minor code reordering to collect the relevant AppendRelInfo structures first, but that's work we'd do shortly anyway.

  Report and fix by Richard Guo; cosmetic changes by me

  Discussion: https://postgr.es/m/CAMbWs49NCNbyubZWgci3o=_OTY=snCfAPtMnM-32f3mm-K-Ckw@mail.gmail.com
* Fix filtering of "cloned" outer-join quals some more. (Tom Lane, 2023-05-25)

  We've had multiple issues with the clause_is_computable_at logic that I introduced in 2489d76c4: it's been known to accept more than one clone of the same qual at the same plan node, and also to accept no clones at all. It's looking impractical to get it 100% right on the basis of the currently-stored information, so fix it by introducing a new RestrictInfo field "incompatible_relids" that explicitly shows which outer joins a given clone mustn't be pushed above.

  In principle we could populate this field in every RestrictInfo, but that would cost space and there doesn't presently seem to be a need for it in general. Also, while deconstruct_distribute_oj_quals can easily fill the field with the remaining members of the commutative join set that it's considering, computing it in the general case seems again pretty complicated. So for now, just fill it for clone quals.

  Along the way, fix a bug that may or may not be only latent: equivclass.c was generating replacement clauses with is_pushed_down and has_clone/is_clone markings that didn't match their required_relids. This led me to conclude that leaving the clone flags out of make_restrictinfo's purview wasn't such a great idea after all, so add them.

  Per report from Richard Guo.

  Discussion: https://postgr.es/m/CAMbWs48EYi_9-pSd0ORes1kTmTeAjT4Q3gu49hJtYCbSn2JyeA@mail.gmail.com
* Pre-beta mechanical code beautification. (Tom Lane, 2023-05-19)

  Run pgindent, pgperltidy, and reformat-dat-files.

  This set of diffs is a bit larger than typical. We've updated to pg_bsd_indent 2.1.2, which properly indents variable declarations that have multi-line initialization expressions (the continuation lines are now indented one tab stop). We've also updated to perltidy version 20230309 and changed some of its settings, which reduces its desire to add whitespace to lines to make assignments etc. line up. Going forward, that should make for fewer random-seeming changes to existing code.

  Discussion: https://postgr.es/m/20230428092545.qfb3y5wcu4cm75ur@alvherre.pgsql
* Tweak API of new function clause_is_computable_at(). (Tom Lane, 2023-05-18)

  Pass it the RestrictInfo under consideration, not just the clause_relids. This should save some trivial amount of code at the call sites, and it gives us more flexibility about what clause_is_computable_at() does. There's no actual functional change here, though.

  Discussion: https://postgr.es/m/3564467.1684352557@sss.pgh.pa.us
* Fix some issues with improper placement of outer join clauses. (Tom Lane, 2023-05-17)

  After applying outer-join identity 3 in the forward direction, it was possible for the planner to mistakenly apply a qual clause from above the two outer joins at the now-lower join level. This can give the wrong answer, since a value that would get nulled by the now-upper join might not yet be null.

  To fix, when we perform such a transformation, consider that the now-lower join hasn't really completed the outer join it's nominally responsible for and thus its relid set should not include that OJ's relid (nor should its output Vars have that nullingrel bit set). Instead we add those bits when the now-upper join is performed. The existing rules for qual placement then suffice to prevent higher qual clauses from dropping below the now-upper join. There are a few complications from needing to consider transitive closures in case multiple pushdowns have happened, but all in all it's not a very complex patch.

  This is all new logic (from 2489d76c4) so no need to back-patch. The added test cases all have the same results as in v15.

  Tom Lane and Richard Guo

  Discussion: https://postgr.es/m/0b819232-4b50-f245-1c7d-c8c61bf41827@postgrespro.ru
* Remove local optimizations of empty Bitmapsets into null pointers. (Tom Lane, 2023-03-02)

  These are all dead code now that it's done centrally.

  Patch by me; thanks to Nathan Bossart and Richard Guo for review.

  Discussion: https://postgr.es/m/1159933.1677621588@sss.pgh.pa.us
* Fix mis-handling of outer join quals generated by EquivalenceClasses. (Tom Lane, 2023-02-23)

  It's possible, in admittedly-rather-contrived cases, for an eclass to generate a derived "join" qual that constrains the post-outer-join value(s) of some RHS variable(s) without mentioning the LHS at all. While the mechanisms were set up to work for this, we fell foul of the "get_common_eclass_indexes" filter installed by commit 3373c7155: it could decide that such an eclass wasn't relevant to the join, so that the required qual clause wouldn't get emitted there or anywhere else.

  To fix, apply get_common_eclass_indexes only at inner joins, where its rule is still valid. At an outer join, fall back to examining all eclasses that mention either input (or the OJ relid, though it should be impossible for an eclass to mention that without mentioning either input). Perhaps we can improve on that later, but the cost/benefit of adding more complexity to skip some irrelevant eclasses is dubious.

  To allow cheaply distinguishing outer from inner joins, pass the ojrelid to generate_join_implied_equalities as a separate argument. This also allows cleaning up some sloppiness that had crept into the definition of its join_relids argument, and it allows accurate calculation of nominal_join_relids for a child outer join. (The latter oversight seems not to have been a live bug, but it certainly could have caused problems in future.)

  Also fix what might be a live bug in check_index_predicates: it was being sloppy about what it passed to generate_join_implied_equalities.

  Per report from Richard Guo.

  Discussion: https://postgr.es/m/CAMbWs4-DsTBfOvXuw64GdFss2=M5cwtEhY=0DCS7t2gT7P6hSA@mail.gmail.com
* Correctly set userid of subquery relations' child rels (Alvaro Herrera, 2023-02-20)

  The RelOptInfo->userid field (the user ID to check permissions as) of an "otherrel" relation was being copied from its parent relation, which is correct in most cases but wrong when the parent is a subquery. In that case, using the value from the RTEPermissionInfo of the child itself is the appropriate thing to do.

  Coming up with a test case where user-visible behavior changes proves hard enough, so we don't add one here.

  Bug introduced by a61b1f74823c, discovered by Amit while reviewing nearby code.

  Author: Amit Langote <amitlangote09@gmail.com>
  Discussion: https://postgr.es/m/CA+HiwqE0WY_AhLnGtTsY7eYebG212XWbM-D8gr2A_ToOHyCywQ@mail.gmail.com
* Further tighten nullingrel marking rules in build_joinrel_tlist(). (Tom Lane, 2023-02-08)

  The code I added in fee7b77b9 could misbehave if commute_above_r contains multiple relids. While adding too many relids here is probably harmless (pre-fee7b77b9, we did it all the time), it's not very expensive to be accurate: we just have to intersect commute_above_r with the join's relids.

  Discussion: https://postgr.es/m/17781-c0405c8b3cd5e072@postgresql.org
* Rethink nullingrel marking rules in build_joinrel_tlist(). (Tom Lane, 2023-02-07)

  The logic for when to add the current outer join's own relid to the nullingrels sets of output Vars and PHVs was overly complicated and underly correct. Not sure why I didn't think of this before, but since what we want is marking per the syntactic structure, we can just consult our records about the syntactic structure, ie syn_righthand/syn_lefthand.

  Also, tighten the rule about when to add the commute_above_r bits, in hopes of eliminating some squishy reasoning. I do not know of a reason to think that that's broken as-is, but this way seems better.

  Per bug #17781 from Robins Tharakan.

  Discussion: https://postgr.es/m/17781-c0405c8b3cd5e072@postgresql.org
* Do assorted mop-up in the planner. (Tom Lane, 2023-01-30)

  Remove RestrictInfo.nullable_relids, along with a good deal of infrastructure that calculated it. One use-case for it was in join_clause_is_movable_to, but we can now replace that usage with a check to see if the clause's relids include any outer join that can null the target relation. The other use-case was in join_clause_is_movable_into, but that test can just be dropped entirely now that the clause's relids include outer joins. Furthermore, join_clause_is_movable_into should now be accurate enough that it will accept anything returned by generate_join_implied_equalities, so we can restore the Assert that was diked out in commit 95f4e59c3.

  Remove the outerjoin_delayed mechanism. We needed this before to prevent quals from getting evaluated below outer joins that should null some of their vars. Now that we consider varnullingrels while placing quals, that's taken care of automatically, so throw the whole thing away.

  Teach remove_useless_result_rtes to also remove useless FromExprs. Having done that, the delay_upper_joins flag serves no purpose any more and we can remove it, largely reverting 11086f2f2.

  Use constant TRUE for "dummy" clauses when throwing back outer joins. This improves on a hack I introduced in commit 6a6522529. If we have a left-join clause l.x = r.y, and a WHERE clause l.x = constant, we generate r.y = constant and then don't really have a need for the join clause. But we must throw the join clause back anyway after marking it redundant, so that the join search heuristics won't think this is a clauseless join and avoid it. That was a kluge introduced under time pressure, and after looking at it I thought of a better way: let's just introduce constant-TRUE "join clauses" instead, and get rid of them at the end. This improves the generated plans for such cases by not having to test a redundant join clause. We can also get rid of the ugly hack used to mark such clauses as redundant for selectivity estimation.

  Patch by me; thanks to Richard Guo for review.

  Discussion: https://postgr.es/m/830269.1656693747@sss.pgh.pa.us
* Make Vars be outer-join-aware. (Tom Lane, 2023-01-30)

  Traditionally we used the same Var struct to represent the value of a table column everywhere in parse and plan trees. This choice predates our support for SQL outer joins, and it's really a pretty bad idea with outer joins, because the Var's value can depend on where it is in the tree: it might go to NULL above an outer join. So expression nodes that are equal() per equalfuncs.c might not represent the same value, which is a huge correctness hazard for the planner.

  To improve this, decorate Var nodes with a bitmapset showing which outer joins (identified by RTE indexes) may have nulled them at the point in the parse tree where the Var appears. This allows us to trust that equal() Vars represent the same value. A certain amount of klugery is still needed to cope with cases where we re-order two outer joins, but it's possible to make it work without sacrificing that core principle. PlaceHolderVars receive similar decoration for the same reason.

  In the planner, we include these outer join bitmapsets into the relids that an expression is considered to depend on, and in consequence also add outer-join relids to the relids of join RelOptInfos. This allows us to correctly perceive whether an expression can be calculated above or below a particular outer join.

  This change affects FDWs that want to plan foreign joins. They *must* follow suit when labeling foreign joins in order to match with the core planner, but for many purposes (if postgres_fdw is any guide) they'd prefer to consider only base relations within the join. To support both requirements, redefine ForeignScan.fs_relids as base+OJ relids, and add a new field fs_base_relids that's set up by the core planner.

  Large though it is, this commit just does the minimum necessary to install the new mechanisms and get check-world passing again. Follow-up patches will perform some cleanup. (The README additions and comments mention some stuff that will appear in the follow-up.)

  Patch by me; thanks to Richard Guo for review.

  Discussion: https://postgr.es/m/830269.1656693747@sss.pgh.pa.us
* Remove some dead code in selfuncs.c (Alvaro Herrera, 2023-01-19)

  RelOptInfo.userid is the same for all relations in a given inheritance tree, so the code in examine_variable() and examine_simple_variable() that repeats the ACL checks on the root parent rel instead of a given leaf child relation need not recompute userid too.

  Author: Amit Langote <amitlangote09@gmail.com>
  Reported-by: Justin Pryzby <pryzby@telsasoft.com>
  Discussion: https://postgr.es/m/20221210201753.GA27893@telsasoft.com
* Update copyright for 2023 (Bruce Momjian, 2023-01-02)

  Backpatch-through: 11
* Rework query relation permission checking (Alvaro Herrera, 2022-12-06)

  Currently, information about the permissions to be checked on relations mentioned in a query is stored in their range table entries. So the executor must scan the entire range table looking for relations that need to have permissions checked. This can make the permission checking part of the executor initialization needlessly expensive when many inheritance children are present in the range table. While the permissions need not be checked on the individual child relations, the executor still must visit every range table entry to filter them out.

  This commit moves the permission checking information out of the range table entries into a new plan node called RTEPermissionInfo. Every top-level (inheritance "root") RTE_RELATION entry in the range table gets one and a list of those is maintained alongside the range table. This new list is initialized by the parser when initializing the range table. The rewriter can add more entries to it as rules/views are expanded. Finally, the planner combines the lists of the individual subqueries into one flat list that is passed to the executor for checking.

  To make it quick to find the RTEPermissionInfo entry belonging to a given relation, RangeTblEntry gets a new Index field 'perminfoindex' that stores the corresponding RTEPermissionInfo's index in the query's list of the latter.

  ExecutorCheckPerms_hook has gained another List * argument; the signature is now:

    typedef bool (*ExecutorCheckPerms_hook_type) (List *rangeTable,
                                                  List *rtePermInfos,
                                                  bool ereport_on_violation);

  The first argument is no longer used by any in-core uses of the hook, but we leave it in place because there may be other implementations that do. Implementations should likely scan the rtePermInfos list to determine which operations to allow or deny.

  Author: Amit Langote <amitlangote09@gmail.com>
  Discussion: https://postgr.es/m/CA+HiwqGjJDmUhDSfv-U2qhKJjt9ST7Xh9JXC_irsAQ1TAUsJYg@mail.gmail.com
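The idea of a per-entry index into a separate permission list can be sketched with plain arrays. Everything here is hypothetical: `PermInfo`, `RangeEntry` and `get_perm_info()` are illustration-only names, and the convention that the index is 1-based with 0 meaning "no permission entry" is an assumption of this sketch rather than something the commit message states.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one permission-check record. */
typedef struct PermInfo
{
    unsigned    relid;
    unsigned    required_perms;
} PermInfo;

/* Hypothetical stand-in for a range table entry. */
typedef struct RangeEntry
{
    unsigned    relid;
    unsigned    perminfoindex;  /* assumed 1-based; 0 = no entry */
} RangeEntry;

/*
 * Fetch the permission record for a range entry in O(1), or NULL when
 * the entry (for example an inheritance child) carries none.
 */
static const PermInfo *
get_perm_info(const PermInfo *perminfos, size_t nperminfos,
              const RangeEntry *rte)
{
    if (rte->perminfoindex == 0)
        return NULL;
    assert(rte->perminfoindex <= nperminfos);
    return &perminfos[rte->perminfoindex - 1];
}
```

With this layout a permission check walks only the short permission list, and code that starts from a range entry can still reach its record directly.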
* Add repalloc0 and repalloc0_array (Peter Eisentraut, 2022-11-12)

  These zero out the space added by repalloc. This is a common pattern that is quite hairy to code by hand.

  Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
  Discussion: https://www.postgresql.org/message-id/b66dfc89-9365-cb57-4e1f-b7d31813eeec@enterprisedb.com
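The grow-and-zero pattern that this commit wraps up looks roughly like the following when written by hand. This is a sketch using the C library's realloc and memset, not PostgreSQL's memory-context allocator; `grow_zeroed()` is a hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Grow an allocation from oldsize to newsize bytes and zero only the
 * newly added tail. Returns NULL on allocation failure, in which case
 * the original block is left untouched.
 */
static void *
grow_zeroed(void *ptr, size_t oldsize, size_t newsize)
{
    char       *p;

    assert(newsize > 0 && newsize >= oldsize);
    p = realloc(ptr, newsize);
    if (p == NULL)
        return NULL;
    memset(p + oldsize, 0, newsize - oldsize);
    return p;
}
```

The easy mistakes when coding this inline are zeroing from the wrong offset or mixing up element counts and byte counts, which is presumably why a shared helper is attractive.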
* Update some comments that should've covered MERGE (Alvaro Herrera, 2022-10-24)

  Oversight in 7103ebb7aae8.

  Backpatch to 15.

  Author: Richard Guo <guofenglinux@gmail.com>
  Discussion: https://postgr.es/m/CAMbWs48gnDjZXq3-b56dVpQCNUJ5hD9kdtWN4QFwKCEapspNsA@mail.gmail.com
* Improve performance of adjust_appendrel_attrs_multilevel. (Tom Lane, 2022-08-18)

  The present implementations of adjust_appendrel_attrs_multilevel and its sibling adjust_child_relids_multilevel are very messy, because they work by reconstructing the relids of the child's immediate parent and then seeing if that's bms_equal to the relids of the target parent. Aside from being quite inefficient, this will not work with planned future changes to make joinrels' relid sets contain outer-join relids in addition to baserels.

  The whole thing can be solved at a stroke by adding explicit parent and top_parent links to child RelOptInfos, and making these functions work with RelOptInfo pointers instead of relids. Doing that is simpler for most callers, too.

  In my original version of this patch, I got rid of RelOptInfo.top_parent_relids on the grounds that it was now redundant. However, that adds a lot of code churn in places that otherwise would not need changing, and arguably the extra indirection needed to fetch top_parent->relids in those places costs something. So this version leaves that field in place.

  Discussion: https://postgr.es/m/553080.1657481916@sss.pgh.pa.us
* Refactor addition of PlaceHolderVars to joinrel targetlists. Tom Lane 2022-08-17
| | | | | | | | | | | | | | | | | | | Make build_joinrel_tlist() responsible for adding PHVs that were already computed in one or the other input relation, and therefore change add_placeholders_to_joinrel() to only add PHVs that will be newly computed in this joinrel's output. This makes the handling of PHVs in build_joinrel_tlist() more like its handling of plain Vars, which seems like a good thing on intelligibility grounds and will simplify planned future changes. There is a purely cosmetic side-effect that the order of entries in the joinrel's tlist may change; but since it becomes more like the order of entries in the input tlists, that's not bad. The reason it wasn't done like this originally was the potential cost of looking up PlaceHolderInfo entries to consult ph_needed. Now that that's O(1) it shouldn't hurt. Discussion: https://postgr.es/m/1405792.1660677844@sss.pgh.pa.us
* Update copyright for 2022 Bruce Momjian 2022-01-07
| | | | Backpatch-through: 10
* Track a Bitmapset of non-pruned partitions in RelOptInfo David Rowley 2021-08-03
| | | | | | | | | | | | | | | | | | | | For partitioned tables with large numbers of partitions where queries are able to prune all but a very small number of partitions, the time spent in the planner looping over RelOptInfo.part_rels checking for non-NULL RelOptInfos could become a large portion of the overall planning time. Here we add a Bitmapset that records the non-pruned partitions. This allows us to more efficiently skip the pruned partitions by looping over the Bitmapset. This will cause a very slight slow down in cases where no or not many partitions could be pruned, however, those cases are already slow to plan anyway and the overhead of looping over the Bitmapset would be unmeasurable when compared with the other tasks such as path creation for a large number of partitions. Reviewed-by: Amit Langote, Zhihong Yu Discussion: https://postgr.es/m/CAApHDvqnPx6JnUuPwaf5ao38zczrAb9mxt9gj4U1EKFfd4AqLA@mail.gmail.com
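The idea of visiting only the surviving partitions can be sketched as follows. This is a simplified illustration, not the planner's code: `LiveParts` is a hypothetical single-word bitmap limited to 64 partitions (the real Bitmapset is multi-word), and `live_parts_next` scans bit by bit where a real implementation would use faster bit tricks.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical single-word bitmap: bit i set => partition i survived pruning. */
typedef uint64_t LiveParts;

/*
 * Return the smallest member greater than prev, or -1 if there is none.
 * Pass prev = -1 to start the iteration.
 */
static int
live_parts_next(LiveParts live, int prev)
{
    assert(prev >= -1 && prev < 64);

    for (int i = prev + 1; i < 64; i++)
    {
        if (live & (UINT64_C(1) << i))
            return i;
    }
    return -1;
}

/*
 * Count the surviving partitions by iterating over set bits only,
 * instead of testing every slot of a per-partition pointer array
 * for NULL.
 */
static int
count_live(LiveParts live)
{
    int n = 0;

    for (int i = live_parts_next(live, -1); i >= 0;
         i = live_parts_next(live, i))
        n++;
    return n;
}
```

The loop shape in `count_live` is the part that matters: the caller's body runs once per surviving partition rather than once per partition.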
* Rework planning and execution of UPDATE and DELETE. Tom Lane 2021-03-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch makes two closely related sets of changes: 1. For UPDATE, the subplan of the ModifyTable node now only delivers the new values of the changed columns (i.e., the expressions computed in the query's SET clause) plus row identity information such as CTID. ModifyTable must re-fetch the original tuple to merge in the old values of any unchanged columns. The core advantage of this is that the changed columns are uniform across all tables of an inherited or partitioned target relation, whereas the other columns might not be. A secondary advantage, when the UPDATE involves joins, is that less data needs to pass through the plan tree. The disadvantage of course is an extra fetch of each tuple to be updated. However, that seems to be very nearly free in context; even worst-case tests don't show it to add more than a couple percent to the total query cost. At some point it might be interesting to combine the re-fetch with the tuple access that ModifyTable must do anyway to mark the old tuple dead; but that would require a good deal of refactoring and it seems it wouldn't buy all that much, so this patch doesn't attempt it. 2. For inherited UPDATE/DELETE, instead of generating a separate subplan for each target relation, we now generate a single subplan that is just exactly like a SELECT's plan, then stick ModifyTable on top of that. To let ModifyTable know which target relation a given incoming row refers to, a tableoid junk column is added to the row identity information. This gets rid of the horrid hack that was inheritance_planner(), eliminating O(N^2) planning cost and memory consumption in cases where there were many unprunable target relations. Point 2 of course requires point 1, so that there is a uniform definition of the non-junk columns to be returned by the subplan. 
We can't insist on uniform definition of the row identity junk columns however, if we want to keep the ability to have both plain and foreign tables in a partitioning hierarchy. Since it wouldn't scale very far to have every child table have its own row identity column, this patch includes provisions to merge similar row identity columns into one column of the subplan result. In particular, we can merge the whole-row Vars typically used as row identity by FDWs into one column by pretending they are type RECORD. (It's still okay for the actual composite Datums to be labeled with the table's rowtype OID, though.) There is more that can be done to file down residual inefficiencies in this patch, but it seems to be committable now. FDW authors should note several API changes: * The argument list for AddForeignUpdateTargets() has changed, and so has the method it must use for adding junk columns to the query. Call add_row_identity_var() instead of manipulating the parse tree directly. You might want to reconsider exactly what you're adding, too. * PlanDirectModify() must now work a little harder to find the ForeignScan plan node; if the foreign table is part of a partitioning hierarchy then the ForeignScan might not be the direct child of ModifyTable. See postgres_fdw for sample code. * To check whether a relation is a target relation, it's no longer sufficient to compare its relid to root->parse->resultRelation. Instead, check it against all_result_relids or leaf_result_relids, as appropriate. Amit Langote and Tom Lane Discussion: https://postgr.es/m/CA+HiwqHpHdqdDn48yCEhynnniahH78rwcrv1rEX65-fsZGBOLQ@mail.gmail.com
* Add TID Range Scans to support efficient scanning ranges of TIDs David Rowley 2021-02-27
| | | | | | | | | | | | | | | | | | | | | This adds a new executor node named TID Range Scan. The query planner will generate paths for TID Range scans when quals are discovered on base relations which search for ranges on the table's ctid column. These ranges may be open at either end. For example, WHERE ctid >= '(10,0)'; will return all tuples on page 10 and over. To support this, two new optional callback functions have been added to table AM. scan_set_tidrange is used to set the scan range to just the given range of TIDs. scan_getnextslot_tidrange fetches the next tuple in the given range. For AMs where scanning ranges of TIDs would not make sense, these functions can be set to NULL in the TableAmRoutine. The query planner won't generate TID Range Scan Paths in that case. Author: Edmund Horner, David Rowley Reviewed-by: David Rowley, Tomas Vondra, Tom Lane, Andres Freund, Zhihong Yu Discussion: https://postgr.es/m/CAMyN-kB-nFTkF=VA_JPwFNo08S0d-Yk0F741S2B7LDmYAi8eyA@mail.gmail.com
* Update copyright for 2021 Bruce Momjian 2021-01-02
| | | | Backpatch-through: 9.5
* Improve hash_create()'s API for some added robustness. Tom Lane 2020-12-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Invent a new flag bit HASH_STRINGS to specify C-string hashing, which was formerly the default; and add assertions insisting that exactly one of the bits HASH_STRINGS, HASH_BLOBS, and HASH_FUNCTION be set. This is in hopes of preventing recurrences of the type of oversight fixed in commit a1b8aa1e4 (i.e., mistakenly omitting HASH_BLOBS). Also, when HASH_STRINGS is specified, insist that the keysize be more than 8 bytes. This is a heuristic, but it should catch accidental use of HASH_STRINGS for integer or pointer keys. (Nearly all existing use-cases set the keysize to NAMEDATALEN or more, so there's little reason to think this restriction should be problematic.) Tweak hash_create() to insist that the HASH_ELEM flag be set, and remove the defaults it had for keysize and entrysize. Since those defaults were undocumented and basically useless, no callers omitted HASH_ELEM anyway. Also, remove memset's zeroing the HASHCTL parameter struct from those callers that had one. This has never been really necessary, and while it wasn't a bad coding convention it was confusing that some callers did it and some did not. We might as well save a few cycles by standardizing on "not". Also improve the documentation for hash_create(). In passing, improve reinit.c's usage of a hash table by storing the key as a binary Oid rather than a string; and, since that's a temporary hash table, allocate it in CurrentMemoryContext for neatness. Discussion: https://postgr.es/m/590625.1607878171@sss.pgh.pa.us
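The flag rules stated in the commit above can be written down as a small standalone check. This is a sketch of the rules only: the `MY_HASH_*` constants and `hash_flags_valid` are hypothetical, the bit values are chosen for illustration and are not those of PostgreSQL's headers, and the real hash_create() enforces these rules with assertions rather than a boolean return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits; the values are illustrative only. */
enum
{
    MY_HASH_ELEM = 1 << 0,      /* keysize/entrysize are supplied */
    MY_HASH_STRINGS = 1 << 1,   /* keys are C strings */
    MY_HASH_BLOBS = 1 << 2,     /* keys are fixed-size binary blobs */
    MY_HASH_FUNCTION = 1 << 3   /* caller supplies a hash function */
};

/*
 * Check the rules from the commit message:
 *  - MY_HASH_ELEM must be set;
 *  - exactly one of STRINGS, BLOBS, FUNCTION must be set;
 *  - with STRINGS, keysize must be more than 8 bytes, to catch
 *    accidental string hashing of integer or pointer keys.
 */
static bool
hash_flags_valid(int flags, size_t keysize)
{
    int keybits = flags & (MY_HASH_STRINGS | MY_HASH_BLOBS | MY_HASH_FUNCTION);

    if ((flags & MY_HASH_ELEM) == 0)
        return false;

    /* Nonzero with a single bit set: x & (x - 1) clears the lowest set bit. */
    if (keybits == 0 || (keybits & (keybits - 1)) != 0)
        return false;

    if ((flags & MY_HASH_STRINGS) != 0 && keysize <= 8)
        return false;

    return true;
}
```

The keysize test is a heuristic, as the commit message says: it rejects the common mistake of string-hashing a 4- or 8-byte key while leaving name-sized string keys alone.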
* Allow run-time pruning on nested Append/MergeAppend nodes David Rowley 2020-11-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously we only tagged on the required information to allow the executor to perform run-time partition pruning for Append/MergeAppend nodes belonging to base relations. It was thought that nested Append/MergeAppend nodes were just about always pulled up into the top-level Append/MergeAppend and that making the run-time pruning info for any sub Append/MergeAppend nodes was a waste of time. However, that was likely badly thought through. Some examples of cases we're unable to pull up nested Append/MergeAppends are: 1) Parallel Append nodes with a mix of parallel and non-parallel paths into a Parallel Append. 2) When planning an ordered Append scan a sub-partition which is unordered may require a nested MergeAppend path to ensure sub-partitions don't mix up the order of tuples being fed into the top-level Append. Unfortunately, it was not just as simple as removing the lines in createplan.c which were purposefully not building the run-time pruning info for anything but RELOPT_BASEREL relations. The code in add_paths_to_append_rel() was far too sloppy about which partitioned_rels it included for the Append/MergeAppend paths. The original code there would always assume accumulate_append_subpath() would pull each sub-Append and sub-MergeAppend path into the top-level path. While it does not appear that there were any actual bugs caused by having the additional partitioned table RT indexes recorded, what it did mean is that later in planning, when we built the run-time pruning info, we wasted effort building PartitionedRelPruneInfos for partitioned tables that we had no subpaths for the executor to run-time prune. Here we tighten that up so that partitioned_rels only ever contains the RT index for partitioned tables which actually have subpaths in the given Append/MergeAppend. 
We can now Assert that every PartitionedRelPruneInfo has a non-empty present_parts. That should allow us to catch any weird corner cases that have been missed. In passing, it seems there is no longer a good reason to have the AppendPath and MergeAppendPath's partitioned_rel fields a List of IntList. We can simply have a List of Relids instead. This is more compact in memory and faster to add new members to. We still know which is the root level partition as these always have a lower relid than their children. Previously this field was used for more things, but run-time partition pruning now remains the only user of it and it has no need for a List of IntLists. Here we also get rid of the RelOptInfo partitioned_child_rels field. This is what was previously used to (sometimes incorrectly) set the Append/MergeAppend path's partitioned_rels field. That was the only usage of that field, so we can happily just remove it. I also couldn't resist changing some nearby code to make use of the newly added for_each_from macro so we can skip the first element in the list without checking if the current item was the first one on each iteration. A bug report from Andreas Kretschmer prompted all this work, however, after some consideration, I'm not personally classing this as a bug fix. So no backpatch. In Andreas' test case, it just wasn't that clear that there was a nested Append since the top-level Append just had a single sub-path which was pulled up a level, per 8edd0e794. Author: David Rowley Reviewed-by: Amit Langote Discussion: https://postgr.es/m/flat/CAApHDvqSchs%2BubdybcfFaSPB%2B%2BEA7kqMaoqajtP0GtZvzOOR3g%40mail.gmail.com
* Remove unnecessary #include. Etsuro Fujita 2020-05-12
| | | | My oversight in commit c8434d64c.
* Allow partitionwise join to handle nested FULL JOIN USING cases. Tom Lane 2020-04-07
| | | | | | | | | | | | | | | | This case didn't work because columns merged by FULL JOIN USING are represented in the parse tree by COALESCE expressions, and the logic for recognizing a partitionable join failed to match upper-level join clauses to such expressions. To fix, synthesize suitable COALESCE expressions and add them to the nullable_partexprs lists. This is pretty ugly and brute-force, but it gets the job done. (I have ambitions of rethinking the way outer-join output Vars are represented, so maybe that will provide a cleaner solution someday. For now, do this.) Amit Langote, reviewed by Justin Pryzby, Richard Guo, and myself Discussion: https://postgr.es/m/CA+HiwqG2WVUGmLJqtR0tPFhniO=H=9qQ+Z3L_ZC+Y3-EVQHFGg@mail.gmail.com
* Allow partitionwise joins in more cases. Etsuro Fujita 2020-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | Previously, the partitionwise join technique only allowed partitionwise join when input partitioned tables had exactly the same partition bounds. This commit extends the technique to some cases when the tables have different partition bounds, by using an advanced partition-matching algorithm introduced by this commit. For both the input partitioned tables, the algorithm checks whether every partition of one input partitioned table only matches one partition of the other input partitioned table at most, and vice versa. In such a case the join between the tables can be broken down into joins between the matching partitions, so the algorithm produces the pairs of the matching partitions, plus the partition bounds for the join relation, to allow partitionwise join for computing the join. Currently, the algorithm works for list-partitioned and range-partitioned tables, but not hash-partitioned tables. See comments in partition_bounds_merge(). Ashutosh Bapat and Etsuro Fujita, most of regression tests by Rajkumar Raghuwanshi, some of the tests by Mark Dilger and Amul Sul, reviewed by Dmitry Dolgov and Amul Sul, with additional review at various points by Ashutosh Bapat, Mark Dilger, Robert Haas, Antonin Houska, Amit Langote, Justin Pryzby, and Tomas Vondra Discussion: https://postgr.es/m/CAFjFpRdjQvaUEV5DJX3TW6pU5eq54NCkadtxHX2JiJG_GvbrCA@mail.gmail.com
* Cosmetic improvements for code related to partitionwise join. Tom Lane 2020-04-03
| | | | | | | | | | | | | Move have_partkey_equi_join and match_expr_to_partition_keys to relnode.c, since they're used only there. Refactor build_joinrel_partition_info to split out the code that fills the joinrel's partition key lists; this doesn't have any non-cosmetic impact, but it seems like a useful separation of concerns. Improve assorted nearby comments. Amit Langote, with a little further editorialization by me Discussion: https://postgr.es/m/CA+HiwqG2WVUGmLJqtR0tPFhniO=H=9qQ+Z3L_ZC+Y3-EVQHFGg@mail.gmail.com
* Update copyrights for 2020 Bruce Momjian 2020-01-01
| | | | Backpatch-through: update all files in master, backpatch legal files through 9.4
* Fix typo in comment. Etsuro Fujita 2019-11-27
|
* Generate EquivalenceClass members for partitionwise child join rels. Tom Lane 2019-11-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | Commit d25ea0127 got rid of what I thought were entirely unnecessary derived child expressions in EquivalenceClasses for EC members that mention multiple baserels. But it turns out that some of the child expressions that code created are necessary for partitionwise joins, else we fail to find matching pathkeys for Sort nodes. (This happens only for certain shapes of the resulting plan; it may be that partitionwise aggregation is also necessary to show the failure, though I'm not sure of that.) Reverting that commit entirely would be quite painful performance-wise for large partition sets. So instead, add code that explicitly generates child expressions that match only partitionwise child join rels we have actually generated. Per report from Justin Pryzby. (Amit Langote noticed the problem earlier, though it's not clear if he recognized then that it could result in a planner error, not merely failure to exploit partitionwise join, in the code as-committed.) Back-patch to v12 where commit d25ea0127 came in. Amit Langote, with lots of kibitzing from me Discussion: https://postgr.es/m/CA+HiwqG2WVUGmLJqtR0tPFhniO=H=9qQ+Z3L_ZC+Y3-EVQHFGg@mail.gmail.com Discussion: https://postgr.es/m/20191011143703.GN10470@telsasoft.com
* Remove useless bms_free() calls in build_child_join_rel(). Etsuro Fujita 2019-08-16
| | | | | | | These seem to be leftovers from the original partitionwise-join patch, perhaps. Discussion: https://postgr.es/m/CAPmGK145YiMTPRnvev1dLz8na_-0aZ=Xyqn8f2QsJFBUTObNow@mail.gmail.com
* Rationalize use of list_concat + list_copy combinations. Tom Lane 2019-08-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | In the wake of commit 1cff1b95a, the result of list_concat no longer shares the ListCells of the second input. Therefore, we can replace "list_concat(x, list_copy(y))" with just "list_concat(x, y)". To improve call sites that were list_copy'ing the first argument, or both arguments, invent "list_concat_copy()" which produces a new list sharing no ListCells with either input. (This is a bit faster than "list_concat(list_copy(x), y)" because it makes the result list the right size to start with.) In call sites that were not list_copy'ing the second argument, the new semantics mean that we are usually leaking the second List's storage, since typically there is no remaining pointer to it. We considered inventing another list_copy variant that would list_free the second input, but concluded that for most call sites it isn't worth worrying about, given the relative compactness of the new List representation. (Note that in cases where such leakage would happen, the old code already leaked the second List's header; so we're only discussing the size of the leak not whether there is one. I did adjust two or three places that had been troubling to free that header so that they manually free the whole second List.) Patch by me; thanks to David Rowley for review. Discussion: https://postgr.es/m/11587.1550975080@sss.pgh.pa.us
* Cosmetic improvements in setup of planner's per-RTE arrays. Tom Lane 2019-08-09
| | | | | | | | | | | | | Merge setup_append_rel_array into setup_simple_rel_arrays. There's no particularly good reason to keep them separate, and it's inconsistent with the lack of separation in expand_planner_arrays. The only apparent benefit was that the fast path for trivial queries in query_planner() doesn't need to set up the append_rel_array; but all we're saving there is an if-test and NULL assignment, which surely ought to be negligible. Also improve some obsolete comments. Discussion: https://postgr.es/m/17220.1565301350@sss.pgh.pa.us
* Speed up finding EquivalenceClasses for a given set of rels David Rowley 2019-07-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously in order to determine which ECs a relation had members in, we had to loop over all ECs stored in PlannerInfo's eq_classes and check if ec_relids mentioned the relation. For the most part, this was fine, as generally, unless queries were fairly complex, the overhead of performing the lookup would have not been that significant. However, when queries contained large numbers of joins and ECs, the overhead to find the set of classes matching a given set of relations could become a significant portion of the overall planning effort. Here we allow a much more efficient method to access the ECs which match a given relation or set of relations. A new Bitmapset field in RelOptInfo now exists to store the indexes into PlannerInfo's eq_classes list which each relation is mentioned in. This allows very fast lookups to find all ECs belonging to a single relation. When we need to lookup ECs belonging to a given pair of relations, we can simply bitwise-AND the Bitmapsets from each relation and use the result to perform the lookup. We also take the opportunity to write a new implementation of generate_join_implied_equalities which makes use of the new indexes. generate_join_implied_equalities_for_ecs must remain as is as it can be given a custom list of ECs, which we can't easily determine the indexes of. This was originally intended to fix the performance penalty of looking up foreign keys matching a join condition which was introduced by 100340e2d. However, we're speeding up much more than just that here. Author: David Rowley, Tom Lane Reviewed-by: Tom Lane, Tomas Vondra Discussion: https://postgr.es/m/6970.1545327857@sss.pgh.pa.us
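The bitwise-AND lookup described in the commit above can be sketched in a few lines. This is an illustration under simplifying assumptions: `ECIndexes` is a hypothetical single-word bitmap covering at most 64 equivalence classes (the real field is a multi-word Bitmapset), and the helper names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bitmap: bit i set => the relation is mentioned in eq_classes[i]. */
typedef uint64_t ECIndexes;

/* ECs that mention both relations: intersect the two per-relation sets. */
static ECIndexes
common_ec_indexes(ECIndexes rel1, ECIndexes rel2)
{
    return rel1 & rel2;
}

/*
 * Expand a bitmap into a list of eq_classes indexes, writing at most
 * max entries to out.  Returns the number of indexes written.
 */
static int
collect_ec_indexes(ECIndexes set, int *out, int max)
{
    int n = 0;

    assert(max >= 0);

    for (int i = 0; i < 64 && n < max; i++)
    {
        if (set & (UINT64_C(1) << i))
            out[n++] = i;
    }
    return n;
}
```

Compared with looping over every EC and testing its relids, the cost here depends on the size of the bitmap rather than on how many ECs the query has accumulated.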