aboutsummaryrefslogtreecommitdiff
path: root/src/backend/optimizer/util/inherit.c
Commit message (Collapse)AuthorAge
* Reduce "Var IS [NOT] NULL" quals during constant foldingRichard Guo14 days
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In commit b262ad440, we introduced an optimization that reduces an IS [NOT] NULL qual on a NOT NULL column to constant true or constant false, provided we can prove that the input expression of the NullTest is not nullable by any outer joins or grouping sets. This deduction happens quite late in the planner, during the distribution of quals to rels in query_planner. However, this approach has some drawbacks: we can't perform any further folding with the constant, and it turns out to be prone to bugs. Ideally, this deduction should happen during constant folding. However, the per-relation information about which columns are defined as NOT NULL is not available at that point. This information is currently collected from catalogs when building RelOptInfos for base or "other" relations. This patch moves the collection of NOT NULL attribute information for relations before pull_up_sublinks, storing it in a hash table keyed by relation OID. It then uses this information to perform the NullTest deduction for Vars during constant folding. This also makes it possible to leverage this information to pull up NOT IN subqueries. Note that this patch does not get rid of restriction_is_always_true and restriction_is_always_false. Removing them would prevent us from reducing some IS [NOT] NULL quals that we were previously able to reduce, because (a) the self-join elimination may introduce new IS NOT NULL quals after constant folding, and (b) if some outer joins are converted to inner joins, previously irreducible NullTest quals may become reducible. Author: Richard Guo <guofenglinux@gmail.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CAMbWs4-bFJ1At4btk5wqbezdu8PLtQ3zv-aiaY3ry9Ymm=jgFQ@mail.gmail.com
* Update copyright for 2025Bruce Momjian2025-01-01
| | | | Backpatch-through: 13
* Avoid failure to open dropped detached partitionAlvaro Herrera2024-08-19
| | | | | | | | | | | | | | | | | | | | When a partition is detached and immediately dropped, a prepared statement could try to compute a new partition descriptor that includes it. This leads to this kind of error: ERROR: could not open relation with OID 457639 Avoid this by skipping the partition in expand_partitioned_rtentry if it doesn't exist. Noted by me while investigating bug #18559. Kuntal Gosh helped to identify the exact failure. Backpatch to 14, where DETACH CONCURRENTLY was introduced. Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Kuntal Ghosh <kuntalghosh.2007@gmail.com> Reviewed-by: Junwang Zhao <zhjwpku@gmail.com> Discussion: https://postgr.es/m/202408122233.bo4adt3vh5bi@alvherre.pgsql
* Fix IS [NOT] NULL qual optimization for inheritance tablesDavid Rowley2024-04-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | b262ad440 added code to have the planner remove redundant IS NOT NULL quals and eliminate needless scans for IS NULL quals on tables where the qual's column has a NOT NULL constraint. That commit failed to consider that an inheritance parent table could have differing NOT NULL constraints between the parent and the child. This caused issues as if we eliminated a qual on the parent, when applying the quals to child tables in apply_child_basequals(), the qual might not have been added to the parent's baserestrictinfo. Here we fix this by not applying the optimization to remove redundant quals to RelOptInfos belonging to inheritance parents and applying the optimization again in apply_child_basequals(). Effectively, this means that the parent and child are considered independently as the parent has both an inh=true and inh=false RTE and we still apply the optimization to the RelOptInfo corresponding to the inh=false RTE. We're able to still apply the optimization in add_base_clause_to_rel() for partitioned tables as the NULLability of partitions must match that of their parent. And, if we ever expand restriction_is_always_false() and restriction_is_always_true() to handle partition constraints then we can apply the same logic as, even in multi-level partitioned tables, there's no way to route values to a partition when the qual does not match the partition qual of the partitioned table's parent partition. The same is true for CHECK constraints as those must also match between arent partitioned tables and their partitions. Author: Richard Guo, David Rowley Discussion: https://postgr.es/m/CAMbWs4930gQSZmjR7aANzEapdy61gCg6z8dT-kAEYD0sYWKPdQ@mail.gmail.com
* Update copyright for 2024Bruce Momjian2024-01-03
| | | | | | | | Reported-by: Michael Paquier Discussion: https://postgr.es/m/ZZKTDPxBBMt3C0J9@paquier.xyz Backpatch-through: 12
* Avoid compiler warning in non-assert buildsAmit Langote2023-10-26
| | | | | | | | | | | After 01575ad788e3, expand_single_inheritance_child()'s parentOID variable is read only in an Assert, provoking a compiler warning in non-assert builds. Fix that by marking the variable with PG_USED_FOR_ASSERTS_ONLY. Per report and suggestion from David Rowley Discussion: https://postgr.es/m/CAApHDvpjA_8Wxu4DCTRVAvPxC9atwMe6N%2ByvrcGsgb7mrfdpJA%40mail.gmail.com
* Prevent duplicate RTEPermissionInfo for plain-inheritance parentsAmit Langote2023-10-26
| | | | | | | | | | | | | | | | | | | Currently, expand_single_inheritance_child() doesn't reset perminfoindex in a plain-inheritance parent's child RTE, because prior to 387f9ed0a0, the executor would use the first child RTE to locate the parent's RTEPermissionInfo. That in turn causes add_rte_to_flat_rtable() to create an extra RTEPermissionInfo belonging to the parent's child RTE with the same content as the one belonging to the parent's original ("root") RTE. In 387f9ed0a0, we changed things so that the executor can now use the parent's "root" RTE for locating its RTEPermissionInfo instead of the child RTE, so the latter's perminfoindex need not be set anymore, so make it so. Reported-by: Tom Lane Discussion: https://postgr.es/m/839708.1698174464@sss.pgh.pa.us Backpatch-through: 16
* Fix filtering of "cloned" outer-join quals some more.Tom Lane2023-05-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | We've had multiple issues with the clause_is_computable_at logic that I introduced in 2489d76c4: it's been known to accept more than one clone of the same qual at the same plan node, and also to accept no clones at all. It's looking impractical to get it 100% right on the basis of the currently-stored information, so fix it by introducing a new RestrictInfo field "incompatible_relids" that explicitly shows which outer joins a given clone mustn't be pushed above. In principle we could populate this field in every RestrictInfo, but that would cost space and there doesn't presently seem to be a need for it in general. Also, while deconstruct_distribute_oj_quals can easily fill the field with the remaining members of the commutative join set that it's considering, computing it in the general case seems again pretty complicated. So for now, just fill it for clone quals. Along the way, fix a bug that may or may not be only latent: equivclass.c was generating replacement clauses with is_pushed_down and has_clone/is_clone markings that didn't match their required_relids. This led me to conclude that leaving the clone flags out of make_restrictinfo's purview wasn't such a great idea after all, so add them. Per report from Richard Guo. Discussion: https://postgr.es/m/CAMbWs48EYi_9-pSd0ORes1kTmTeAjT4Q3gu49hJtYCbSn2JyeA@mail.gmail.com
* Do assorted mop-up in the planner.Tom Lane2023-01-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove RestrictInfo.nullable_relids, along with a good deal of infrastructure that calculated it. One use-case for it was in join_clause_is_movable_to, but we can now replace that usage with a check to see if the clause's relids include any outer join that can null the target relation. The other use-case was in join_clause_is_movable_into, but that test can just be dropped entirely now that the clause's relids include outer joins. Furthermore, join_clause_is_movable_into should now be accurate enough that it will accept anything returned by generate_join_implied_equalities, so we can restore the Assert that was diked out in commit 95f4e59c3. Remove the outerjoin_delayed mechanism. We needed this before to prevent quals from getting evaluated below outer joins that should null some of their vars. Now that we consider varnullingrels while placing quals, that's taken care of automatically, so throw the whole thing away. Teach remove_useless_result_rtes to also remove useless FromExprs. Having done that, the delay_upper_joins flag serves no purpose any more and we can remove it, largely reverting 11086f2f2. Use constant TRUE for "dummy" clauses when throwing back outer joins. This improves on a hack I introduced in commit 6a6522529. If we have a left-join clause l.x = r.y, and a WHERE clause l.x = constant, we generate r.y = constant and then don't really have a need for the join clause. But we must throw the join clause back anyway after marking it redundant, so that the join search heuristics won't think this is a clauseless join and avoid it. That was a kluge introduced under time pressure, and after looking at it I thought of a better way: let's just introduce constant-TRUE "join clauses" instead, and get rid of them at the end. This improves the generated plans for such cases by not having to test a redundant join clause. We can also get rid of the ugly hack used to mark such clauses as redundant for selectivity estimation. Patch by me; thanks to Richard Guo for review. Discussion: https://postgr.es/m/830269.1656693747@sss.pgh.pa.us
* Fix calculation of which GENERATED columns need to be updated.Tom Lane2023-01-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We were identifying the updatable generated columns of inheritance children by transposing the calculation made for their parent. However, there's nothing that says a traditional-inheritance child can't have generated columns that aren't there in its parent, or that have different dependencies than are in the parent's expression. (At present it seems that we don't enforce that for partitioning either, which is likely wrong to some degree or other; but the case clearly needs to be handled with traditional inheritance.) Hence, drop the very-klugy-anyway "extraUpdatedCols" RTE field in favor of identifying which generated columns depend on updated columns during executor startup. In HEAD we can remove extraUpdatedCols altogether; in back branches, it's still there but always empty. Another difference between the HEAD and back-branch versions of this patch is that in HEAD we can add the new bitmap field to ResultRelInfo, but that would cause an ABI break in back branches. Like 4b3e37993, add a List field at the end of struct EState instead. Back-patch to v13. The bogus calculation is also being made in v12, but it doesn't have the same visible effect because we don't use it to decide which generated columns to recalculate; as a consequence of which the patch doesn't apply easily. I think that there might still be a demonstrable bug associated with trigger firing conditions, but that's such a weird corner-case usage that I'm content to leave it unfixed in v12. Amit Langote and Tom Lane Discussion: https://postgr.es/m/CA+HiwqFshLKNvQUd1DgwJ-7tsTp=dwv7KZqXC4j2wYBV1aCDUA@mail.gmail.com Discussion: https://postgr.es/m/2793383.1672944799@sss.pgh.pa.us
* Update copyright for 2023Bruce Momjian2023-01-02
| | | | Backpatch-through: 11
* Fix bug in translate_col_privs_multilevelDavid Rowley2022-12-24
| | | | | | | | | | | | | Fix incorrect code which was trying to convert a Bitmapset of columns at the attnums according to a parent table and transform them into the equivalent Bitmapset with same attnums according to the given child table. This code is new as of a61b1f748 and was failing to do the correct translation when there was an intermediate parent table between 'rel' and 'top_parent_rel'. Reported-by: Ranier Vilela Author: Richard Guo, Amit Langote Discussion: https://postgr.es/m/CAEudQArohfB_Gy%2BhcH2-bANUkxgjJiP%3DABq01_LgTNTbcNijag%40mail.gmail.com
* Rework query relation permission checkingAlvaro Herrera2022-12-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, information about the permissions to be checked on relations mentioned in a query is stored in their range table entries. So the executor must scan the entire range table looking for relations that need to have permissions checked. This can make the permission checking part of the executor initialization needlessly expensive when many inheritance children are present in the range range. While the permissions need not be checked on the individual child relations, the executor still must visit every range table entry to filter them out. This commit moves the permission checking information out of the range table entries into a new plan node called RTEPermissionInfo. Every top-level (inheritance "root") RTE_RELATION entry in the range table gets one and a list of those is maintained alongside the range table. This new list is initialized by the parser when initializing the range table. The rewriter can add more entries to it as rules/views are expanded. Finally, the planner combines the lists of the individual subqueries into one flat list that is passed to the executor for checking. To make it quick to find the RTEPermissionInfo entry belonging to a given relation, RangeTblEntry gets a new Index field 'perminfoindex' that stores the corresponding RTEPermissionInfo's index in the query's list of the latter. ExecutorCheckPerms_hook has gained another List * argument; the signature is now: typedef bool (*ExecutorCheckPerms_hook_type) (List *rangeTable, List *rtePermInfos, bool ereport_on_violation); The first argument is no longer used by any in-core uses of the hook, but we leave it in place because there may be other implementations that do. Implementations should likely scan the rtePermInfos list to determine which operations to allow or deny. Author: Amit Langote <amitlangote09@gmail.com> Discussion: https://postgr.es/m/CA+HiwqGjJDmUhDSfv-U2qhKJjt9ST7Xh9JXC_irsAQ1TAUsJYg@mail.gmail.com
* Update some comments that should've covered MERGEAlvaro Herrera2022-10-24
| | | | | | | Oversight in 7103ebb7aae8. Backpatch to 15. Author: Richard Guo <guofenglinux@gmail.com> Discussion: https://postgr.es/m/CAMbWs48gnDjZXq3-b56dVpQCNUJ5hD9kdtWN4QFwKCEapspNsA@mail.gmail.com
* Use an explicit state flag to control PlaceHolderInfo creation.Tom Lane2022-08-17
| | | | | | | | | | | | | | | | | | | Up to now, callers of find_placeholder_info() were required to pass a flag indicating if it's OK to make a new PlaceHolderInfo. That'd be fine if the callers had free choice, but they do not. Once we begin deconstruct_jointree() it's no longer OK to make more PHIs; while callers before that always want to create a PHI if it's not there already. So there's no freedom of action, only the opportunity to cause bugs by creating PHIs too late. Let's get rid of that in favor of adding a state flag PlannerInfo.placeholdersFrozen, which we can set at the point where it's no longer OK to make more PHIs. This patch also simplifies a couple of call sites that were using complicated logic to avoid calling find_placeholder_info() as much as possible. Now that that lookup is O(1) thanks to the previous commit, the extra bitmap manipulations are probably a net negative. Discussion: https://postgr.es/m/1405792.1660677844@sss.pgh.pa.us
* Fix various typos, grammar and code style in comments and docsMichael Paquier2022-01-25
| | | | | | | | | This fixes a set of issues that have accumulated over the past months (or years) in various code areas. Most fixes are related to some recent additions, as of the development of v15. Author: Justin Pryzby Discussion: https://postgr.es/m/20220124030001.GQ23027@telsasoft.com
* Update copyright for 2022Bruce Momjian2022-01-07
| | | | Backpatch-through: 10
* Track a Bitmapset of non-pruned partitions in RelOptInfoDavid Rowley2021-08-03
| | | | | | | | | | | | | | | | | | | | For partitioned tables with large numbers of partitions where queries are able to prune all but a very small number of partitions, the time spent in the planner looping over RelOptInfo.part_rels checking for non-NULL RelOptInfos could become a large portion of the overall planning time. Here we add a Bitmapset that records the non-pruned partitions. This allows us to more efficiently skip the pruned partitions by looping over the Bitmapset. This will cause a very slight slow down in cases where no or not many partitions could be pruned, however, those cases are already slow to plan anyway and the overhead of looping over the Bitmapset would be unmeasurable when compared with the other tasks such as path creation for a large number of partitions. Reviewed-by: Amit Langote, Zhihong Yu Discussion: https://postgr.es/m/CAApHDvqnPx6JnUuPwaf5ao38zczrAb9mxt9gj4U1EKFfd4AqLA@mail.gmail.com
* Fix planner's row-mark code for inheritance from a foreign table.Tom Lane2021-06-02
| | | | | | | | | | | | | | | | Commit 428b260f8 broke planning of cases where row marks are needed (SELECT FOR UPDATE, etc) and one of the query's tables is a foreign table that has regular table(s) as inheritance children. We got the reverse case right, but apparently were thinking that foreign tables couldn't be inheritance parents. Not so; so we need to be able to add a CTID junk column while adding a new child, not only a wholerow junk column. Back-patch to v12 where the faulty code came in. Amit Langote Discussion: https://postgr.es/m/CA+HiwqEmo3FV1LAQ4TVyS2h1WM=kMkZUmbNuZSCnfHvMcUcPeA@mail.gmail.com
* Rework planning and execution of UPDATE and DELETE.Tom Lane2021-03-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch makes two closely related sets of changes: 1. For UPDATE, the subplan of the ModifyTable node now only delivers the new values of the changed columns (i.e., the expressions computed in the query's SET clause) plus row identity information such as CTID. ModifyTable must re-fetch the original tuple to merge in the old values of any unchanged columns. The core advantage of this is that the changed columns are uniform across all tables of an inherited or partitioned target relation, whereas the other columns might not be. A secondary advantage, when the UPDATE involves joins, is that less data needs to pass through the plan tree. The disadvantage of course is an extra fetch of each tuple to be updated. However, that seems to be very nearly free in context; even worst-case tests don't show it to add more than a couple percent to the total query cost. At some point it might be interesting to combine the re-fetch with the tuple access that ModifyTable must do anyway to mark the old tuple dead; but that would require a good deal of refactoring and it seems it wouldn't buy all that much, so this patch doesn't attempt it. 2. For inherited UPDATE/DELETE, instead of generating a separate subplan for each target relation, we now generate a single subplan that is just exactly like a SELECT's plan, then stick ModifyTable on top of that. To let ModifyTable know which target relation a given incoming row refers to, a tableoid junk column is added to the row identity information. This gets rid of the horrid hack that was inheritance_planner(), eliminating O(N^2) planning cost and memory consumption in cases where there were many unprunable target relations. Point 2 of course requires point 1, so that there is a uniform definition of the non-junk columns to be returned by the subplan. We can't insist on uniform definition of the row identity junk columns however, if we want to keep the ability to have both plain and foreign tables in a partitioning hierarchy. Since it wouldn't scale very far to have every child table have its own row identity column, this patch includes provisions to merge similar row identity columns into one column of the subplan result. In particular, we can merge the whole-row Vars typically used as row identity by FDWs into one column by pretending they are type RECORD. (It's still okay for the actual composite Datums to be labeled with the table's rowtype OID, though.) There is more that can be done to file down residual inefficiencies in this patch, but it seems to be committable now. FDW authors should note several API changes: * The argument list for AddForeignUpdateTargets() has changed, and so has the method it must use for adding junk columns to the query. Call add_row_identity_var() instead of manipulating the parse tree directly. You might want to reconsider exactly what you're adding, too. * PlanDirectModify() must now work a little harder to find the ForeignScan plan node; if the foreign table is part of a partitioning hierarchy then the ForeignScan might not be the direct child of ModifyTable. See postgres_fdw for sample code. * To check whether a relation is a target relation, it's no longer sufficient to compare its relid to root->parse->resultRelation. Instead, check it against all_result_relids or leaf_result_relids, as appropriate. Amit Langote and Tom Lane Discussion: https://postgr.es/m/CA+HiwqHpHdqdDn48yCEhynnniahH78rwcrv1rEX65-fsZGBOLQ@mail.gmail.com
* Fix pull_varnos' miscomputation of relids set for a PlaceHolderVar.Tom Lane2021-01-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, pull_varnos() took the relids of a PlaceHolderVar as being equal to the relids in its contents, but that fails to account for the possibility that we have to postpone evaluation of the PHV due to outer joins. This could result in a malformed plan. The known cases end up triggering the "failed to assign all NestLoopParams to plan nodes" sanity check in createplan.c, but other symptoms may be possible. The right value to use is the join level we actually intend to evaluate the PHV at. We can get that from the ph_eval_at field of the associated PlaceHolderInfo. However, there are some places that call pull_varnos() before the PlaceHolderInfos have been created; in that case, fall back to the conservative assumption that the PHV will be evaluated at its syntactic level. (In principle this might result in missing some legal optimization, but I'm not aware of any cases where it's an issue in practice.) Things are also a bit ticklish for calls occurring during deconstruct_jointree(), but AFAICS the ph_eval_at fields should have reached their final values by the time we need them. The main problem in making this work is that pull_varnos() has no way to get at the PlaceHolderInfos. We can fix that easily, if a bit tediously, in HEAD by passing it the planner "root" pointer. In the back branches that'd cause an unacceptable API/ABI break for extensions, so leave the existing entry points alone and add new ones with the additional parameter. (If an old entry point is called and encounters a PHV, it'll fall back to using the syntactic level, again possibly missing some valid optimization.) Back-patch to v12. The computation is surely also wrong before that, but it appears that we cannot reach a bad plan thanks to join order restrictions imposed on the subquery that the PlaceHolderVar came from. The error only became reachable when commit 4be058fe9 allowed trivial subqueries to be collapsed out completely, eliminating their join order restrictions. Per report from Stephan Springl. Discussion: https://postgr.es/m/171041.1610849523@sss.pgh.pa.us
* Update copyright for 2021Bruce Momjian2021-01-02
| | | | Backpatch-through: 9.5
* Allow partitionwise joins in more cases.Etsuro Fujita2020-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | Previously, the partitionwise join technique only allowed partitionwise join when input partitioned tables had exactly the same partition bounds. This commit extends the technique to some cases when the tables have different partition bounds, by using an advanced partition-matching algorithm introduced by this commit. For both the input partitioned tables, the algorithm checks whether every partition of one input partitioned table only matches one partition of the other input partitioned table at most, and vice versa. In such a case the join between the tables can be broken down into joins between the matching partitions, so the algorithm produces the pairs of the matching partitions, plus the partition bounds for the join relation, to allow partitionwise join for computing the join. Currently, the algorithm works for list-partitioned and range-partitioned tables, but not hash-partitioned tables. See comments in partition_bounds_merge(). Ashutosh Bapat and Etsuro Fujita, most of regression tests by Rajkumar Raghuwanshi, some of the tests by Mark Dilger and Amul Sul, reviewed by Dmitry Dolgov and Amul Sul, with additional review at various points by Ashutosh Bapat, Mark Dilger, Robert Haas, Antonin Houska, Amit Langote, Justin Pryzby, and Tomas Vondra Discussion: https://postgr.es/m/CAFjFpRdjQvaUEV5DJX3TW6pU5eq54NCkadtxHX2JiJG_GvbrCA@mail.gmail.com
* Update copyrights for 2020Bruce Momjian2020-01-01
| | | | Backpatch-through: update all files in master, backpatch legal files through 9.4
* Fix EXPLAIN's column alias output for mismatched child tables.Tom Lane2019-12-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | If an inheritance/partitioning parent table is assigned some column alias names in the query, EXPLAIN mapped those aliases onto the child tables' columns by physical position, resulting in bogus output if a child table's columns aren't one-for-one with the parent's. To fix, make expand_single_inheritance_child() generate a correctly re-mapped column alias list, rather than just copying the parent RTE's alias node. (We have to fill the alias field, not just adjust the eref field, because ruleutils.c will ignore eref in favor of looking at the real column names.) This means that child tables will now always have alias fields in plan rtables, where before they might not have. That results in a rather substantial set of regression test output changes: EXPLAIN will now always show child tables with aliases that match the parent table (usually with "_N" appended for uniqueness). But that seems like a net positive for understandability, since the parent alias corresponds to something that actually appeared in the original query, while the child table names didn't. (Note that this does not change anything for cases where an explicit table alias was written in the query for the parent table; it just makes cases without such aliases behave similarly to that.) Hence, while we could avoid these subsidiary changes if we made inherit.c more complicated, we choose not to. Discussion: https://postgr.es/m/12424.1575168015@sss.pgh.pa.us
* Phase 2 pgindent run for v12.Tom Lane2019-05-22
| | | | | | | | | Switch to 2.1 version of pg_bsd_indent. This formats multiline function declarations "correctly", that is with additional lines of parameter declarations indented to match where the first line's left parenthesis is. Discussion: https://postgr.es/m/CAEepm=0P3FeTXRcU5B2W3jv3PgRVZ-kGUXLGfd42FFhUROO3ug@mail.gmail.com
* Initial pgindent run for v12.Tom Lane2019-05-22
| | | | | | | | This is still using the 2.0 version of pg_bsd_indent. I thought it would be good to commit this separately, so as to document the differences between 2.0 and 2.1 behavior. Discussion: https://postgr.es/m/16296.1558103386@sss.pgh.pa.us
* Speed up planning when partitions can be pruned at plan time.Tom Lane2019-03-30
| | | | | | | | | | | | | | | | | | | | | | Previously, the planner created RangeTblEntry and RelOptInfo structs for every partition of a partitioned table, even though many of them might later be deemed uninteresting thanks to partition pruning logic. This incurred significant overhead when there are many partitions. Arrange to postpone creation of these data structures until after we've processed the query enough to identify restriction quals for the partitioned table, and then apply partition pruning before not after creation of each partition's data structures. In this way we need not open the partition relations at all for partitions that the planner has no real interest in. For queries that can be proven at plan time to access only a small number of partitions, this patch improves the practical maximum number of partitions from under 100 to perhaps a few thousand. Amit Langote, reviewed at various times by Dilip Kumar, Jesper Pedersen, Yoshikazu Imai, and David Rowley Discussion: https://postgr.es/m/9d7c5112-cb99-6a47-d3be-cf1ee6862a1d@lab.ntt.co.jp
* Generated columnsPeter Eisentraut2019-03-30
| | | | | | | | | | | | | | This is an SQL-standard feature that allows creating columns that are computed from expressions rather than assigned, similar to a view or materialized view but on a column basis. This implements one kind of generated column: stored (computed on write). Another kind, virtual (computed on read), is planned for the future, and some room is left for it. Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/b151f851-4019-bdb1-699e-ebab07d2f40a@2ndquadrant.com
* Get rid of duplicate child RTE for a partitioned table.Tom Lane2019-03-26
| | | | | | | | | | | | | | | | | | We've been creating duplicate RTEs for partitioned tables just because we do so for regular inheritance parent tables. But unlike regular-inheritance parents which are themselves regular tables and thus need to be scanned, partitioned tables don't need the extra RTE. This makes the conditions for building a child RTE the same as those for building an AppendRelInfo, allowing minor simplification in expand_single_inheritance_child. Since the planner's actual processing is driven off the AppendRelInfo list, nothing much changes beyond that, we just have one fewer useless RTE entry. Amit Langote, reviewed and hacked a bit by me Discussion: https://postgr.es/m/9d7c5112-cb99-6a47-d3be-cf1ee6862a1d@lab.ntt.co.jp
* Allow ATTACH PARTITION with only ShareUpdateExclusiveLock.Robert Haas2019-03-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We still require AccessExclusiveLock on the partition itself, because otherwise an insert that violates the newly-imposed partition constraint could be in progress at the same time that we're changing that constraint; only the lock level on the parent relation is weakened. To make this safe, we have to cope with (at least) three separate problems. First, relevant DDL might commit while we're in the process of building a PartitionDesc. If so, find_inheritance_children() might see a new partition while the RELOID system cache still has the old partition bound cached, and even before invalidation messages have been queued. To fix that, if we see that the pg_class tuple seems to be missing or to have a null relpartbound, refetch the value directly from the table. We can't get the wrong value, because DETACH PARTITION still requires AccessExclusiveLock throughout; if we ever want to change that, this will need more thought. In testing, I found it quite difficult to hit even the null-relpartbound case; the race condition is extremely tight, but the theoretical risk is there. Second, successive calls to RelationGetPartitionDesc might not return the same answer. The query planner will get confused if lookup up the PartitionDesc for a particular relation does not return a consistent answer for the entire duration of query planning. Likewise, query execution will get confused if the same relation seems to have a different PartitionDesc at different times. Invent a new PartitionDirectory concept and use it to ensure consistency. This ensures that a single invocation of either the planner or the executor sees the same view of the PartitionDesc from beginning to end, but it does not guarantee that the planner and the executor see the same view. Since this allows pointers to old PartitionDesc entries to survive even after a relcache rebuild, also postpone removing the old PartitionDesc entry until we're certain no one is using it. For the most part, it seems to be OK for the planner and executor to have different views of the PartitionDesc, because the executor will just ignore any concurrently added partitions which were unknown at plan time; those partitions won't be part of the inheritance expansion, but invalidation messages will trigger replanning at some point. Normally, this happens by the time the very next command is executed, but if the next command acquires no locks and executes a prepared query, it can manage not to notice until a new transaction is started. We might want to tighten that up, but it's material for a separate patch. There would still be a small window where a query that started just after an ATTACH PARTITION command committed might fail to notice its results -- but only if the command starts before the commit has been acknowledged to the user. All in all, the warts here around serializability seem small enough to be worth accepting for the considerable advantage of being able to add partitions without a full table lock. Although in general the consequences of new partitions showing up between planning and execution are limited to the query not noticing the new partitions, run-time partition pruning will get confused in that case, so that's the third problem that this patch fixes. Run-time partition pruning assumes that indexes into the PartitionDesc are stable between planning and execution. So, add code so that if new partitions are added between plan time and execution time, the indexes stored in the subplan_map[] and subpart_map[] arrays within the plan's PartitionedRelPruneInfo get adjusted accordingly. There does not seem to be a simple way to generalize this scheme to cope with partitions that are removed, mostly because they could then get added back again with different bounds, but it works OK for added partitions. This code does not try to ensure that every backend participating in a parallel query sees the same view of the PartitionDesc. That currently doesn't matter, because we never pass PartitionDesc indexes between backends. Each backend will ignore the concurrently added partitions which it notices, and it doesn't matter if different backends are ignoring different sets of concurrently added partitions. If in the future that matters, for example because we allow writes in parallel query and want all participants to do tuple routing to the same set of partitions, the PartitionDirectory concept could be improved to share PartitionDescs across backends. There is a draft patch to serialize and restore PartitionDescs on the thread where this patch was discussed, which may be a useful place to start. Patch by me. Thanks to Alvaro Herrera, David Rowley, Simon Riggs, Amit Langote, and Michael Paquier for discussion, and to Alvaro Herrera for some review. Discussion: http://postgr.es/m/CA+Tgmobt2upbSocvvDej3yzokd7AkiT+PvgFH+a9-5VV1oJNSQ@mail.gmail.com Discussion: http://postgr.es/m/CA+TgmoZE0r9-cyA-aY6f8WFEROaDLLL7Vf81kZ8MtFCkxpeQSw@mail.gmail.com Discussion: http://postgr.es/m/CA+TgmoY13KQZF-=HNTrt9UYWYx3_oYOQpu9ioNT49jGgiDpUEA@mail.gmail.com
* Change lock acquisition order in expand_inherited_rtentry.Robert Haas2019-02-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, this function acquired locks in the order using find_all_inheritors(), which locks the children of each table that it processes in ascending OID order, and which processes the inheritance hierarchy as a whole in a breadth-first fashion. Now, it processes the inheritance hierarchy in a depth-first fashion, and at each level it proceeds in the order in which tables appear in the PartitionDesc. If table inheritance rather than table partitioning is used, the old order is preserved. This change moves the locking of any given partition much closer to the code that actually expands that partition. This seems essential if we ever want to allow concurrent DDL to add or remove partitions, because if the set of partitions can change, we must use the same data to decide which partitions to lock as we do to decide which partitions to expand; otherwise, we might expand a partition that we haven't locked. It should hopefully also facilitate efforts to postpone inheritance expansion or locking for performance reasons, because there's really no way to postpone locking some partitions if we're blindly locking them all using find_all_inheritors(). The only downside of this change which is known to me is that it further deviates from the principle that we should always lock the inheritance hierarchy in find_all_inheritors() order to avoid deadlock risk. However, we've already crossed that bridge in commit 9eefba181f7782d27d85d7e94e6028371e7ab2d7 and there are futher patches pending that make similar changes, so this isn't really giving up anything that we haven't surrendered already -- and it seems entirely worth it, given the performance benefits some of those changes seem likely to bring. Patch by me; thanks to David Rowley for discussion of these issues. Discussion: http://postgr.es/m/CAKJS1f_eEYVEq5tM8sm1k-HOwG0AyCPwX54XG9x4w0zy_N4Q_Q@mail.gmail.com Discussion: http://postgr.es/m/CA+TgmoZUwPf_uanjF==gTGBMJrn8uCq52XYvAEorNkLrUdoawg@mail.gmail.com
* Move code for managing PartitionDescs into a new file, partdesc.cRobert Haas2019-02-21
| | | | | | | | | | This is similar in spirit to the existing partbounds.c file in the same directory, except that there's a lot less code in the new file created by this commit. Pending work in this area proposes to add a bunch more code related to PartitionDescs, though, and this will give us a good place to put it. Discussion: http://postgr.es/m/CA+TgmoZUwPf_uanjF==gTGBMJrn8uCq52XYvAEorNkLrUdoawg@mail.gmail.com
* Replace uses of heap_open et al with the corresponding table_* function.Andres Freund2019-01-21
| | | | | Author: Andres Freund Discussion: https://postgr.es/m/20190111000539.xbv7s6w7ilcvm7dp@alap3.anarazel.de
* Replace heapam.h includes with {table, relation}.h where applicable.Andres Freund2019-01-21
| | | | | | | | | A lot of files only included heapam.h for relation_open, heap_open etc - replace the heapam.h include in those files with the narrower header. Author: Andres Freund Discussion: https://postgr.es/m/20190111000539.xbv7s6w7ilcvm7dp@alap3.anarazel.de
* Reorganize planner code moved in b60c39759908Alvaro Herrera2019-01-16
| | | | | | | | It seems modules are better defined like this instead of the original split. Per complaints from David Rowley as well as Amit Langote's self review. Discussion: https://postgr.es/m/CAKJS1f988rsyhwvLgfT-y1UCYUfXDOv67ENQk=v24OxhsZOzZw@mail.gmail.com
* Move inheritance expansion code into its own fileAlvaro Herrera2019-01-10
This commit moves expand_inherited_tables and underlings from optimizer/prep/prepunionc.c to optimizer/utils/inherit.c. Also, all of the AppendRelInfo-based expression manipulation routines are moved to optimizer/utils/appendinfo.c. No functional code changes. One exception is the introduction of make_append_rel_info, but that's still just moving around code. Also, stop including <limits.h> in prepunion.c, which no longer needs it since 3fc6e2d7f5b6. I (Álvaro) noticed this because Amit was copying that to inherit.c, which likewise doesn't need it. Author: Amit Langote Discussion: https://postgr.es/m/3be67028-a00a-502c-199a-da00eec8fb6e@lab.ntt.co.jp