aboutsummaryrefslogtreecommitdiff
path: root/src/backend/optimizer/util/inherit.c
Commit message (Collapse)AuthorAge
* Phase 2 pgindent run for v12.Tom Lane2019-05-22
| | | | | | | | | Switch to 2.1 version of pg_bsd_indent. This formats multiline function declarations "correctly", that is with additional lines of parameter declarations indented to match where the first line's left parenthesis is. Discussion: https://postgr.es/m/CAEepm=0P3FeTXRcU5B2W3jv3PgRVZ-kGUXLGfd42FFhUROO3ug@mail.gmail.com
* Initial pgindent run for v12.Tom Lane2019-05-22
| | | | | | | | This is still using the 2.0 version of pg_bsd_indent. I thought it would be good to commit this separately, so as to document the differences between 2.0 and 2.1 behavior. Discussion: https://postgr.es/m/16296.1558103386@sss.pgh.pa.us
* Speed up planning when partitions can be pruned at plan time.Tom Lane2019-03-30
| | | | | | | | | | | | | | | | | | | | | | Previously, the planner created RangeTblEntry and RelOptInfo structs for every partition of a partitioned table, even though many of them might later be deemed uninteresting thanks to partition pruning logic. This incurred significant overhead when there are many partitions. Arrange to postpone creation of these data structures until after we've processed the query enough to identify restriction quals for the partitioned table, and then apply partition pruning before not after creation of each partition's data structures. In this way we need not open the partition relations at all for partitions that the planner has no real interest in. For queries that can be proven at plan time to access only a small number of partitions, this patch improves the practical maximum number of partitions from under 100 to perhaps a few thousand. Amit Langote, reviewed at various times by Dilip Kumar, Jesper Pedersen, Yoshikazu Imai, and David Rowley Discussion: https://postgr.es/m/9d7c5112-cb99-6a47-d3be-cf1ee6862a1d@lab.ntt.co.jp
* Generated columnsPeter Eisentraut2019-03-30
| | | | | | | | | | | | | | This is an SQL-standard feature that allows creating columns that are computed from expressions rather than assigned, similar to a view or materialized view but on a column basis. This implements one kind of generated column: stored (computed on write). Another kind, virtual (computed on read), is planned for the future, and some room is left for it. Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/b151f851-4019-bdb1-699e-ebab07d2f40a@2ndquadrant.com
* Get rid of duplicate child RTE for a partitioned table.Tom Lane2019-03-26
| | | | | | | | | | | | | | | | | | We've been creating duplicate RTEs for partitioned tables just because we do so for regular inheritance parent tables. But unlike regular-inheritance parents which are themselves regular tables and thus need to be scanned, partitioned tables don't need the extra RTE. This makes the conditions for building a child RTE the same as those for building an AppendRelInfo, allowing minor simplification in expand_single_inheritance_child. Since the planner's actual processing is driven off the AppendRelInfo list, nothing much changes beyond that, we just have one fewer useless RTE entry. Amit Langote, reviewed and hacked a bit by me Discussion: https://postgr.es/m/9d7c5112-cb99-6a47-d3be-cf1ee6862a1d@lab.ntt.co.jp
* Allow ATTACH PARTITION with only ShareUpdateExclusiveLock.Robert Haas2019-03-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We still require AccessExclusiveLock on the partition itself, because otherwise an insert that violates the newly-imposed partition constraint could be in progress at the same time that we're changing that constraint; only the lock level on the parent relation is weakened. To make this safe, we have to cope with (at least) three separate problems. First, relevant DDL might commit while we're in the process of building a PartitionDesc. If so, find_inheritance_children() might see a new partition while the RELOID system cache still has the old partition bound cached, and even before invalidation messages have been queued. To fix that, if we see that the pg_class tuple seems to be missing or to have a null relpartbound, refetch the value directly from the table. We can't get the wrong value, because DETACH PARTITION still requires AccessExclusiveLock throughout; if we ever want to change that, this will need more thought. In testing, I found it quite difficult to hit even the null-relpartbound case; the race condition is extremely tight, but the theoretical risk is there. Second, successive calls to RelationGetPartitionDesc might not return the same answer. The query planner will get confused if lookup up the PartitionDesc for a particular relation does not return a consistent answer for the entire duration of query planning. Likewise, query execution will get confused if the same relation seems to have a different PartitionDesc at different times. Invent a new PartitionDirectory concept and use it to ensure consistency. This ensures that a single invocation of either the planner or the executor sees the same view of the PartitionDesc from beginning to end, but it does not guarantee that the planner and the executor see the same view. Since this allows pointers to old PartitionDesc entries to survive even after a relcache rebuild, also postpone removing the old PartitionDesc entry until we're certain no one is using it. For the most part, it seems to be OK for the planner and executor to have different views of the PartitionDesc, because the executor will just ignore any concurrently added partitions which were unknown at plan time; those partitions won't be part of the inheritance expansion, but invalidation messages will trigger replanning at some point. Normally, this happens by the time the very next command is executed, but if the next command acquires no locks and executes a prepared query, it can manage not to notice until a new transaction is started. We might want to tighten that up, but it's material for a separate patch. There would still be a small window where a query that started just after an ATTACH PARTITION command committed might fail to notice its results -- but only if the command starts before the commit has been acknowledged to the user. All in all, the warts here around serializability seem small enough to be worth accepting for the considerable advantage of being able to add partitions without a full table lock. Although in general the consequences of new partitions showing up between planning and execution are limited to the query not noticing the new partitions, run-time partition pruning will get confused in that case, so that's the third problem that this patch fixes. Run-time partition pruning assumes that indexes into the PartitionDesc are stable between planning and execution. So, add code so that if new partitions are added between plan time and execution time, the indexes stored in the subplan_map[] and subpart_map[] arrays within the plan's PartitionedRelPruneInfo get adjusted accordingly. There does not seem to be a simple way to generalize this scheme to cope with partitions that are removed, mostly because they could then get added back again with different bounds, but it works OK for added partitions. This code does not try to ensure that every backend participating in a parallel query sees the same view of the PartitionDesc. That currently doesn't matter, because we never pass PartitionDesc indexes between backends. Each backend will ignore the concurrently added partitions which it notices, and it doesn't matter if different backends are ignoring different sets of concurrently added partitions. If in the future that matters, for example because we allow writes in parallel query and want all participants to do tuple routing to the same set of partitions, the PartitionDirectory concept could be improved to share PartitionDescs across backends. There is a draft patch to serialize and restore PartitionDescs on the thread where this patch was discussed, which may be a useful place to start. Patch by me. Thanks to Alvaro Herrera, David Rowley, Simon Riggs, Amit Langote, and Michael Paquier for discussion, and to Alvaro Herrera for some review. Discussion: http://postgr.es/m/CA+Tgmobt2upbSocvvDej3yzokd7AkiT+PvgFH+a9-5VV1oJNSQ@mail.gmail.com Discussion: http://postgr.es/m/CA+TgmoZE0r9-cyA-aY6f8WFEROaDLLL7Vf81kZ8MtFCkxpeQSw@mail.gmail.com Discussion: http://postgr.es/m/CA+TgmoY13KQZF-=HNTrt9UYWYx3_oYOQpu9ioNT49jGgiDpUEA@mail.gmail.com
* Change lock acquisition order in expand_inherited_rtentry.Robert Haas2019-02-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, this function acquired locks in the order using find_all_inheritors(), which locks the children of each table that it processes in ascending OID order, and which processes the inheritance hierarchy as a whole in a breadth-first fashion. Now, it processes the inheritance hierarchy in a depth-first fashion, and at each level it proceeds in the order in which tables appear in the PartitionDesc. If table inheritance rather than table partitioning is used, the old order is preserved. This change moves the locking of any given partition much closer to the code that actually expands that partition. This seems essential if we ever want to allow concurrent DDL to add or remove partitions, because if the set of partitions can change, we must use the same data to decide which partitions to lock as we do to decide which partitions to expand; otherwise, we might expand a partition that we haven't locked. It should hopefully also facilitate efforts to postpone inheritance expansion or locking for performance reasons, because there's really no way to postpone locking some partitions if we're blindly locking them all using find_all_inheritors(). The only downside of this change which is known to me is that it further deviates from the principle that we should always lock the inheritance hierarchy in find_all_inheritors() order to avoid deadlock risk. However, we've already crossed that bridge in commit 9eefba181f7782d27d85d7e94e6028371e7ab2d7 and there are futher patches pending that make similar changes, so this isn't really giving up anything that we haven't surrendered already -- and it seems entirely worth it, given the performance benefits some of those changes seem likely to bring. Patch by me; thanks to David Rowley for discussion of these issues. Discussion: http://postgr.es/m/CAKJS1f_eEYVEq5tM8sm1k-HOwG0AyCPwX54XG9x4w0zy_N4Q_Q@mail.gmail.com Discussion: http://postgr.es/m/CA+TgmoZUwPf_uanjF==gTGBMJrn8uCq52XYvAEorNkLrUdoawg@mail.gmail.com
* Move code for managing PartitionDescs into a new file, partdesc.cRobert Haas2019-02-21
| | | | | | | | | | This is similar in spirit to the existing partbounds.c file in the same directory, except that there's a lot less code in the new file created by this commit. Pending work in this area proposes to add a bunch more code related to PartitionDescs, though, and this will give us a good place to put it. Discussion: http://postgr.es/m/CA+TgmoZUwPf_uanjF==gTGBMJrn8uCq52XYvAEorNkLrUdoawg@mail.gmail.com
* Replace uses of heap_open et al with the corresponding table_* function.Andres Freund2019-01-21
| | | | | Author: Andres Freund Discussion: https://postgr.es/m/20190111000539.xbv7s6w7ilcvm7dp@alap3.anarazel.de
* Replace heapam.h includes with {table, relation}.h where applicable.Andres Freund2019-01-21
| | | | | | | | | A lot of files only included heapam.h for relation_open, heap_open etc - replace the heapam.h include in those files with the narrower header. Author: Andres Freund Discussion: https://postgr.es/m/20190111000539.xbv7s6w7ilcvm7dp@alap3.anarazel.de
* Reorganize planner code moved in b60c39759908Alvaro Herrera2019-01-16
| | | | | | | | It seems modules are better defined like this instead of the original split. Per complaints from David Rowley as well as Amit Langote's self review. Discussion: https://postgr.es/m/CAKJS1f988rsyhwvLgfT-y1UCYUfXDOv67ENQk=v24OxhsZOzZw@mail.gmail.com
* Move inheritance expansion code into its own fileAlvaro Herrera2019-01-10
This commit moves expand_inherited_tables and underlings from optimizer/prep/prepunionc.c to optimizer/utils/inherit.c. Also, all of the AppendRelInfo-based expression manipulation routines are moved to optimizer/utils/appendinfo.c. No functional code changes. One exception is the introduction of make_append_rel_info, but that's still just moving around code. Also, stop including <limits.h> in prepunion.c, which no longer needs it since 3fc6e2d7f5b6. I (Álvaro) noticed this because Amit was copying that to inherit.c, which likewise doesn't need it. Author: Amit Langote Discussion: https://postgr.es/m/3be67028-a00a-502c-199a-da00eec8fb6e@lab.ntt.co.jp