path: root/src/include/executor
...
* Clarify the distinction between public and private SPITupleTable fields. (Tom Lane, 2019-07-17)

The fields that we consider public are "tupdesc" and "vals", which historically are in the middle of the struct. Move them to the front (this should be perfectly safe to do in HEAD) and add comments to make it quite clear which fields are public or not.

Also adjust spi.sgml's documentation of the struct to match. That doc had bit-rotted somewhat, as it was missing some fields. (Arguably we should just remove all the private fields from the docs, but for now I refrained.)

Daniel Gustafsson, reviewed by Fabien Coelho
Discussion: https://postgr.es/m/0D19F836-B743-4340-B6A2-F148CA3DD1F0@yesql.se
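A minimal sketch of backend code that touches only the public fields; the SPI calls are the standard API, while the query and variable names are illustrative:

    int ret = SPI_execute("SELECT relname FROM pg_class", true, 0);

    if (ret == SPI_OK_SELECT && SPI_tuptable != NULL)
    {
        TupleDesc   tupdesc = SPI_tuptable->tupdesc;    /* public field */

        for (uint64 i = 0; i < SPI_processed; i++)
        {
            HeapTuple   tup = SPI_tuptable->vals[i];    /* public field */
            char       *relname = SPI_getvalue(tup, tupdesc, 1);

            elog(INFO, "relname = %s", relname ? relname : "(null)");
        }
    }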
* Fix inconsistencies in the code (Michael Paquier, 2019-07-08)

This addresses a couple of issues in the code:
- Typos and inconsistencies in comments and function declarations.
- Removal of unreferenced function declarations.
- Removal of unnecessary compile flags.
- A cleanup error in regressplans.sh.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/0c991fdf-2670-1997-c027-772a420c4604@gmail.com
* Fix many typos and inconsistencies (Michael Paquier, 2019-07-01)

Author: Alexander Lakhin
Discussion: https://postgr.es/m/af27d1b3-a128-9d62-46e0-88f424397f44@gmail.com
* Fix more typos and inconsistencies in the tree (Michael Paquier, 2019-06-17)

Author: Alexander Lakhin
Discussion: https://postgr.es/m/0a5419ea-1452-a4e6-72ff-545b1a5a8076@gmail.com
* Fix double-word typos (Alvaro Herrera, 2019-06-13)

Discussion: https://postgr.es/m/20190612184527.GA24266@alvherre.pgsql
Reviewed-by: Michaël Paquier
* Fix assorted inconsistencies. (Amit Kapila, 2019-06-08)

A number of recent commits contained typos, mismatches between code and comments, and leftover function declarations. Fix them.

Reported-by: Alexander Lakhin
Author: Alexander Lakhin, Amit Kapila and Amit Langote
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/ef0c0232-0c1d-3a35-63d4-0ebd06e31387@gmail.com
* Fix typos in various places (Michael Paquier, 2019-06-03)

Author: Andrea Gelmini
Reviewed-by: Michael Paquier, Justin Pryzby
Discussion: https://postgr.es/m/20190528181718.GA39034@glet
* Phase 2 pgindent run for v12. (Tom Lane, 2019-05-22)

Switch to 2.1 version of pg_bsd_indent. This formats multiline function declarations "correctly", that is with additional lines of parameter declarations indented to match where the first line's left parenthesis is.

Discussion: https://postgr.es/m/CAEepm=0P3FeTXRcU5B2W3jv3PgRVZ-kGUXLGfd42FFhUROO3ug@mail.gmail.com
* Initial pgindent run for v12. (Tom Lane, 2019-05-22)

This is still using the 2.0 version of pg_bsd_indent. I thought it would be good to commit this separately, so as to document the differences between 2.0 and 2.1 behavior.

Discussion: https://postgr.es/m/16296.1558103386@sss.pgh.pa.us
* Restructure creation of run-time pruning steps. (Tom Lane, 2019-05-17)

Previously, gen_partprune_steps() always built executor pruning steps using all suitable clauses, including those containing PARAM_EXEC Params. This meant that the pruning steps were only completely safe for executor run-time (scan start) pruning. To prune at executor startup, we had to ignore the steps involving exec Params. But this doesn't really work in general, since there may be logic changes needed as well --- for example, pruning according to the last operator's btree strategy is the wrong thing if we're not applying that operator. The rules embodied in gen_partprune_steps() and its minions are sufficiently complicated that tracking their incremental effects in other logic seems quite impractical.

Short of a complete redesign, the only safe fix seems to be to run gen_partprune_steps() twice, once to create executor startup pruning steps and then again for run-time pruning steps. We can save a few cycles however by noting during the first scan whether we rejected any clauses because they involved exec Params --- if not, we don't need to do the second scan.

In support of this, refactor the internal APIs in partprune.c to make more use of passing information in the GeneratePruningStepsContext struct, rather than as separate arguments.

This is, I hope, the last piece of our response to a bug report from Alan Jackson. Back-patch to v11 where this code came in.

Discussion: https://postgr.es/m/FAD28A83-AC73-489E-A058-2681FA31D648@tvsquared.com
* Fix two memory leaks around force-storing tuples in slots. (Andres Freund, 2019-04-19)

As reported by Tom, when ExecStoreMinimalTuple() had to perform a conversion to store the minimal tuple in the slot, it forgot to respect the shouldFree flag, and leaked the tuple into the current memory context if true. Fix that by freeing the tuple in that case.

Looking at the relevant code made me (Andres) realize that not having the shouldFree parameter to ExecForceStoreHeapTuple() was a bad idea. Some callers had to locally implement the necessary logic, and in one case it was missing, creating a potential per-group leak in non-hashed aggregation. The choice to not free the tuple in ExecComputeStoredGenerated() is not pretty, but not introduced by this commit - I'll start a separate discussion about it.

Reported-By: Tom Lane
Discussion: https://postgr.es/m/366.1555382816@sss.pgh.pa.us
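A sketch of the adjusted usage, assuming the post-commit signature ExecForceStoreHeapTuple(tuple, slot, shouldFree); the surrounding tuple construction is illustrative:

    /* tuple was palloc'd by heap_form_tuple(), so the slot should free it */
    HeapTuple   tuple = heap_form_tuple(tupdesc, values, isnull);

    ExecForceStoreHeapTuple(tuple, slot, true);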
* Generated columns (Peter Eisentraut, 2019-03-30)

This is an SQL-standard feature that allows creating columns that are computed from expressions rather than assigned, similar to a view or materialized view but on a column basis.

This implements one kind of generated column: stored (computed on write). Another kind, virtual (computed on read), is planned for the future, and some room is left for it.

Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/b151f851-4019-bdb1-699e-ebab07d2f40a@2ndquadrant.com
* Transaction chaining (Peter Eisentraut, 2019-03-24)

Add command variants COMMIT AND CHAIN and ROLLBACK AND CHAIN, which start new transactions with the same transaction characteristics as the just finished one, per SQL standard.

Support for transaction chaining in PL/pgSQL is also added. This functionality is especially useful when running COMMIT in a loop in PL/pgSQL.

Reviewed-by: Fabien COELHO <coelho@cri.ensmp.fr>
Discussion: https://www.postgresql.org/message-id/flat/28536681-324b-10dc-ade8-ab46f7645a5a@2ndquadrant.com
* tableam: Add tuple_{insert, delete, update, lock} and use. (Andres Freund, 2019-03-23)

This adds new, required, table AM callbacks for insert/delete/update and lock_tuple. To be able to reasonably use those, the EvalPlanQual mechanism had to be adapted, moving more logic into the AM.

Previously both delete/update/lock call-sites and the EPQ mechanism had to have awareness of the specific tuple format to be able to fetch the latest version of a tuple. Obviously that needs to be abstracted away. To do so, move the logic that finds the latest row version into the AM. lock_tuple has a new flag argument, TUPLE_LOCK_FLAG_FIND_LAST_VERSION, that forces it to lock the last version, rather than the current one. It'd have been possible to do so via a separate callback as well, but finding the last version usually also necessitates locking the newest version, making it sensible to combine the two. This replaces the previous use of EvalPlanQualFetch(). Additionally HeapTupleUpdated, which previously signaled either a concurrent update or delete, is now split into two, to avoid callers needing AM-specific knowledge to differentiate.

The move of finding the latest row version into tuple_lock means that encountering a row concurrently moved into another partition will now raise an error about "tuple to be locked" rather than "tuple to be updated/deleted" - which is accurate, as that always happens when locking rows. While possibly slightly less helpful for users, it seems like an acceptable trade-off.

As part of this commit HTSU_Result has been renamed to TM_Result, and its members expanded to differentiate between updating and deleting. HeapUpdateFailureData has been renamed to TM_FailureData.

The interface to speculative insertion is changed so nodeModifyTable.c does not have to set the speculative token itself anymore. Instead there's a version of tuple_insert, tuple_insert_speculative, that performs the speculative insertion (without requiring a flag to signal that fact), and the speculative insertion is either made permanent with table_complete_speculative(succeeded = true) or aborted with succeeded = false.

Note that multi_insert is not yet routed through tableam, nor is COPY. Changing multi_insert requires changes to copy.c that are large enough to better be done separately. Similarly, although simpler, CREATE TABLE AS and CREATE MATERIALIZED VIEW are also only going to be adjusted in a later commit.

Author: Andres Freund and Haribabu Kommi
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
https://postgr.es/m/20190313003903.nwvrxi7rw3ywhdel@alap3.anarazel.de
https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
* Collations with nondeterministic comparison (Peter Eisentraut, 2019-03-22)

This adds a flag "deterministic" to collations. If that is false, such a collation disables various optimizations that assume that strings are equal only if they are byte-wise equal. That then allows use cases such as case-insensitive or accent-insensitive comparisons or handling of strings with different Unicode normal forms.

This functionality is only supported with the ICU provider. At least glibc doesn't appear to have any locales that work in a nondeterministic way, so it's not worth supporting this for the libc provider.

The term "deterministic comparison" in this context is from Unicode Technical Standard #10 (https://unicode.org/reports/tr10/#Deterministic_Comparison).

This patch makes changes in three areas:
- CREATE COLLATION DDL changes and system catalog changes to support this new flag.
- Many executor nodes and auxiliary code are extended to track collations. Previously, this code would just throw away collation information, because the eventually-called user-defined functions didn't use it since they only cared about equality, which didn't need collation information.
- String data type functions that do equality comparisons and hashing are changed to take the (non-)deterministic flag into account. For comparison, this just means skipping various shortcuts and tie breakers that use byte-wise comparison. For hashing, we first need to convert the input string to a canonical "sort key" using the ICU analogue of strxfrm().

Reviewed-by: Daniel Verite <daniel@manitou-mail.org>
Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Discussion: https://www.postgresql.org/message-id/flat/1ccc668f-4cbc-0bef-af67-450b47cdfee7@2ndquadrant.com
* Allow ATTACH PARTITION with only ShareUpdateExclusiveLock. (Robert Haas, 2019-03-07)

We still require AccessExclusiveLock on the partition itself, because otherwise an insert that violates the newly-imposed partition constraint could be in progress at the same time that we're changing that constraint; only the lock level on the parent relation is weakened.

To make this safe, we have to cope with (at least) three separate problems. First, relevant DDL might commit while we're in the process of building a PartitionDesc. If so, find_inheritance_children() might see a new partition while the RELOID system cache still has the old partition bound cached, and even before invalidation messages have been queued. To fix that, if we see that the pg_class tuple seems to be missing or to have a null relpartbound, refetch the value directly from the table. We can't get the wrong value, because DETACH PARTITION still requires AccessExclusiveLock throughout; if we ever want to change that, this will need more thought. In testing, I found it quite difficult to hit even the null-relpartbound case; the race condition is extremely tight, but the theoretical risk is there.

Second, successive calls to RelationGetPartitionDesc might not return the same answer. The query planner will get confused if looking up the PartitionDesc for a particular relation does not return a consistent answer for the entire duration of query planning. Likewise, query execution will get confused if the same relation seems to have a different PartitionDesc at different times. Invent a new PartitionDirectory concept and use it to ensure consistency. This ensures that a single invocation of either the planner or the executor sees the same view of the PartitionDesc from beginning to end, but it does not guarantee that the planner and the executor see the same view. Since this allows pointers to old PartitionDesc entries to survive even after a relcache rebuild, also postpone removing the old PartitionDesc entry until we're certain no one is using it.

For the most part, it seems to be OK for the planner and executor to have different views of the PartitionDesc, because the executor will just ignore any concurrently added partitions which were unknown at plan time; those partitions won't be part of the inheritance expansion, but invalidation messages will trigger replanning at some point. Normally, this happens by the time the very next command is executed, but if the next command acquires no locks and executes a prepared query, it can manage not to notice until a new transaction is started. We might want to tighten that up, but it's material for a separate patch. There would still be a small window where a query that started just after an ATTACH PARTITION command committed might fail to notice its results -- but only if the command starts before the commit has been acknowledged to the user. All in all, the warts here around serializability seem small enough to be worth accepting for the considerable advantage of being able to add partitions without a full table lock.

Although in general the consequences of new partitions showing up between planning and execution are limited to the query not noticing the new partitions, run-time partition pruning will get confused in that case, so that's the third problem that this patch fixes. Run-time partition pruning assumes that indexes into the PartitionDesc are stable between planning and execution. So, add code so that if new partitions are added between plan time and execution time, the indexes stored in the subplan_map[] and subpart_map[] arrays within the plan's PartitionedRelPruneInfo get adjusted accordingly. There does not seem to be a simple way to generalize this scheme to cope with partitions that are removed, mostly because they could then get added back again with different bounds, but it works OK for added partitions.

This code does not try to ensure that every backend participating in a parallel query sees the same view of the PartitionDesc. That currently doesn't matter, because we never pass PartitionDesc indexes between backends. Each backend will ignore the concurrently added partitions which it notices, and it doesn't matter if different backends are ignoring different sets of concurrently added partitions. If in the future that matters, for example because we allow writes in parallel query and want all participants to do tuple routing to the same set of partitions, the PartitionDirectory concept could be improved to share PartitionDescs across backends. There is a draft patch to serialize and restore PartitionDescs on the thread where this patch was discussed, which may be a useful place to start.

Patch by me. Thanks to Alvaro Herrera, David Rowley, Simon Riggs, Amit Langote, and Michael Paquier for discussion, and to Alvaro Herrera for some review.

Discussion: http://postgr.es/m/CA+Tgmobt2upbSocvvDej3yzokd7AkiT+PvgFH+a9-5VV1oJNSQ@mail.gmail.com
Discussion: http://postgr.es/m/CA+TgmoZE0r9-cyA-aY6f8WFEROaDLLL7Vf81kZ8MtFCkxpeQSw@mail.gmail.com
Discussion: http://postgr.es/m/CA+TgmoY13KQZF-=HNTrt9UYWYx3_oYOQpu9ioNT49jGgiDpUEA@mail.gmail.com
* Store tuples for EvalPlanQual in slots, rather than as HeapTuples. (Andres Freund, 2019-03-01)

For the upcoming pluggable table access methods it's quite inconvenient to store tuples as HeapTuples, as that'd require converting tuples from their native format into HeapTuples. Instead use slots to manage epq tuples.

To fit into that scheme, change the foreign data wrapper callback RefetchForeignRow, to store the tuple in a slot. Insist on using the caller provided slot, so it conveniently can be stored in the corresponding EPQ slot. As there is no in core user of RefetchForeignRow, that change was done blindly, but we plan to test that soon.

To avoid duplicating that work for row locks, move row locks to just directly use the EPQ slots - it previously temporarily stored tuples in LockRowsState.lr_curtuples, but that doesn't seem beneficial, given we'd possibly end up with a significant number of additional slots.

The behaviour of es_epqTupleSet[rti - 1] is now checked by es_epqTupleSlot[rti - 1] != NULL, as that is distinguishable from a slot containing an empty tuple.

Author: Andres Freund, Haribabu Kommi, Ashutosh Bapat
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
* Use slots in trigger infrastructure, except for the actual invocation. (Andres Freund, 2019-02-26)

In preparation for abstracting table storage, convert trigger.c to track tuples in slots. Which also happens to make code calling triggers simpler.

As the calling interface for triggers themselves is not changed in this patch, HeapTuples still are extracted from the slot at that time. But that's handled solely inside trigger.c, not visible to callers. It's quite likely that we'll want to revise the external trigger interface, but that's a separate large project.

As part of this work the slots used for old/new/return tuples are moved from EState into ResultRelInfo, as different updated tables might need different slots. The slots are also now created on demand, which is good from an efficiency POV and also makes the modifying code simpler.

Author: Andres Freund, Amit Khandekar and Ashutosh Bapat
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
* Store table oid and tuple's tid in tuple slots directly. (Andres Freund, 2019-02-26)

After the introduction of tuple table slots all table AMs need to support returning the table oid of the tuple stored in a slot created by said AM. It does not make sense to re-implement that in every AM, therefore move handling of table OIDs into the TupleTableSlot structure itself. It's possible that we, at a later date, might want to get rid of HeapTupleData.t_tableOid entirely, but doing so before the abstractions for table AMs are integrated turns out to be too hard, so delay that for now.

Similarly, every AM needs to support the concept of a tuple identifier (tid / item pointer) for its tuples. It's quite possible that we'll generalize the exact form of a tid at a future point (to allow for things like index organized tables), but for now many parts of the code know about tids, so there's not much point in abstracting tids away. Therefore also move the tid into the slot (rather than providing an API to set/get the tid associated with the tuple in a slot).

Once the table AM interface includes inserting/updating/deleting tuples, the responsibility to set the correct tid after such an action will move into that. After that change, code doing such modifications should not have to deal with HeapTuples directly anymore.

Author: Andres Freund, Haribabu Kommi and Ashutosh Bapat
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
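A sketch of reading the new per-slot fields (tts_tid and tts_tableOid as described above; the logging context is illustrative):

    /* after a table AM has stored a tuple in the slot */
    if (ItemPointerIsValid(&slot->tts_tid))
        elog(DEBUG1, "tuple (%u,%u) of relation %u",
             ItemPointerGetBlockNumber(&slot->tts_tid),
             ItemPointerGetOffsetNumber(&slot->tts_tid),
             slot->tts_tableOid);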
* Allow to use HeapTupleData embedded in [Buffer]HeapTupleTableSlot. (Andres Freund, 2019-02-26)

That avoids having to care about the lifetime of the HeapTupleHeaderData passed to ExecStore[Buffer]HeapTuple(). That doesn't make a huge difference for a plain HeapTupleTableSlot, but for BufferHeapTupleTableSlot it can be a significant advantage, avoiding the need to materialize slots where it's inconvenient to provide a HeapTupleData with appropriate lifetime to point to the on-disk tuple.

It's quite possible that we'll want to add support functions for constructing HeapTuples using that embedded HeapTupleData, but for now callers do so themselves.

Author: Andres Freund
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
* Add ExecStorePinnedBufferHeapTuple. (Andres Freund, 2019-02-26)

This allows avoiding an unnecessary pin/unpin cycle when storing a tuple in an already pinned buffer into a slot, when the pin isn't further needed at the call site.

Only a single caller for now (to ensure coverage), but upcoming patches will increase use of the new function.

Author: Andres Freund
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
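A sketch of the intended call pattern, assuming the new function takes the same arguments as ExecStoreBufferHeapTuple():

    /*
     * buffer is already pinned (e.g. by a preceding fetch); hand the pin
     * over to the slot instead of pinning again here and unpinning later.
     */
    ExecStorePinnedBufferHeapTuple(tuple, slot, buffer);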
* Allow to reset execGrouping.c style tuple hashtables. (Andres Freund, 2019-02-09)

This has the advantage that the comparator expression, the table's slot, etc do not have to be rebuilt. Additionally the simplehash.h hashtable within the tuple hashtable now keeps its previous size and doesn't need to be reallocated. That both reduces allocator overhead and improves performance in cases where the input estimation was off by a significant factor.

To avoid an API/ABI break, the new parameter is exposed via the new BuildTupleHashTableExt(), and BuildTupleHashTable() now is a wrapper around the former, that continues to allocate the table itself in the tablecxt.

Using this fixes performance issues discovered in the two bugs referenced. This commit, however, has not converted the callers; that's done in a separate commit.

Bug: #15592 #15486
Reported-By: Jakub Janeček, Dmitry Marakasov
Author: Andres Freund
Discussion: https://postgr.es/m/15486-05850f065da42931@postgresql.org
https://postgr.es/m/20190114180423.ywhdg2iagzvh43we@alap3.anarazel.de
Backpatch: 11, this is a prerequisite for other fixes
* Renaming for new subscripting mechanism (Alvaro Herrera, 2019-02-01)

Over at patch https://commitfest.postgresql.org/21/1062/ Dmitry wants to introduce a more generic subscripting mechanism, which allows subscripting not only arrays but also other object types such as JSONB. That functionality is introduced in a largish invasive patch, out of which this internal renaming patch was extracted.

Author: Dmitry Dolgov
Reviewed-by: Tom Lane, Arthur Zakirov
Discussion: https://postgr.es/m/CA+q6zcUK4EqPAu7XRRO5CCjMwhz5zvg+rfWuLzVoxp_5sKS6=w@mail.gmail.com
* Rename nodes/relation.h to nodes/pathnodes.h. (Tom Lane, 2019-01-29)

The old name of this file was never a very good indication of what it was for. Now that there's also access/relation.h, we have a potential confusion hazard as well, so let's rename it to something more apropos. Per discussion, "pathnodes.h" is reasonable, since a good fraction of the file is Path node definitions.

While at it, tweak a couple of other headers that were gratuitously importing relation.h into modules that don't need it.

Discussion: https://postgr.es/m/7719.1548688728@sss.pgh.pa.us
* Change function call information to be variable length. (Andres Freund, 2019-01-26)

Before this change FunctionCallInfoData, the struct that arguments etc. for V1 function calls are stored in, always had space for FUNC_MAX_ARGS (100) arguments, storing datums and their nullness in two arrays. For nearly every function call 100 arguments is far more than needed, therefore wasting memory. Arg and argnull being two separate arrays also guarantees that to access a single argument, two cachelines have to be touched.

Change the layout so there's a single variable-length array with pairs of value / isnull. That drastically reduces memory consumption for most function calls (on x86-64 a two argument function now uses 64 bytes, previously 936 bytes), and makes it very likely that argument value and its nullness are on the same cacheline.

Arguments are stored in a new NullableDatum struct, which, due to padding, needs more memory per argument than before. But as usually far fewer arguments are stored, and individual arguments are cheaper to access, that's still a clear win. It's likely that there's other places where conversion to NullableDatum arrays would make sense, e.g. TupleTableSlots, but that's for another commit.

Because the function call information is now variable-length allocations have to take the number of arguments into account. For heap allocations that can be done with SizeForFunctionCallInfoData(), for on-stack allocations there's a new LOCAL_FCINFO(name, nargs) macro that helps to allocate an appropriately sized and aligned variable.

Some places with stack allocation function call information don't know the number of arguments at compile time, and currently variably sized stack allocations aren't allowed in postgres. Therefore allow for FUNC_MAX_ARGS space in these cases. They're not that common, so for now that seems acceptable.

Because of the need to allocate FunctionCallInfo of the appropriate size, older extensions may need to update their code. To avoid subtle breakages, the FunctionCallInfoData struct has been renamed to FunctionCallInfoBaseData. Most code only references FunctionCallInfo, so that shouldn't cause much collateral damage.

This change is also a prerequisite for more efficient expression JIT compilation (by allocating the function call information on the stack, allowing LLVM to optimize it away); previously the size of the call information caused problems inside LLVM's optimizer.

Author: Andres Freund
Reviewed-By: Tom Lane
Discussion: https://postgr.es/m/20180605172952.x34m5uz6ju6enaem@alap3.anarazel.de
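A sketch of calling a two-argument function under the new layout; LOCAL_FCINFO, the NullableDatum args array and SizeForFunctionCallInfoData() are from the commit, while flinfo, collation and the argument values are illustrative:

    LOCAL_FCINFO(fcinfo, 2);        /* stack space sized for two arguments */
    Datum       result;

    InitFunctionCallInfoData(*fcinfo, flinfo, 2, collation, NULL, NULL);

    fcinfo->args[0].value = Int32GetDatum(4);
    fcinfo->args[0].isnull = false;
    fcinfo->args[1].value = Int32GetDatum(2);
    fcinfo->args[1].isnull = false;

    result = FunctionCallInvoke(fcinfo);    /* returns the function's Datum */

    /* heap allocation instead of the stack: */
    FunctionCallInfo heap_fcinfo = palloc0(SizeForFunctionCallInfoData(2));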
* Don't include genam.h from execnodes.h and relscan.h anymore. (Andres Freund, 2019-01-14)

This is the genam.h equivalent of 4c850ecec649c (which removed heapam.h from a lot of other headers). There's still a few header includes of genam.h, but not from central headers anymore.

As a few headers are not indirectly included anymore, execnodes.h and relscan.h need a few additional includes. Some of the depended-on types were replaceable by using the underlying structs, but e.g. for Snapshot in execnodes.h that'd have gotten more invasive than reasonable in this commit.

Like the aforementioned commit 4c850ecec649c, this requires adding new genam.h includes to a number of backend files, which likely is also required in a few external projects.

Author: Andres Freund
Discussion: https://postgr.es/m/20190114000701.y4ttcb74jpskkcfb@alap3.anarazel.de
* Don't include heapam.h from other headers. (Andres Freund, 2019-01-14)

heapam.h previously was included in a number of widely used headers (e.g. execnodes.h, indirectly in executor.h, ...). That's problematic on its own, as heapam.h contains a lot of low-level details that don't need to be exposed that widely, but becomes more problematic with the upcoming introduction of pluggable table storage - it seems inappropriate for heapam.h to be included that widely afterwards.

heapam.h was largely only included in other headers to get the HeapScanDesc typedef (which was defined in heapam.h, even though HeapScanDescData is defined in relscan.h). The better solution here seems to be to just use the underlying struct (forward declared where necessary). Similar for BulkInsertState.

Another problem was that LockTupleMode was used in executor.h - parts of the file tried to cope without heapam.h, but due to the fact that it indirectly included it, several subsequent violations of that goal were not noticed. We could just reuse the approach of declaring parameters as int, but it seems nicer to move LockTupleMode to lockoptions.h - that's not a perfect location, but also doesn't seem bad.

As a number of files relied on implicitly included heapam.h, a significant number of files grew an explicit include. It's quite probable that a few external projects will need to do the same.

Author: Andres Freund
Reviewed-By: Alvaro Herrera
Discussion: https://postgr.es/m/20190114000701.y4ttcb74jpskkcfb@alap3.anarazel.de
* Update copyright for 2019 (Bruce Momjian, 2019-01-02)

Backpatch-through: certain files through 9.4
* Add some const decorations (Peter Eisentraut, 2018-12-22)

These mainly help understanding the function signatures better.
* Remove WITH OIDS support, change oid catalog column visibility. (Andres Freund, 2018-11-20)

Previously tables declared WITH OIDS, including a significant fraction of the catalog tables, stored the oid column not as a normal column, but as part of the tuple header. This special column was not shown by default, which was somewhat odd, as it's often (consider e.g. pg_class.oid) one of the more important parts of a row. Neither pg_dump nor COPY included the contents of the oid column by default.

The fact that the oid column was not an ordinary column necessitated a significant amount of special case code to support oid columns. That already was painful for the existing, but upcoming work aiming to make table storage pluggable, would have required expanding and duplicating that "specialness" significantly.

WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0). Remove it. Removing includes:
- CREATE TABLE and ALTER TABLE syntax for declaring the table to be WITH OIDS has been removed (WITH (oids[ = true]) will error out)
- pg_dump does not support dumping tables declared WITH OIDS and will issue a warning when dumping one (and ignore the oid column).
- restoring a pg_dump archive with pg_restore will warn when restoring a table with oid contents (and ignore the oid column)
- COPY will refuse to load a binary dump that includes oids.
- pg_upgrade will error out when encountering tables declared WITH OIDS, they have to be altered to remove the oid column first.
- Functionality to access the oid of the last inserted row (like plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed.

The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false) for CREATE TABLE) is still supported. While that requires a bit of support code, it seems unnecessary to break applications / dumps that do not use oids, and are explicit about not using them.

The biggest user of WITH OID columns was postgres' catalog. This commit changes all 'magic' oid columns to be columns that are normally declared and stored. To reduce unnecessary query breakage all the newly added columns are still named 'oid', even if a table's column naming scheme would indicate 'reloid' or such.

This obviously requires adapting a lot of code, mostly replacing oid access via HeapTupleGetOid() with access to the underlying Form_pg_*->oid column.

The bootstrap process now assigns oids for all oid columns in genbki.pl that do not have an explicit value (starting at the largest oid previously used); only oids assigned later will be above FirstBootstrapObjectId. As the oid column now is a normal column the special bootstrap syntax for oids has been removed.

Oids are not automatically assigned during insertion anymore, all backend code explicitly assigns oids with GetNewOidWithIndex(). For the rare case that insertions into the catalog via SQL are called for, the new pg_nextoid() function can be used (which only works on catalog tables).

The fact that oid columns on system tables are now normal columns means that they will be included in the set of columns expanded by * (i.e. SELECT * FROM pg_class will now include the table's oid, previously it did not). It'd not technically be hard to hide the oid column by default, but that'd mean confusing behavior would either have to be carried forward forever, or it'd cause breakage down the line.

While it's not unlikely that further adjustments are needed, the scope/invasiveness of the patch makes it worthwhile to merge this now. It's painful to maintain externally, too complicated to commit after the code freeze, and a dependency of a number of other patches.

Catversion bump, for obvious reasons.

Author: Andres Freund, with contributions by John Naylor
Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
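A sketch of the kind of caller adaptation this required, using a pg_class syscache lookup as the illustrative context:

    HeapTuple   tup = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));

    if (!HeapTupleIsValid(tup))
        elog(ERROR, "cache lookup failed for relation %u", relid);

    Form_pg_class relform = (Form_pg_class) GETSTRUCT(tup);

    /* before: Oid oid = HeapTupleGetOid(tup); */
    Oid         oid = relform->oid;     /* oid is now an ordinary column */

    ReleaseSysCache(tup);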
* Make TupleTableSlots extensible, finish split of existing slot type. (Andres Freund, 2018-11-16)

This commit completes the work prepared in 1a0586de36, splitting the old TupleTableSlot implementation (which could store buffer, heap, minimal and virtual slots) into four different slot types. As described in the aforementioned commit, this is done with the goal of making tuple table slots extensible, to allow for pluggable table access methods.

To achieve runtime extensibility for TupleTableSlots, operations on slots that can differ between types of slots are performed using the TupleTableSlotOps struct provided at slot creation time. That includes everything from the size of the TupleTableSlot struct to be allocated, to initialization, deforming etc. See the struct's definition for more detailed information about the TupleTableSlotOps callbacks.

I decided to rename TTSOpsBufferTuple to TTSOpsBufferHeapTuple and ExecCopySlotTuple to ExecCopySlotHeapTuple, as that seems more consistent with other naming introduced in recent patches.

There's plenty of optimization potential in the slot implementation, but according to benchmarking the state after this commit has similar performance characteristics to before this set of changes, which seems sufficient.

There's a few changes in execReplication.c that currently need to poke through the slot abstraction, that'll be repaired once the pluggable storage patchset provides the necessary infrastructure.

Author: Andres Freund and Ashutosh Bapat, with changes by Amit Khandekar
Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de
* Inline hot path of slot_getsomeattrs(). (Andres Freund, 2018-11-16)

This yields a minor speedup, which roughly balances the loss from the upcoming introduction of callbacks to do some operations on slots.

Author: Andres Freund
Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de
* Redesign initialization of partition routing structures (Alvaro Herrera, 2018-11-16)

This speeds up write operations (INSERT, UPDATE, DELETE, COPY, as well as the future MERGE) on partitioned tables.

This changes the setup for tuple routing so that it does far less work during the initial setup and pushes more work out to when partitions receive tuples. PartitionDispatchData structs for sub-partitioned tables are only created when a tuple gets routed through it. The possibly large arrays in the PartitionTupleRouting struct have largely been removed. The partitions[] array remains but now never contains any NULL gaps. Previously the NULLs had to be skipped during ExecCleanupTupleRouting(), which could add a large overhead to the cleanup when the number of partitions was large. The partitions[] array is allocated small to start with and only enlarged when we route tuples to enough partitions that it runs out of space. This allows us to keep simple single-row partition INSERTs running quickly.

The arrays in PartitionTupleRouting which stored the tuple translation maps have now been removed. These have been moved out into a PartitionRoutingInfo struct which is an additional field in ResultRelInfo.

The find_all_inheritors() call still remains by far the slowest part of ExecSetupPartitionTupleRouting(). This commit just removes the other slow parts.

In passing also rename the tuple translation maps from being ParentToChild and ChildToParent to being RootToPartition and PartitionToRoot. The old names mislead you into thinking that a partition of some sub-partitioned table would translate to the rowtype of the sub-partitioned table rather than the root partitioned table.

Authors: David Rowley and Amit Langote, heavily revised by Álvaro Herrera
Testing help from Jesper Pedersen and Kato Sho.
Discussion: https://postgr.es/m/CAKJS1f_1RJyFquuCKRFHTdcXqoPX-PYqAd7nz=GVBwvGh4a6xA@mail.gmail.com
* Add dummy field to currently empty struct TupleTableSlotOps. (Andres Freund, 2018-11-15)

Per MSVC complaint on buildfarm member dory.
* Compute information about EEOP_*_FETCHSOME at expression init time. (Andres Freund, 2018-11-15)

Previously this information was computed when JIT compiling an expression. But the information is useful for assertions in the non-JIT case too, therefore it makes sense to move it.

This will, in a followup commit, allow treating different slot types differently. E.g. for virtual slots there's no need to generate a JIT function to deform the slot.

Author: Andres Freund
Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de
* Introduce notion of different types of slots (without implementing them). (Andres Freund, 2018-11-15)

Upcoming work intends to allow pluggable ways to introduce new ways of storing table data. Accessing those table access methods from the executor requires TupleTableSlots to be able to carry tuples in the native format of such storage methods; otherwise there'll be a significant conversion overhead.

Different access methods will require different data to store tuples efficiently (just like virtual, minimal, heap already require fields in TupleTableSlot). To allow that without requiring additional pointer indirections, we want to have different structs (embedding TupleTableSlot) for different types of slots. Thus different types of slots are needed, which requires adapting creators of slots.

The slot that most efficiently can represent a type of tuple in an executor node will often depend on the type of slot a child node uses. Therefore we need to track the type of slot that is returned by nodes, so parent slots can create slots based on that.

Relatedly, JIT compilation of tuple deforming needs to know which type of slot a certain expression refers to, so it can create an appropriate deforming function for the type of tuple in the slot.

But not all nodes will only return one type of slot, e.g. an append node will potentially return different types of slots for each of its subplans. Therefore add a function that allows querying the type of a node's result slot, and whether it'll always be the same type (whether it's fixed). This can be queried using ExecGetResultSlotOps().

The scan, result, inner, outer type of slots are automatically inferred from ExecInitScanTupleSlot(), ExecInitResultSlot(), left/right subtrees respectively. If that's not correct for a node, that can be overwritten using new fields in PlanState.

This commit does not introduce the actually abstracted implementation of different kinds of TupleTableSlots, that will be left for a followup commit. The different types of slots introduced will, for now, still use the same backing implementation.

While this already partially invalidates the big comment in tuptable.h, it seems to make more sense to update it later, when the different TupleTableSlot implementations actually exist.

Author: Ashutosh Bapat and Andres Freund, with changes by Amit Khandekar
Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de
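A sketch of how a parent node might consult its child's slot type, assuming ExecGetResultSlotOps() reports the ops and whether the type is fixed:

    bool        isfixed;
    const TupleTableSlotOps *ops =
        ExecGetResultSlotOps(outerPlanState(node), &isfixed);

    /* only specialize on the child's slot type if it cannot change */
    if (!isfixed)
        ops = &TTSOpsVirtual;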
* Rejigger materializing and fetching a HeapTuple from a slot. (Andres Freund, 2018-11-15)

Previously materializing a slot always returned a HeapTuple. As current work aims to reduce the reliance on HeapTuples (so other storage systems can work efficiently), that needs to change. Thus split the tasks of materializing a slot (i.e. making it independent from the underlying storage / other memory contexts) from fetching a HeapTuple from the slot.

For brevity, allow fetching a HeapTuple from a slot and materializing the slot at the same time, controlled by a parameter. For now some callers of ExecFetchSlotHeapTuple, with materialize = true, expect that changes to the heap tuple will be reflected in the underlying slot. Those places will be adapted in due course, so while not pretty, that's OK for now.

Also rename ExecFetchSlotTuple to ExecFetchSlotHeapTuple and ExecFetchSlotTupleDatum to ExecFetchSlotHeapTupleDatum, as it's likely that future storage methods will need similar methods. There already is ExecFetchSlotMinimalTuple, so the new names make the naming scheme more coherent.

Author: Ashutosh Bapat and Andres Freund, with changes by Amit Khandekar
Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de
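A sketch of the renamed function, assuming the signature ExecFetchSlotHeapTuple(slot, materialize, &shouldFree); the inspection code is illustrative:

    bool        shouldFree;
    HeapTuple   tuple = ExecFetchSlotHeapTuple(slot, false, &shouldFree);

    /* inspect the tuple, e.g. its header */
    elog(DEBUG1, "natts = %d", HeapTupleHeaderGetNatts(tuple->t_data));

    if (shouldFree)
        heap_freetuple(tuple);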
* Don't require return slots for nodes without projection. (Andres Freund, 2018-11-09)

In a lot of nodes the return slot is not required. That can either be because the node doesn't do any projection (say an Append node), or because the node does perform projections but the projection is optimized away because the projection would yield an identical row.

Slots aren't that small, especially for wide rows, so it's worthwhile to avoid creating them. It's not possible to just skip creating the slot - it's currently used to determine the tuple descriptor returned by ExecGetResultType(). So separate the determination of the result type from the slot creation. The work previously done internally by ExecInitResultTupleSlotTL() can now also be done separately with ExecInitResultTypeTL() and ExecInitResultSlot(). That way nodes that aren't guaranteed to need a result slot can use ExecInitResultTypeTL() to determine the result type of the node, and when ExecAssignScanProjectionInfo() (via ExecConditionalAssignProjectionInfo()) determines that a result slot is needed, it is created with ExecInitResultSlot().

Besides the advantage of avoiding the creation of slots that then go unused, this is necessary preparation for later patches around tuple table slot abstraction. In particular separating the return descriptor and slot is a prerequisite to allow JITing of tuple deforming with knowledge of the underlying tuple format, and to avoid unnecessarily creating JITed tuple deforming for virtual slots.

This commit removes a redundant argument from ExecInitResultTupleSlotTL(). While this commit touches a lot of the relevant lines anyway, it'd normally still not be worthwhile to cause breakage, except that the aforementioned later commits will touch *all* ExecInitResultTupleSlotTL() callers anyway (but it fits worse thematically there).

Author: Andres Freund
Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de
* Move EEOP_*_SYSVAR evaluation out of line. (Andres Freund, 2018-11-07)

This mainly de-duplicates code. As evaluating a system variable isn't the hottest path and the current inline implementation ends up calling out to an external function anyway, this is OK from a performance POV.

The main motivation for de-duplicating is the upcoming slot abstraction work, after which there's not guaranteed to be a HeapTuple backing the slot.

Author: Andres Freund, Amit Khandekar
Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de
* Move TupleTableSlots boolean member into one flag variable. (Andres Freund, 2018-10-15)

There are several reasons for this change:
1) It reduces the total size of TupleTableSlot / reduces alignment padding, making the commonly accessed members fit into a single cacheline (but we currently do not force proper alignment, so that's not yet guaranteed to be helpful)
2) Combining the booleans into a flag allows combining reads/writes from memory.
3) With the upcoming slot abstraction changes, it allows having core and extended flags in a memory-efficient way.

Author: Ashutosh Bapat and Andres Freund
Discussion: https://postgr.es/m/20180220224318.gw4oe5jadhpmcdnm@alap3.anarazel.de
* Move generic slot support functions from heaptuple.c into execTuples.c. (Andres Freund, 2018-10-15)

heaptuple.c was never a particularly good fit for slot_getattr(), slot_getsomeattrs() and slot_getmissingattrs(), but in upcoming changes slots will be made more abstract (allowing slots that contain different types of tuples), making it clearly the wrong place.

Note that slot_deform_tuple() remains in its current place, as it clearly deals with a HeapTuple. getmissingattrs() also remains, but it's less clear that that's correct - but execTuples.c wouldn't be the right place.

Author: Ashutosh Bapat.
Discussion: https://postgr.es/m/20180220224318.gw4oe5jadhpmcdnm@alap3.anarazel.de
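The moved functions keep their familiar usage; a minimal sketch (the attribute numbers and the assumed int4 column are illustrative):

    bool        isnull;
    Datum       d;

    /* deform the first three attributes into tts_values/tts_isnull */
    slot_getsomeattrs(slot, 3);

    /* or fetch a single attribute directly */
    d = slot_getattr(slot, 2, &isnull);
    if (!isnull)
        elog(DEBUG1, "attr 2 = %d", DatumGetInt32(d));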
* Advance transaction timestamp for intra-procedure transactions. (Tom Lane, 2018-10-08)

Per discussion, this behavior seems less astonishing than not doing so.

Peter Eisentraut and Tom Lane
Discussion: https://postgr.es/m/20180920234040.GC29981@momjian.us
* Remove more redundant relation locking during executor startup. (Tom Lane, 2018-10-06)

We already have appropriate locks on every relation listed in the query's rangetable before we reach the executor. Take the next step in exploiting that knowledge by removing code that worries about taking locks on non-leaf result relations in a partitioned table. In particular, get rid of ExecLockNonLeafAppendTables and a stanza in InitPlan that asserts we already have locks on certain such tables.

In passing, clean up some now-obsolete comments in InitPlan.

Amit Langote, reviewed by David Rowley and Jesper Pedersen, and whacked around a bit more by me
Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp
* In the executor, use an array of pointers to access the rangetable. (Tom Lane, 2018-10-04)

Instead of doing a lot of list_nth() accesses to es_range_table, create a flattened pointer array during executor startup and index into that to get at individual RangeTblEntrys. This eliminates one source of O(N^2) behavior with lots of partitions. (I'm not exactly convinced that it's the most important source, but it's an easy one to fix.)

Amit Langote and David Rowley
Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp
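A sketch of a lookup through the flattened array, assuming the exec_rt_fetch() helper that indexes it:

    /* rti is a 1-based range table index from the plan tree */
    RangeTblEntry *rte = exec_rt_fetch(rti, estate);

    if (rte->rtekind == RTE_RELATION)
        elog(DEBUG1, "range table entry %u references relation %u",
             rti, rte->relid);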
* Centralize executor's opening/closing of Relations for rangetable entries. (Tom Lane, 2018-10-04)

Create an array estate->es_relations[] paralleling the es_range_table, and store references to Relations (relcache entries) there, so that any given RT entry is opened and closed just once per executor run. Scan nodes typically still call ExecOpenScanRelation, but ExecCloseScanRelation is no more; relation closing is now done centrally in ExecEndPlan.

This is slightly more complex than one would expect because of the interactions with relcache references held in ResultRelInfo nodes. The general convention is now that ResultRelInfo->ri_RelationDesc does not represent a separate relcache reference and so does not need to be explicitly closed; but there is an exception for ResultRelInfos in the es_trig_target_relations list, which are manufactured by ExecGetTriggerResultRel and have to be cleaned up by ExecCleanUpTriggerState. (That much was true all along, but these ResultRelInfos are now more different from others than they used to be.)

To allow the partition pruning logic to make use of es_relations[] rather than having its own relcache references, adjust PartitionedRelPruneInfo to store an RT index rather than a relation OID.

Amit Langote, reviewed by David Rowley and Jesper Pedersen, some mods by me
Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp
* Use slots more widely in tuple mapping code and make naming more consistent. (Andres Freund, 2018-10-02)

It's inefficient to use a single slot for mapping between tuple descriptors for multiple tuples, as previously done when using ConvertPartitionTupleSlot(), as that means the slot's tuple descriptors change for every tuple.

Previously we also, via ConvertPartitionTupleSlot(), built new tuples after the mapping even in cases where we, immediately afterwards, access individual columns again.

Refactor the code so one slot, on demand, is used for each partition. That avoids having to change the descriptor (and allows using the more efficient "fixed" tuple slots). Then use slot->slot mapping, to avoid unnecessarily forming a tuple.

As the naming between the tuple and slot mapping functions wasn't consistent, rename them to execute_attr_map_{tuple,slot}. It's likely that we'll also rename convert_tuples_by_* to denote that these functions "only" build a map, but that's left for later.

Author: Amit Khandekar and Amit Langote, editorialized by me
Reviewed-By: Amit Langote, Amit Khandekar, Andres Freund
Discussion: https://postgr.es/m/CAJ3gD9fR0wRNeAE8VqffNTyONS_UfFPRpqxhnD9Q42vZB+Jvpg@mail.gmail.com
https://postgr.es/m/e4f9d743-cd4b-efb0-7574-da21d86a7f36%40lab.ntt.co.jp
Backpatch: -
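A sketch of the slot-to-slot path, assuming execute_attr_map_slot() takes the attribute map plus input and output slots:

    /*
     * attrMap: per-output-attribute array of input attribute numbers, as
     * built by the existing convert_tuples_by_name() machinery.
     */
    TupleTableSlot *dst = execute_attr_map_slot(attrMap, srcslot, dstslot);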
* Split ExecStoreTuple into ExecStoreHeapTuple and ExecStoreBufferHeapTuple. (Andres Freund, 2018-09-25)

Upcoming changes introduce further types of tuple table slots, in preparation of making table storage pluggable. New storage methods will have different representation of tuples, therefore the slot accessor should refer explicitly to heap tuples.

Instead of just renaming the function, split it into one function that accepts heap tuples not residing in buffers, and one accepting ones in buffers. Previously one function was used for both, but that was a bit awkward already, and splitting will allow us to represent slot types for tuples in buffers and normal memory separately.

This is split out from the patch introducing abstract slots, as this largely consists of mechanical changes.

Author: Ashutosh Bapat
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/20180220224318.gw4oe5jadhpmcdnm@alap3.anarazel.de
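A sketch of choosing between the two new entry points (signatures as assumed from the split described above; the variables are illustrative):

    /* tuple in ordinary memory: the slot may take ownership and free it */
    ExecStoreHeapTuple(tuple, slot, true);

    /* tuple on a shared buffer page: the slot keeps the buffer pinned */
    ExecStoreBufferHeapTuple(buftuple, bufslot, buffer);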
* Change TupleTableSlot->tts_nvalid to type AttrNumber. (Andres Freund, 2018-09-25)

Previously it was an int / 4 bytes. The maximum number of attributes in a tuple is restricted by the maximum value Var->varattno, which is an AttrNumber/int16. Hence use the same data type for TupleTableSlot->tts_nvalid.

Author: Ashutosh Bapat
Discussion: https://postgr.es/m/20180220224318.gw4oe5jadhpmcdnm@alap3.anarazel.de
* Collect JIT instrumentation from workers. (Andres Freund, 2018-09-25)

Previously, when using parallel query, EXPLAIN (ANALYZE)'s JIT compilation timings did not include the overhead from doing so on the workers. Fix that.

We do so by simply aggregating the cost of doing JIT compilation on workers and the leader together. Arguably that's not quite accurate, because the total time spent doing so is spent in parallel - but it's hard to do much better. For additional detail, when VERBOSE is specified, the stats for workers are displayed separately.

Author: Amit Khandekar and Andres Freund
Discussion: https://postgr.es/m/CAJ3gD9eLrz51RK_gTkod+71iDcjpB_N8eC6vU2AW-VicsAERpQ@mail.gmail.com
Backpatch: 11-
* Fix failure with initplans used conditionally during EvalPlanQual rechecks. (Tom Lane, 2018-09-15)

The EvalPlanQual machinery assumes that any initplans (that is, uncorrelated sub-selects) used during an EPQ recheck would have already been evaluated during the main query; this is implicit in the fact that execPlan pointers are not copied into the EPQ estate's es_param_exec_vals. But it's possible for that assumption to fail, if the initplan is only reached conditionally. For example, a sub-select inside a CASE expression could be reached during a recheck when it had not been previously, if the CASE test depends on a column that was just updated.

This bug is old, appearing to date back to my rewrite of EvalPlanQual in commit 9f2ee8f28, but was not detected until Kyle Samson reported a case.

To fix, force all not-yet-evaluated initplans used within the EPQ plan subtree to be evaluated at the start of the recheck, before entering the EPQ environment. This could be inefficient, if such an initplan is expensive and goes unused again during the recheck --- but that's piling one layer of improbability atop another. It doesn't seem worth adding more complexity to prevent that, at least not in the back branches.

It was convenient to use the new-in-v11 ExecEvalParamExecParams function to implement this, but I didn't like either its name or the specifics of its API, so revise that.

Back-patch all the way. Rather than rewrite the patch to avoid depending on bms_next_member() in the oldest branches, I chose to back-patch that function into 9.4 and 9.3. (This isn't the first time back-patches have needed that, and it exhausted my patience.) I also chose to back-patch some test cases added by commits 71404af2a and 342a1ffa2 into 9.4 and 9.3, so that the 9.x versions of eval-plan-qual.spec are all the same.

Andrew Gierth diagnosed the problem and contributed the added test cases, though the actual code changes are by me.

Discussion: https://postgr.es/m/A033A40A-B234-4324-BE37-272279F7B627@tripadvisor.com