aboutsummaryrefslogtreecommitdiff
path: root/src/backend/executor/execUtils.c
Commit message (Collapse)AuthorAge
* Revert "Don't lock partitions pruned by initial pruning"Amit Langote2025-05-22
| | | | | | | | | | | | | | | | As pointed out by Tom Lane, the patch introduced fragile and invasive design around plan invalidation handling when locking of prunable partitions was deferred from plancache.c to the executor. In particular, it violated assumptions about CachedPlan immutability and altered executor APIs in ways that are difficult to justify given the added complexity and overhead. This also removes the firstResultRels field added to PlannedStmt in commit 28317de72, which was intended to support deferred locking of certain ModifyTable result relations. Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/605328.1747710381@sss.pgh.pa.us
* HashAgg: use Bump allocator for hash TupleHashTable entries.Jeff Davis2025-03-24
| | | | | | | | | | The entries aren't freed until the entire hash table is destroyed, so use the Bump allocator to improve allocation speed, avoid wasting space on the chunk header, and avoid wasting space due to the power-of-two allocations. Discussion: https://postgr.es/m/CAApHDvqv1aNB4cM36FzRwivXrEvBO_LsG_eQ3nqDXTjECaatOQ@mail.gmail.com Reviewed-by: David Rowley
* Ensure first ModifyTable rel initialized if all are prunedAmit Langote2025-03-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit cbc127917e introduced tracking of unpruned relids to avoid processing pruned relations, and changed ExecInitModifyTable() to initialize only unpruned result relations. As a result, MERGE statements that prune all target partitions can now lead to crashes or incorrect behavior during execution. The crash occurs because some executor code paths rely on ModifyTableState.resultRelInfo[0] being present and initialized, even when no result relations remain after pruning. For example, ExecMerge() and ExecMergeNotMatched() use the first resultRelInfo to determine the appropriate action. Similarly, ExecInitPartitionInfo() assumes that at least one result relation exists. To preserve these assumptions, ExecInitModifyTable() now includes the first result relation in the initialized result relation list if all result relations for that ModifyTable were pruned. To enable that, ExecDoInitialPruning() ensures the first relation is locked if it was pruned and locking is necessary. To support this exception to the pruning logic, PlannedStmt now includes a list of RT indexes identifying the first result relation of each ModifyTable node in the plan. This allows ExecDoInitialPruning() to check whether each such relation was pruned and, if so, lock it if necessary. Bug: #18830 Reported-by: Robins Tharakan <tharakan@gmail.com> Diagnozed-by: Tender Wang <tndrwang@gmail.com> Diagnozed-by: Dean Rasheed <dean.a.rasheed@gmail.com> Co-authored-by: Dean Rasheed <dean.a.rasheed@gmail.com> Reviewed-by: Tender Wang <tndrwang@gmail.com> Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com> Discussion: https://postgr.es/m/18830-1f31ea1dc930d444%40postgresql.org
* Don't lock partitions pruned by initial pruningAmit Langote2025-02-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before executing a cached generic plan, AcquireExecutorLocks() in plancache.c locks all relations in a plan's range table to ensure the plan is safe for execution. However, this locks runtime-prunable relations that will later be pruned during "initial" runtime pruning, introducing unnecessary overhead. This commit defers locking for such relations to executor startup and ensures that if the CachedPlan is invalidated due to concurrent DDL during this window, replanning is triggered. Deferring these locks avoids unnecessary locking overhead for pruned partitions, resulting in significant speedup, particularly when many partitions are pruned during initial runtime pruning. * Changes to locking when executing generic plans: AcquireExecutorLocks() now locks only unprunable relations, that is, those found in PlannedStmt.unprunableRelids (introduced in commit cbc127917e), to avoid locking runtime-prunable partitions unnecessarily. The remaining locks are taken by ExecDoInitialPruning(), which acquires them only for partitions that survive pruning. This deferral does not affect the locks required for permission checking in InitPlan(), which takes place before initial pruning. ExecCheckPermissions() now includes an Assert to verify that all relations undergoing permission checks, none of which can be in the set of runtime-prunable relations, are properly locked. * Plan invalidation handling: Deferring locks introduces a window where prunable relations may be altered by concurrent DDL, invalidating the plan. A new function, ExecutorStartCachedPlan(), wraps ExecutorStart() to detect and handle invalidation caused by deferred locking. If invalidation occurs, ExecutorStartCachedPlan() updates CachedPlan using the new UpdateCachedPlan() function and retries execution with the updated plan. To ensure all code paths that may be affected by this handle invalidation properly, all callers of ExecutorStart that may execute a PlannedStmt from a CachedPlan have been updated to use ExecutorStartCachedPlan() instead. UpdateCachedPlan() replaces stale plans in CachedPlan.stmt_list. A new CachedPlan.stmt_context, created as a child of CachedPlan.context, allows freeing old PlannedStmts while preserving the CachedPlan structure and its statement list. This ensures that loops over statements in upstream callers of ExecutorStartCachedPlan() remain intact. ExecutorStart() and ExecutorStart_hook implementations now return a boolean value indicating whether plan initialization succeeded with a valid PlanState tree in QueryDesc.planstate, or false otherwise, in which case QueryDesc.planstate is NULL. Hook implementations are required to call standard_ExecutorStart() at the beginning, and if it returns false, they should do the same without proceeding. * Testing: To verify these changes, the delay_execution module tests scenarios where cached plans become invalid due to changes in prunable relations after deferred locks. * Note to extension authors: ExecutorStart_hook implementations must verify plan validity after calling standard_ExecutorStart(), as explained earlier. For example: if (prev_ExecutorStart) plan_valid = prev_ExecutorStart(queryDesc, eflags); else plan_valid = standard_ExecutorStart(queryDesc, eflags); if (!plan_valid) return false; <extension-code> return true; Extensions accessing child relations, especially prunable partitions, via ExecGetRangeTableRelation() must now ensure their RT indexes are present in es_unpruned_relids (introduced in commit cbc127917e), or they will encounter an error. This is a strict requirement after this change, as only relations in that set are locked. The idea of deferring some locks to executor startup, allowing locks for prunable partitions to be skipped, was first proposed by Tom Lane. Reviewed-by: Robert Haas <robertmhaas@gmail.com> (earlier versions) Reviewed-by: David Rowley <dgrowleyml@gmail.com> (earlier versions) Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (earlier versions) Reviewed-by: Tomas Vondra <tomas@vondra.me> Reviewed-by: Junwang Zhao <zhjwpku@gmail.com> Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
* Virtual generated columnsPeter Eisentraut2025-02-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds a new variant of generated columns that are computed on read (like a view, unlike the existing stored generated columns, which are computed on write, like a materialized view). The syntax for the column definition is ... GENERATED ALWAYS AS (...) VIRTUAL and VIRTUAL is also optional. VIRTUAL is the default rather than STORED to match various other SQL products. (The SQL standard makes no specification about this, but it also doesn't know about VIRTUAL or STORED.) (Also, virtual views are the default, rather than materialized views.) Virtual generated columns are stored in tuples as null values. (A very early version of this patch had the ambition to not store them at all. But so much stuff breaks or gets confused if you have tuples where a column in the middle is completely missing. This is a compromise, and it still saves space over being forced to use stored generated columns. If we ever find a way to improve this, a bit of pg_upgrade cleverness could allow for upgrades to a newer scheme.) The capabilities and restrictions of virtual generated columns are mostly the same as for stored generated columns. In some cases, this patch keeps virtual generated columns more restricted than they might technically need to be, to keep the two kinds consistent. Some of that could maybe be relaxed later after separate careful considerations. Some functionality that is currently not supported, but could possibly be added as incremental features, some easier than others: - index on or using a virtual column - hence also no unique constraints on virtual columns - extended statistics on virtual columns - foreign-key constraints on virtual columns - not-null constraints on virtual columns (check constraints are supported) - ALTER TABLE / DROP EXPRESSION - virtual column cannot have domain type - virtual columns are not supported in logical replication The tests in generated_virtual.sql have been copied over from generated_stored.sql with the keyword replaced. This way we can make sure the behavior is mostly aligned, and the differences can be visible. Some tests for currently not supported features are currently commented out. Reviewed-by: Jian He <jian.universality@gmail.com> Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com> Tested-by: Shlok Kyal <shlok.kyal.oss@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/a368248e-69e4-40be-9c07-6c3b5880b0a6@eisentraut.org
* Track unpruned relids to avoid processing pruned relationsAmit Langote2025-02-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit introduces changes to track unpruned relations explicitly, making it possible for top-level plan nodes, such as ModifyTable and LockRows, to avoid processing partitions pruned during initial pruning. Scan-level nodes, such as Append and MergeAppend, already avoid the unnecessary processing by accessing partition pruning results directly via part_prune_index. In contrast, top-level nodes cannot access pruning results directly and need to determine which partitions remain unpruned. To address this, this commit introduces a new bitmapset field, es_unpruned_relids, which the executor uses to track the set of unpruned relations. This field is referenced during plan initialization to skip initializing certain nodes for pruned partitions. It is initialized with PlannedStmt.unprunableRelids, a new field that the planner populates with RT indexes of relations that cannot be pruned during runtime pruning. These include relations not subject to partition pruning and those required for execution regardless of pruning. PlannedStmt.unprunableRelids is computed during set_plan_refs() by removing the RT indexes of runtime-prunable relations, identified from PartitionPruneInfos, from the full set of relation RT indexes. ExecDoInitialPruning() then updates es_unpruned_relids by adding partitions that survive initial pruning. To support this, PartitionedRelPruneInfo and PartitionedRelPruningData now include a leafpart_rti_map[] array that maps partition indexes to their corresponding RT indexes. The former is used in set_plan_refs() when constructing unprunableRelids, while the latter is used in ExecDoInitialPruning() to convert partition indexes returned by get_matching_partitions() into RT indexes, which are then added to es_unpruned_relids. These changes make it possible for ModifyTable and LockRows nodes to process only relations that remain unpruned after initial pruning. ExecInitModifyTable() trims lists, such as resultRelations, withCheckOptionLists, returningLists, and updateColnosLists, to consider only unpruned partitions. It also creates ResultRelInfo structs only for these partitions. Similarly, child RowMarks for pruned relations are skipped. By avoiding unnecessary initialization of structures for pruned partitions, these changes improve the performance of updates and deletes on partitioned tables during initial runtime pruning. Due to ExecInitModifyTable() changes as described above, EXPLAIN on a plan for UPDATE and DELETE that uses runtime initial pruning no longer lists partitions pruned during initial pruning. Reviewed-by: Robert Haas <robertmhaas@gmail.com> (earlier versions) Reviewed-by: Tomas Vondra <tomas@vondra.me> Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
* Get rid of our dependency on type "long" for memory size calculations.Tom Lane2025-01-31
| | | | | | | | | | | | | | | | | | | | | | | | | | Consistently use "Size" (or size_t, or in some places int64 or double) as the type for variables holding memory allocation sizes. In most places variables' data types were fine already, but we had an ancient habit of computing bytes from kilobytes-units GUCs with code like "work_mem * 1024L". That risks overflow on Win64 where they did not make "long" as wide as "size_t". We worked around that by restricting such GUCs' ranges, so you couldn't set work_mem et al higher than 2GB on Win64. This patch removes that restriction, after replacing such calculations with "work_mem * (Size) 1024" or variants of that. It should be noted that this patch was constructed by searching outwards from the GUCs that have MAX_KILOBYTES as upper limit. So I can't positively guarantee there are no other places doing memory-size arithmetic in int or long variables. I do however feel pretty confident that increasing MAX_KILOBYTES on Win64 is safe now. Also, nothing in our code should be dealing in multiple-gigabyte allocations without authorization from a relevant GUC, so it seems pretty likely that this search caught everything that could be at risk of overflow. Author: Vladlen Popolitov <v.popolitov@postgrespro.ru> Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/1a01f0-66ec2d80-3b-68487680@27595217
* Move PartitionPruneInfo out of plan nodes into PlannedStmtAmit Langote2025-01-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This moves PartitionPruneInfo from plan nodes to PlannedStmt, simplifying traversal by centralizing all PartitionPruneInfo structures in a single list in it, which holds all instances for the main query and its subqueries. Instead of plan nodes (Append or MergeAppend) storing PartitionPruneInfo pointers, they now reference an index in this list. A bitmapset field is added to PartitionPruneInfo to store the RT indexes corresponding to the apprelids field in Append or MergeAppend. This allows execution pruning logic to verify that it operates on the correct plan node, mainly to facilitate debugging. Duplicated code in set_append_references() and set_mergeappend_references() is refactored into a new function, register_pruneinfo(). This updates RT indexes by applying rtoffet and adds PartitionPruneInfo to the global list in PlannerGlobal. By allowing pruning to be performed without traversing the plan tree, this change lays the groundwork for runtime initial pruning to occur independently of plan tree initialization. Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> (earlier version) Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Tomas Vondra <tomas@vondra.me> Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
* Add OLD/NEW support to RETURNING in DML queries.Dean Rasheed2025-01-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This allows the RETURNING list of INSERT/UPDATE/DELETE/MERGE queries to explicitly return old and new values by using the special aliases "old" and "new", which are automatically added to the query (if not already defined) while parsing its RETURNING list, allowing things like: RETURNING old.colname, new.colname, ... RETURNING old.*, new.* Additionally, a new syntax is supported, allowing the names "old" and "new" to be changed to user-supplied alias names, e.g.: RETURNING WITH (OLD AS o, NEW AS n) o.colname, n.colname, ... This is useful when the names "old" and "new" are already defined, such as inside trigger functions, allowing backwards compatibility to be maintained -- the interpretation of any existing queries that happen to already refer to relations called "old" or "new", or use those as aliases for other relations, is not changed. For an INSERT, old values will generally be NULL, and for a DELETE, new values will generally be NULL, but that may change for an INSERT with an ON CONFLICT ... DO UPDATE clause, or if a query rewrite rule changes the command type. Therefore, we put no restrictions on the use of old and new in any DML queries. Dean Rasheed, reviewed by Jian He and Jeff Davis. Discussion: https://postgr.es/m/CAEZATCWx0J0-v=Qjc6gXzR=KtsdvAE7Ow=D=mu50AgOe+pvisQ@mail.gmail.com
* Update copyright for 2025Bruce Momjian2025-01-01
| | | | Backpatch-through: 13
* Convert SetOp to read its inputs as outerPlan and innerPlan.Tom Lane2024-12-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The original design for set operations involved appending the two input relations into one and adding a flag column that allows distinguishing which side each row came from. Then the SetOp node pries them apart again based on the flag. This is bizarre. The only apparent reason to do it is that when sorting, we'd only need one Sort node not two. But since sorting is at least O(N log N), sorting all the data is actually worse than sorting each side separately --- plus, we have no chance of taking advantage of presorted input. On top of that, adding the flag column frequently requires an additional projection step that adds cycles, and then the Append node isn't free either. Let's get rid of all of that and make the SetOp node have two separate children, using the existing outerPlan/innerPlan infrastructure. This initial patch re-implements nodeSetop.c and does a bare minimum of work on the planner side to generate correctly-shaped plans. In particular, I've tried not to change the cost estimates here, so that the visible changes in the regression test results will only involve removal of useless projection steps and not any changes in whether to use sorted vs hashed mode. For SORTED mode, we combine successive identical tuples from each input into groups, and then merge-join the groups. The tuple comparisons now use SortSupport instead of simple equality, but the group-formation part should involve roughly the same number of tuple comparisons as before. The cross-comparisons between left and right groups probably add to that, but I'm not sure to quantify how many more comparisons we might need. For HASHED mode, nodeSetop's logic is almost the same as before, just refactored into two separate loops instead of one loop that has an assumption that it will see all the left-hand inputs first. In both modes, I added early-exit logic to not bother reading the right-hand relation if the left-hand input is empty, since neither INTERSECT nor EXCEPT modes can produce any output if the left input is empty. This could have been done before in the hashed mode, but not in sorted mode. Sorted mode can also stop as soon as it exhausts the left input; any remaining right-hand tuples cannot have matches. Also, this patch adds some infrastructure for detecting whether child plan nodes all output the same type of tuple table slot. If they do, the hash table logic can use slightly more efficient code based on assuming that that's the input slot type it will see. We'll make use of that infrastructure in other plan node types later. Patch by me; thanks to Richard Guo and David Rowley for review. Discussion: https://postgr.es/m/1850138.1731549611@sss.pgh.pa.us
* Remove unnecessary word in a commentAmit Langote2024-10-23
| | | | | | | | | Relations opened by the executor are only closed once in ExecCloseRangeTableRelations(), so the word "again" in the comment for ExecGetRangeTableRelation() is misleading and unnecessary. Discussion: https://postgr.es/m/CA+HiwqHnw-zR+u060i3jp4ky5UR0CjByRFQz50oZ05de7wUg=Q@mail.gmail.com Backpatch-through: 12
* Introduce two fields in EState to track parallel worker activityMichael Paquier2024-10-09
| | | | | | | | | | | | | | | These fields can be set by executor nodes to record how many parallel workers were planned to be launched and how many of them have been actually launched within the number initially planned. This data is able to give an approximation of the parallel worker draught a system is facing, making easier the tuning of related configuration parameters. These fields will be used by some follow-up patches to populate other parts of the system with their data. Author: Guillaume Lelarge, Benoit Lobréau Discussion: https://postgr.es/m/783bc7f7-659a-42fa-99dd-ee0565644e25@dalibo.com Discussion: https://postgr.es/m/CAECtzeWtTGOK0UgKXdDGpfTVSa5bd_VbUt6K6xn8P7X+_dZqKw@mail.gmail.com
* Remove unused #include's from backend .c filesPeter Eisentraut2024-03-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | as determined by include-what-you-use (IWYU) While IWYU also suggests to *add* a bunch of #include's (which is its main purpose), this patch does not do that. In some cases, a more specific #include replaces another less specific one. Some manual adjustments of the automatic result: - IWYU currently doesn't know about includes that provide global variable declarations (like -Wmissing-variable-declarations), so those includes are being kept manually. - All includes for port(ability) headers are being kept for now, to play it safe. - No changes of catalog/pg_foo.h to catalog/pg_foo_d.h, to keep the patch from exploding in size. Note that this patch touches just *.c files, so nothing declared in header files changes in hidden ways. As a small example, in src/backend/access/transam/rmgr.c, some IWYU pragma annotations are added to handle a special case there. Discussion: https://www.postgresql.org/message-id/flat/af837490-6b2f-46df-ba05-37ea6a6653fc%40eisentraut.org
* Update copyright for 2024Bruce Momjian2024-01-03
| | | | | | | | Reported-by: Michael Paquier Discussion: https://postgr.es/m/ZZKTDPxBBMt3C0J9@paquier.xyz Backpatch-through: 12
* Remove obsolete executor cleanup codeAmit Langote2023-09-28
| | | | | | | | | | | | | | | | | | | This commit removes unnecessary ExecExprFreeContext() calls in ExecEnd* routines because the actual cleanup is managed by FreeExecutorState(). With no callers remaining for ExecExprFreeContext(), this commit also removes the function. This commit also drops redundant ExecClearTuple() calls, because ExecResetTupleTable() in ExecEndPlan() already takes care of resetting and dropping all TupleTableSlots initialized with ExecInitScanTupleSlot() and ExecInitExtraTupleSlot(). After these modifications, the ExecEnd*() routines for ValuesScan, NamedTuplestoreScan, and WorkTableScan became redundant. So, this commit removes them. Reviewed-by: Robert Haas Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
* Pre-beta2 mechanical code beautification.Tom Lane2023-06-20
| | | | | | | | | Run pgindent and pgperltidy. It seems we're still some ways away from all committers doing this automatically. Now that we have a buildfarm animal that will whine about poorly-indented code, we'll try to keep the tree more tidy. Discussion: https://postgr.es/m/3156045.1687208823@sss.pgh.pa.us
* Use per-tuple context in ExecGetAllUpdatedColsTomas Vondra2023-06-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | Commit fc22b6623b (generated columns) replaced ExecGetUpdatedCols() with ExecGetAllUpdatedCols() in a couple places handling UPDATE (triggers and lock mode). However, ExecGetUpdatedCols() did exec_rt_fetch() while ExecGetAllUpdatedCols() also allocates memory through bms_union() without paying attention to the memory context and happened to use the long-lived ExecutorState, leaking the memory until the end of the query. The amount of leaked memory is proportional to the number of (updated) attributes, types of UPDATE triggers, and the number of processed rows (which for UPDATE ... FROM ... may be much higher than updated rows). Fixed by switching to the per-tuple context in GetAllUpdatedColumns(). This is fine for all in-core callers, but external callers may need to copy the result. But we're not aware of any such callers. Note the issue was introduced by fc22b6623b, but the macros were later renamed by f50e888990. Backpatch to 12, where the issue was introduced. Reported-by: Tomas Vondra Reviewed-by: Andres Freund, Tom Lane, Jakub Wartak Backpatch-through: 12 Discussion: https://postgr.es/m/222a3442-7f7d-246c-ed9b-a76209d19239@enterprisedb.com
* Revert "Move PartitionPruneInfo out of plan nodes into PlannedStmt"Alvaro Herrera2023-05-04
| | | | | | | | | | | | | This reverts commit ec386948948c and its fixup 589bb816499e. This change was intended to support query planning avoiding acquisition of locks on partitions that were going to be pruned; however, the overall project took a different direction at [1] and this bit is no longer needed. Put things back the way they were as agreed in [2], to avoid unnecessary complexity. Discussion: [1] https://postgr.es/m/4191508.1674157166@sss.pgh.pa.us Discussion: [2] https://postgr.es/m/20230502175409.kcoirxczpdha26wt@alvherre.pgsql
* Fix various typosDavid Rowley2023-04-18
| | | | | | | | | | | | This fixes many spelling mistakes in comments, but a few references to invalid parameter names, function names and option names too in comments and also some in string constants Also, fix an #undef that was undefining the incorrect definition Author: Alexander Lakhin Reviewed-by: Justin Pryzby Discussion: https://postgr.es/m/d5f68d19-c0fc-91a9-118d-7c6a5a3f5fad@gmail.com
* Fix row tracking in pg_stat_statements with extended query protocolMichael Paquier2023-04-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | pg_stat_statements relies on EState->es_processed to count the number of rows processed by ExecutorRun(). This proves to be a problem under the extended query protocol when the result of a query is fetched through more than one call of ExecutorRun(), as es_processed is reset each time ExecutorRun() is called. This causes pg_stat_statements to report the number of rows calculated in the last execute fetch, rather than the global sum of all the rows processed. As pquery.c tells, this is a problem when a portal does not use holdStore. For example, DMLs with RETURNING would report a correct tuple count as these do one execution cycle when the query is first executed to fill in the portal's store with one ExecutorRun(), feeding on the portal's store for each follow-up execute fetch depending on the fetch size requested by the client. The fix proposed for this issue is simple with the addition of an extra counter in EState that's preserved across multiple ExecutorRun() calls, incremented with the value calculated in es_processed. This approach is not back-patchable, unfortunately. Note that libpq does not currently give any way to control the fetch size when using the extended v3 protocol, meaning that in-core testing is not possible yet. This issue can be easily verified with the JDBC driver, though, with *autocommit disabled*. Hence, having in-core tests requires more features, left for future discussion: - At least two new libpq routines splitting PQsendQueryGuts(), one for the bind/describe and a second for a series of execute fetches with a custom fetch size, likely in a fashion similar to what JDBC does. - A psql meta-command for the execute phase. This part is not strictly mandatory, still it could be handy. Reported-by: Andrew Dunstan (original discovery by Simon Siggs) Author: Sami Imseih Reviewed-by: Tom Lane, Michael Paquier Discussion: https://postgr.es/m/EBE6C507-9EB6-4142-9E4D-38B1673363A7@amazon.com Discussion: https://postgr.es/m/c90890e7-9c89-c34f-d3c5-d5c763a34bd8@dunslane.net
* Fix some more cases of missed GENERATED-column updates.Tom Lane2023-03-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | If UPDATE is forced to retry after an EvalPlanQual check, it neglected to repeat GENERATED-column computations, even though those might well have changed since we're dealing with a different tuple than before. Fixing this is mostly a matter of looping back a bit further when we retry. In v15 and HEAD that's most easily done by altering the API of ExecUpdateAct so that it includes computing GENERATED expressions. Also, if an UPDATE in a partitioned table turns into a cross-partition INSERT operation, we failed to recompute GENERATED columns. That's a bug since 8bf6ec3ba allowed partitions to have different generation expressions; although it seems to have no ill effects before that. Fixing this is messier because we can now have situations where the same query needs both the UPDATE-aligned set of GENERATED columns and the INSERT-aligned set, and it's unclear which set will be generated first (else we could hack things by forcing the INSERT-aligned set to be generated, which is indeed how fe9e658f4 made it work for MERGE). The best fix seems to be to build and store separate sets of expressions for the INSERT and UPDATE cases. That would create ABI issues in the back branches, but so far it seems we can leave this alone in the back branches. Per bug #17823 from Hisahiro Kauchi. The first part of this affects all branches back to v12 where GENERATED columns were added. Discussion: https://postgr.es/m/17823-b64909cf7d63de84@postgresql.org
* Fill EState.es_rteperminfos more systematically.Tom Lane2023-03-06
| | | | | | | | | | | | | | | | | | While testing a fix for bug #17823, I discovered that EvalPlanQualStart failed to copy es_rteperminfos from the parent EState, resulting in failure if anything in EPQ execution wanted to consult that information. This led me to conclude that commit a61b1f748 had been too haphazard about where to fill es_rteperminfos, and that we need to be sure that that happens exactly where es_range_table gets filled. So I changed the signature of ExecInitRangeTable to help ensure that this new requirement doesn't get missed. (Indeed, pgoutput.c was also failing to fill it. Maybe we don't ever need it there, but I wouldn't bet on that.) No test case yet; one will arrive with the fix for #17823. But that needs to be back-patched, while this fix is HEAD-only. Discussion: https://postgr.es/m/17823-b64909cf7d63de84@postgresql.org
* Remove local optimizations of empty Bitmapsets into null pointers.Tom Lane2023-03-02
| | | | | | | | These are all dead code now that it's done centrally. Patch by me; thanks to Nathan Bossart and Richard Guo for review. Discussion: https://postgr.es/m/1159933.1677621588@sss.pgh.pa.us
* Make new GENERATED-expressions code more bulletproof.Tom Lane2023-01-15
| | | | | | | | | | | | | | | | | In commit 8bf6ec3ba I assumed that no code path could reach ExecGetExtraUpdatedCols without having gone through ExecInitStoredGenerated. That turns out not to be the case in logical replication: if there's an ON UPDATE trigger on the target table, trigger.c will call this code before anybody has set up its generated columns. Having seen that, I don't have a lot of faith in there not being other such paths. ExecGetExtraUpdatedCols can call ExecInitStoredGenerated for itself, as long as we are willing to assume that it is only called in CMD_UPDATE operations, which on the whole seems like a safer leap of faith. Per report from Vitaly Davydov. Discussion: https://postgr.es/m/d259d69652b8c2ff50e14cda3c236c7f@postgrespro.ru
* Fix calculation of which GENERATED columns need to be updated.Tom Lane2023-01-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We were identifying the updatable generated columns of inheritance children by transposing the calculation made for their parent. However, there's nothing that says a traditional-inheritance child can't have generated columns that aren't there in its parent, or that have different dependencies than are in the parent's expression. (At present it seems that we don't enforce that for partitioning either, which is likely wrong to some degree or other; but the case clearly needs to be handled with traditional inheritance.) Hence, drop the very-klugy-anyway "extraUpdatedCols" RTE field in favor of identifying which generated columns depend on updated columns during executor startup. In HEAD we can remove extraUpdatedCols altogether; in back branches, it's still there but always empty. Another difference between the HEAD and back-branch versions of this patch is that in HEAD we can add the new bitmap field to ResultRelInfo, but that would cause an ABI break in back branches. Like 4b3e37993, add a List field at the end of struct EState instead. Back-patch to v13. The bogus calculation is also being made in v12, but it doesn't have the same visible effect because we don't use it to decide which generated columns to recalculate; as a consequence of which the patch doesn't apply easily. I think that there might still be a demonstrable bug associated with trigger firing conditions, but that's such a weird corner-case usage that I'm content to leave it unfixed in v12. Amit Langote and Tom Lane Discussion: https://postgr.es/m/CA+HiwqFshLKNvQUd1DgwJ-7tsTp=dwv7KZqXC4j2wYBV1aCDUA@mail.gmail.com Discussion: https://postgr.es/m/2793383.1672944799@sss.pgh.pa.us
* Update copyright for 2023Bruce Momjian2023-01-02
| | | | Backpatch-through: 11
* Remove new structure member from ResultRelInfo.Etsuro Fujita2022-12-08
| | | | | | | | | | | | | | | In commit ffbb7e65a, I added a ModifyTableState member to ResultRelInfo to save the owning ModifyTableState for use by nodeModifyTable.c when performing batch inserts, but as pointed out by Tom Lane, that changed the array stride of es_result_relations, and that would break any previously-compiled extension code that accesses that array. Fix by removing that member from ResultRelInfo and instead adding a List member at the end of EState to save such ModifyTableStates. Per report from Tom Lane. Back-patch to v14, like the previous commit; I chose to apply the patch to HEAD as well, to make back-patching easy. Discussion: http://postgr.es/m/4065383.1669395453%40sss.pgh.pa.us
* Rework query relation permission checkingAlvaro Herrera2022-12-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, information about the permissions to be checked on relations mentioned in a query is stored in their range table entries. So the executor must scan the entire range table looking for relations that need to have permissions checked. This can make the permission checking part of the executor initialization needlessly expensive when many inheritance children are present in the range range. While the permissions need not be checked on the individual child relations, the executor still must visit every range table entry to filter them out. This commit moves the permission checking information out of the range table entries into a new plan node called RTEPermissionInfo. Every top-level (inheritance "root") RTE_RELATION entry in the range table gets one and a list of those is maintained alongside the range table. This new list is initialized by the parser when initializing the range table. The rewriter can add more entries to it as rules/views are expanded. Finally, the planner combines the lists of the individual subqueries into one flat list that is passed to the executor for checking. To make it quick to find the RTEPermissionInfo entry belonging to a given relation, RangeTblEntry gets a new Index field 'perminfoindex' that stores the corresponding RTEPermissionInfo's index in the query's list of the latter. ExecutorCheckPerms_hook has gained another List * argument; the signature is now: typedef bool (*ExecutorCheckPerms_hook_type) (List *rangeTable, List *rtePermInfos, bool ereport_on_violation); The first argument is no longer used by any in-core uses of the hook, but we leave it in place because there may be other implementations that do. Implementations should likely scan the rtePermInfos list to determine which operations to allow or deny. Author: Amit Langote <amitlangote09@gmail.com> Discussion: https://postgr.es/m/CA+HiwqGjJDmUhDSfv-U2qhKJjt9ST7Xh9JXC_irsAQ1TAUsJYg@mail.gmail.com
* Generalize ri_RootToPartitionMap to use for non-partition childrenAlvaro Herrera2022-12-02
| | | | | | | | | | | | | | | | | | | | | | | ri_RootToPartitionMap is currently only initialized for tuple routing target partitions, though a future commit will need the ability to use it even for the non-partition child tables, so make adjustments to the decouple it from the partitioning code. Also, make it lazily initialized via ExecGetRootToChildMap(), making that function its preferred access path. Existing third-party code accessing it directly should no longer do so; consequently, it's been renamed to ri_RootToChildMap, which also makes it consistent with ri_ChildToRootMap. ExecGetRootToChildMap() houses the logic of setting the map appropriately depending on whether a given child relation is partition or not. To support this, also add a separate entry point for TupleConversionMap creation that receives an AttrMap. No new code here, just split an existing function in two. Author: Amit Langote <amitlangote09@gmail.com> Discussion: https://postgr.es/m/CA+HiwqEYUhDXSK5BTvG_xk=eaAEJCD4GS3C6uH7ybBvv+Z_Tmg@mail.gmail.com
* Move PartitioPruneInfo out of plan nodes into PlannedStmtAlvaro Herrera2022-12-01
| | | | | | | | | | | | | | | | | | | | The planner will now add a given PartitioPruneInfo to PlannedStmt.partPruneInfos instead of directly to the Append/MergeAppend plan node. What gets set instead in the latter is an index field which points to the list element of PlannedStmt.partPruneInfos containing the PartitioPruneInfo belonging to the plan node. A later commit will make AcquireExecutorLocks() do the initial partition pruning to determine a minimal set of partitions to be locked when validating a plan tree and it will need to consult the PartitioPruneInfos referenced therein to do so. It would be better for the PartitioPruneInfos to be accessible directly than requiring a walk of the plan tree to find them, which is easier when it can be done by simply iterating over PlannedStmt.partPruneInfos. Author: Amit Langote <amitlangote09@gmail.com> Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
* Fix handling of pending inserts in nodeModifyTable.c.Etsuro Fujita2022-11-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | Commit b663a4136, which allowed FDWs to INSERT rows in bulk, added to nodeModifyTable.c code to flush pending inserts to the foreign-table result relation(s) before completing processing of the ModifyTable node, but the code failed to take into account the case where the INSERT query has modifying CTEs, leading to incorrect results. Also, that commit failed to flush pending inserts before firing BEFORE ROW triggers so that rows are visible to such triggers. In that commit we scanned through EState's es_tuple_routing_result_relations or es_opened_result_relations list to find the foreign-table result relations to which pending inserts are flushed, but that would be inefficient in some cases. So to fix, 1) add a List member to EState to record the insert-pending result relations, and 2) modify nodeModifyTable.c so that it adds the foreign-table result relation to the list in ExecInsert() if appropriate, and flushes pending inserts properly using the list where needed. While here, fix a copy-and-pasteo in a comment in ExecBatchInsert(), which was added by that commit. Back-patch to v14 where that commit appeared. Discussion: https://postgr.es/m/CAPmGK16qutyCmyJJzgQOhfBq%3DNoGDqTB6O0QBZTihrbqre%2BoxA%40mail.gmail.com
* Update copyright for 2022Bruce Momjian2022-01-07
| | | | Backpatch-through: 10
* Remove arbitrary 64K-or-so limit on rangetable size.Tom Lane2021-09-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Up to now the size of a query's rangetable has been limited by the constants INNER_VAR et al, which mustn't be equal to any real rangetable index. 65000 doubtless seemed like enough for anybody, and it still is orders of magnitude larger than the number of joins we can realistically handle. However, we need a rangetable entry for each child partition that is (or might be) processed by a query. Queries with a few thousand partitions are getting more realistic, so that the day when that limit becomes a problem is in sight, even if it's not here yet. Hence, let's raise the limit. Rather than just increase the values of INNER_VAR et al, this patch adopts the approach of making them small negative values, so that rangetables could theoretically become as long as INT_MAX. The bulk of the patch is concerned with changing Var.varno and some related variables from "Index" (unsigned int) to plain "int". This is basically cosmetic, with little actual effect other than to help debuggers print their values nicely. As such, I've only bothered with changing places that could actually see INNER_VAR et al, which the parser and most of the planner don't. We do have to be careful in places that are performing less/greater comparisons on varnos, but there are very few such places, other than the IS_SPECIAL_VARNO macro itself. A notable side effect of this patch is that while it used to be possible to add INNER_VAR et al to a Bitmapset, that will now draw an error. I don't see any likelihood that it wouldn't be a bug to include these fake varnos in a bitmapset of real varnos, so I think this is all to the good. Although this touches outfuncs/readfuncs, I don't think a catversion bump is required, since stored rules would never contain Vars with these fake varnos. Andrey Lepikhov and Tom Lane, after a suggestion by Peter Eisentraut Discussion: https://postgr.es/m/43c7f2f5-1e27-27aa-8c65-c91859d15190@postgrespro.ru
* Postpone some stuff out of ExecInitModifyTable.Tom Lane2021-04-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Arrange to do some things on-demand, rather than immediately during executor startup, because there's a fair chance of never having to do them at all: * Don't open result relations' indexes until needed. * Don't initialize partition tuple routing, nor the child-to-root tuple conversion map, until needed. This wins in UPDATEs on partitioned tables when only some of the partitions will actually receive updates; with larger partition counts the savings is quite noticeable. Also, we can remove some sketchy heuristics in ExecInitModifyTable about whether to set up tuple routing. Also, remove execPartition.c's private hash table tracking which partitions were already opened by the ModifyTable node. Instead use the hash added to ModifyTable itself by commit 86dc90056. To allow lazy computation of the conversion maps, we now set ri_RootResultRelInfo in all child ResultRelInfos. We formerly set it only in some, not terribly well-defined, cases. This has user-visible side effects in that now more error messages refer to the root relation instead of some partition (and provide error data in the root's column order, too). It looks to me like this is a strict improvement in consistency, so I don't have a problem with the output changes visible in this commit. Extracted from a larger patch, which seemed to me to be too messy to push in one commit. Amit Langote, reviewed at different times by Heikki Linnakangas and myself Discussion: https://postgr.es/m/CA+HiwqG7ZruBmmih3wPsBZ4s0H2EhywrnXEduckY5Hr3fWzPWA@mail.gmail.com
* Make ExecGetInsertedCols() and friends more robust and improve comments.Heikki Linnakangas2021-02-15
| | | | | | | | | | | | | | | | | | If ExecGetInsertedCols(), ExecGetUpdatedCols() or ExecGetExtraUpdatedCols() were called with a ResultRelInfo that's not in the range table and isn't a partition routing target, the functions would dereference a NULL pointer, relinfo->ri_RootResultRelInfo. Such ResultRelInfos are created when firing RI triggers in tables that are not modified directly. None of the current callers of these functions pass such relations, so this isn't a live bug, but let's make them more robust. Also update comment in ResultRelInfo; after commit 6214e2b228, ri_RangeTableIndex is zero for ResultRelInfos created for partition tuple routing. Noted by Coverity. Backpatch down to v11, like commit 6214e2b228. Reviewed-by: Tom Lane, Amit Langote
* Fix permission checks on constraint violation errors on partitions.Heikki Linnakangas2021-02-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a cross-partition UPDATE violates a constraint on the target partition, and the columns in the new partition are in different physical order than in the parent, the error message can reveal columns that the user does not have SELECT permission on. A similar bug was fixed earlier in commit 804b6b6db4. The cause of the bug is that the callers of the ExecBuildSlotValueDescription() function got confused when constructing the list of modified columns. If the tuple was routed from a parent, we converted the tuple to the parent's format, but the list of modified columns was grabbed directly from the child's RTE entry. ExecUpdateLockMode() had a similar issue. That lead to confusion on which columns are key columns, leading to wrong tuple lock being taken on tables referenced by foreign keys, when a row is updated with INSERT ON CONFLICT UPDATE. A new isolation test is added for that corner case. With this patch, the ri_RangeTableIndex field is no longer set for partitions that don't have an entry in the range table. Previously, it was set to the RTE entry of the parent relation, but that was confusing. NOTE: This modifies the ResultRelInfo struct, replacing the ri_PartitionRoot field with ri_RootResultRelInfo. That's a bit risky to backpatch, because it breaks any extensions accessing the field. The change that ri_RangeTableIndex is not set for partitions could potentially break extensions, too. The ResultRelInfos are visible to FDWs at least, and this patch required small changes to postgres_fdw. Nevertheless, this seem like the least bad option. I don't think these fields widely used in extensions; I don't think there are FDWs out there that uses the FDW "direct update" API, other than postgres_fdw. If there is, you will get a compilation error, so hopefully it is caught quickly. Backpatch to 11, where support for both cross-partition UPDATEs, and unique indexes on partitioned tables, were added. Reviewed-by: Amit Langote Security: CVE-2021-3393
* Update copyright for 2021Bruce Momjian2021-01-02
| | | | Backpatch-through: 9.5
* Remove es_result_relation_info from EState.Heikki Linnakangas2020-10-14
| | | | | | | | | | | | | | Maintaining 'es_result_relation_info' correctly at all times has become cumbersome, especially with partitioning where each partition gets its own result relation info. Having to set and reset it across arbitrary operations has caused bugs in the past. This changes all the places that used 'es_result_relation_info', to receive the currently active ResultRelInfo via function parameters instead. Author: Amit Langote Discussion: https://www.postgresql.org/message-id/CA%2BHiwqGEmiib8FLiHMhKB%2BCH5dRgHSLc5N5wnvc4kym%2BZYpQEQ%40mail.gmail.com
* Create ResultRelInfos later in InitPlan, index them by RT index.Heikki Linnakangas2020-10-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of allocating all the ResultRelInfos upfront in one big array, allocate them in ExecInitModifyTable(). es_result_relations is now an array of ResultRelInfo pointers, rather than an array of structs, and it is indexed by the RT index. This simplifies things: we get rid of the separate concept of a "result rel index", and don't need to set it in setrefs.c anymore. This also allows follow-up optimizations (not included in this commit yet) to skip initializing ResultRelInfos for target relations that were not needed at runtime, and removal of the es_result_relation_info pointer. The EState arrays of regular result rels and root result rels are merged into one array. Similarly, the resultRelations and rootResultRelations lists in PlannedStmt are merged into one. It's not actually clear to me why they were kept separate in the first place, but now that the es_result_relations array is indexed by RT index, it certainly seems pointless. The PlannedStmt->resultRelations list is now only needed for ExecRelationIsTargetRelation(). One visible effect of this change is that ExecRelationIsTargetRelation() will now return 'true' also for the partition root, if a partitioned table is updated. That seems like a good thing, although the function isn't used in core code, and I don't see any reason for an FDW to call it on a partition root. Author: Amit Langote Discussion: https://www.postgresql.org/message-id/CA%2BHiwqGEmiib8FLiHMhKB%2BCH5dRgHSLc5N5wnvc4kym%2BZYpQEQ%40mail.gmail.com
* Initial pgindent and pgperltidy run for v13.Tom Lane2020-05-14
| | | | | | | | | | | Includes some manual cleanup of places that pgindent messed up, most of which weren't per project style anyway. Notably, it seems some people didn't absorb the style rules of commit c9d297751, because there were a bunch of new occurrences of function calls with a newline just after the left paren, all with faulty expectations about how the rest of the call would get indented.
* Create memory context for HashAgg with a reasonable maxBlockSize.Jeff Davis2020-04-07
| | | | | | | | | | | | | | | | If the memory context's maxBlockSize is too big, a single block allocation can suddenly exceed work_mem. For Hash Aggregation, this can mean spilling to disk too early or reporting a confusing memory usage number for EXPLAN ANALYZE. Introduce CreateWorkExprContext(), which is like CreateExprContext(), except that it creates the AllocSet with a maxBlockSize that is reasonable in proportion to work_mem. Right now, CreateWorkExprContext() is only used by Hash Aggregation, but it may be generally useful in the future. Discussion: https://postgr.es/m/412a3fbf306f84d8d78c4009e11791867e62b87c.camel@j-davis.com
* Go back to returning int from ereport auxiliary functions.Tom Lane2020-03-25
| | | | | | | | | | | | | | | | | | | This reverts the parts of commit 17a28b03645e27d73bf69a95d7569b61e58f06eb that changed ereport's auxiliary functions from returning dummy integer values to returning void. It turns out that a minority of compilers complain (not entirely unreasonably) about constructs such as (condition) ? errdetail(...) : 0 if errdetail() returns void rather than int. We could update those call sites to say "(void) 0" perhaps, but the expectation for this patch set was that ereport callers would not have to change anything. And this aspect of the patch set was already the most invasive and least compelling part of it, so let's just drop it. Per buildfarm. Discussion: https://postgr.es/m/CA+fd4k6N8EjNvZpM8nme+y+05mz-SM8Z_BgkixzkA34R+ej0Kw@mail.gmail.com
* Improve the internal implementation of ereport().Tom Lane2020-03-24
| | | | | | | | | | | | | | | | | | | | | | Change all the auxiliary error-reporting routines to return void, now that we no longer need to pretend they are passing something useful to errfinish(). While this probably doesn't save anything significant at the machine-code level, it allows detection of some additional types of mistakes. Pass the error location details (__FILE__, __LINE__, PG_FUNCNAME_MACRO) to errfinish not errstart. This shaves a few cycles off the case where errstart decides we're not going to emit anything. Re-implement elog() as a trivial wrapper around ereport(), removing the separate support infrastructure it used to have. Aside from getting rid of some now-surplus code, this means that elog() now really does have exactly the same semantics as ereport(), in particular that it can skip evaluation work if the message is not to be emitted. Andres Freund and Tom Lane Discussion: https://postgr.es/m/CA+fd4k6N8EjNvZpM8nme+y+05mz-SM8Z_BgkixzkA34R+ej0Kw@mail.gmail.com
* Update copyrights for 2020Bruce Momjian2020-01-01
| | | | Backpatch-through: update all files in master, backpatch legal files through 9.4
* Reorder EPQ work, to fix rowmark related bugs and improve efficiency.Andres Freund2019-09-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In ad0bda5d24ea I changed the EvalPlanQual machinery to store substitution tuples in slot, instead of using plain HeapTuples. The main motivation for that was that using HeapTuples will be inefficient for future tableams. But it turns out that that conversion was buggy for non-locking rowmarks - the wrong tuple descriptor was used to create the slot. As a secondary issue 5db6df0c0 changed ExecLockRows() to begin EPQ earlier, to allow to fetch the locked rows directly into the EPQ slots, instead of having to copy tuples around. Unfortunately, as Tom complained, that forces some expensive initialization to happen earlier. As a third issue, the test coverage for EPQ was clearly insufficient. Fixing the first issue is unfortunately not trivial: Non-locked row marks were fetched at the start of EPQ, and we don't have the type information for the rowmarks available at that point. While we could change that, it's not easy. It might be worthwhile to change that at some point, but to fix this bug, it seems better to delay fetching non-locking rowmarks when they're actually needed, rather than eagerly. They're referenced at most once, and in cases where EPQ fails, might never be referenced. Fetching them when needed also increases locality a bit. To be able to fetch rowmarks during execution, rather than initialization, we need to be able to access the active EPQState, as that contains necessary data. To do so move EPQ related data from EState to EPQState, and, only for EStates creates as part of EPQ, reference the associated EPQState from EState. To fix the second issue, change EPQ initialization to allow use of EvalPlanQualSlot() to be used before EvalPlanQualBegin() (but obviously still requiring EvalPlanQualInit() to have been done). As these changes made struct EState harder to understand, e.g. by adding multiple EStates, significantly reorder the members, and add a lot more comments. Also add a few more EPQ tests, including one that fails for the first issue above. More is needed. Reported-By: yi huang Author: Andres Freund Reviewed-By: Tom Lane Discussion: https://postgr.es/m/CAHU7rYZo_C4ULsAx_LAj8az9zqgrD8WDd4hTegDTMM1LMqrBsg@mail.gmail.com https://postgr.es/m/24530.1562686693@sss.pgh.pa.us Backpatch: 12-, where the EPQ changes were introduced
* Remove EState.es_range_table_array.Tom Lane2019-08-12
| | | | | | | | | | Now that list_nth is O(1), there's no good reason to maintain a separate array of RTE pointers rather than indexing into estate->es_range_table. Deleting the array doesn't save all that much either; but just on cleanliness grounds, it's better not to have duplicate representations of the identical information. Discussion: https://postgr.es/m/14960.1565384592@sss.pgh.pa.us
* Represent Lists as expansible arrays, not chains of cons-cells.Tom Lane2019-07-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Originally, Postgres Lists were a more or less exact reimplementation of Lisp lists, which consist of chains of separately-allocated cons cells, each having a value and a next-cell link. We'd hacked that once before (commit d0b4399d8) to add a separate List header, but the data was still in cons cells. That makes some operations -- notably list_nth() -- O(N), and it's bulky because of the next-cell pointers and per-cell palloc overhead, and it's very cache-unfriendly if the cons cells end up scattered around rather than being adjacent. In this rewrite, we still have List headers, but the data is in a resizable array of values, with no next-cell links. Now we need at most two palloc's per List, and often only one, since we can allocate some values in the same palloc call as the List header. (Of course, extending an existing List may require repalloc's to enlarge the array. But this involves just O(log N) allocations not O(N).) Of course this is not without downsides. The key difficulty is that addition or deletion of a list entry may now cause other entries to move, which it did not before. For example, that breaks foreach() and sister macros, which historically used a pointer to the current cons-cell as loop state. We can repair those macros transparently by making their actual loop state be an integer list index; the exposed "ListCell *" pointer is no longer state carried across loop iterations, but is just a derived value. (In practice, modern compilers can optimize things back to having just one loop state value, at least for simple cases with inline loop bodies.) In principle, this is a semantics change for cases where the loop body inserts or deletes list entries ahead of the current loop index; but I found no such cases in the Postgres code. The change is not at all transparent for code that doesn't use foreach() but chases lists "by hand" using lnext(). The largest share of such code in the backend is in loops that were maintaining "prev" and "next" variables in addition to the current-cell pointer, in order to delete list cells efficiently using list_delete_cell(). However, we no longer need a previous-cell pointer to delete a list cell efficiently. Keeping a next-cell pointer doesn't work, as explained above, but we can improve matters by changing such code to use a regular foreach() loop and then using the new macro foreach_delete_current() to delete the current cell. (This macro knows how to update the associated foreach loop's state so that no cells will be missed in the traversal.) There remains a nontrivial risk of code assuming that a ListCell * pointer will remain good over an operation that could now move the list contents. To help catch such errors, list.c can be compiled with a new define symbol DEBUG_LIST_MEMORY_USAGE that forcibly moves list contents whenever that could possibly happen. This makes list operations significantly more expensive so it's not normally turned on (though it is on by default if USE_VALGRIND is on). There are two notable API differences from the previous code: * lnext() now requires the List's header pointer in addition to the current cell's address. * list_delete_cell() no longer requires a previous-cell argument. These changes are somewhat unfortunate, but on the other hand code using either function needs inspection to see if it is assuming anything it shouldn't, so it's not all bad. Programmers should be aware of these significant performance changes: * list_nth() and related functions are now O(1); so there's no major access-speed difference between a list and an array. * Inserting or deleting a list element now takes time proportional to the distance to the end of the list, due to moving the array elements. (However, it typically *doesn't* require palloc or pfree, so except in long lists it's probably still faster than before.) Notably, lcons() used to be about the same cost as lappend(), but that's no longer true if the list is long. Code that uses lcons() and list_delete_first() to maintain a stack might usefully be rewritten to push and pop at the end of the list rather than the beginning. * There are now list_insert_nth...() and list_delete_nth...() functions that add or remove a list cell identified by index. These have the data-movement penalty explained above, but there's no search penalty. * list_concat() and variants now copy the second list's data into storage belonging to the first list, so there is no longer any sharing of cells between the input lists. The second argument is now declared "const List *" to reflect that it isn't changed. This patch just does the minimum needed to get the new implementation in place and fix bugs exposed by the regression tests. As suggested by the foregoing, there's a fair amount of followup work remaining to do. Also, the ENABLE_LIST_COMPAT macros are finally removed in this commit. Code using those should have been gone a dozen years ago. Patch by me; thanks to David Rowley, Jesper Pedersen, and others for review. Discussion: https://postgr.es/m/11587.1550975080@sss.pgh.pa.us
* Fix more typos and inconsistencies in the treeMichael Paquier2019-06-17
| | | | | Author: Alexander Lakhin Discussion: https://postgr.es/m/0a5419ea-1452-a4e6-72ff-545b1a5a8076@gmail.com
* Make queries' locking of indexes more consistent.Tom Lane2019-04-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The assertions added by commit b04aeb0a0 exposed that there are some code paths wherein the executor will try to open an index without holding any lock on it. We do have some lock on the index's table, so it seems likely that there's no fatal problem with this (for instance, the index couldn't get dropped from under us). Still, it's bad practice and we should fix it. To do so, remove the optimizations in ExecInitIndexScan and friends that tried to avoid taking a lock on an index belonging to a target relation, and just take the lock always. In non-bug cases, this will result in no additional shared-memory access, since we'll find in the local lock table that we already have a lock of the desired type; hence, no significant performance degradation should occur. Also, adjust the planner and executor so that the type of lock taken on an index is always identical to the type of lock taken for its table, by relying on the recently added RangeTblEntry.rellockmode field. This avoids some corner cases where that might not have been true before (possibly resulting in extra locking overhead), and prevents future maintenance issues from having multiple bits of logic that all needed to be in sync. In addition, this change removes all core calls to ExecRelationIsTargetRelation, which avoids a possible O(N^2) startup penalty for queries with large numbers of target relations. (We'd probably remove that function altogether, were it not that we advertise it as something that FDWs might want to use.) Also adjust some places in selfuncs.c to not take any lock on indexes they are transiently opening, since we can assume that plancat.c did that already. In passing, change gin_clean_pending_list() to take RowExclusiveLock not AccessShareLock on its target index. Although it's not clear that that's actually a bug, it seemed very strange for a function that's explicitly going to modify the index to use only AccessShareLock. David Rowley, reviewed by Julien Rouhaud and Amit Langote, a bit of further tweaking by me Discussion: https://postgr.es/m/19465.1541636036@sss.pgh.pa.us