postgresql - postgresql mirror

	Commit message (Collapse)	Author	Age
*	Reduce "Var IS [NOT] NULL" quals during constant folding	Richard Guo	2025-07-22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In commit b262ad440, we introduced an optimization that reduces an IS [NOT] NULL qual on a NOT NULL column to constant true or constant false, provided we can prove that the input expression of the NullTest is not nullable by any outer joins or grouping sets. This deduction happens quite late in the planner, during the distribution of quals to rels in query_planner. However, this approach has some drawbacks: we can't perform any further folding with the constant, and it turns out to be prone to bugs. Ideally, this deduction should happen during constant folding. However, the per-relation information about which columns are defined as NOT NULL is not available at that point. This information is currently collected from catalogs when building RelOptInfos for base or "other" relations. This patch moves the collection of NOT NULL attribute information for relations before pull_up_sublinks, storing it in a hash table keyed by relation OID. It then uses this information to perform the NullTest deduction for Vars during constant folding. This also makes it possible to leverage this information to pull up NOT IN subqueries. Note that this patch does not get rid of restriction_is_always_true and restriction_is_always_false. Removing them would prevent us from reducing some IS [NOT] NULL quals that we were previously able to reduce, because (a) the self-join elimination may introduce new IS NOT NULL quals after constant folding, and (b) if some outer joins are converted to inner joins, previously irreducible NullTest quals may become reducible. Author: Richard Guo <guofenglinux@gmail.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CAMbWs4-bFJ1At4btk5wqbezdu8PLtQ3zv-aiaY3ry9Ymm=jgFQ@mail.gmail.com
*	Centralize collection of catalog info needed early in the planner	Richard Guo	2025-07-22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are several pieces of catalog information that need to be retrieved for a relation during the early stage of planning. These include relhassubclass, which is used to clear the inh flag if the relation has no children, as well as a column's attgenerated and default value, which are needed to expand virtual generated columns. More such information may be required in the future. Currently, these pieces of catalog data are collected in multiple places, resulting in repeated table_open/table_close calls for each relation in the rangetable. This patch centralizes the collection of all required early-stage catalog information into a single loop over the rangetable, allowing each relation to be opened and closed only once. Author: Richard Guo <guofenglinux@gmail.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CAMbWs4-bFJ1At4btk5wqbezdu8PLtQ3zv-aiaY3ry9Ymm=jgFQ@mail.gmail.com
*	Expand virtual generated columns before sublink pull-up	Richard Guo	2025-07-22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, we expand virtual generated columns after we have pulled up any SubLinks within the query's quals. This ensures that the virtual generated column references within SubLinks that should be transformed into joins are correctly expanded. This approach works well and has posed no issues. In an upcoming patch, we plan to centralize the collection of catalog information needed early in the planner. This will help avoid repeated table_open/table_close calls for relations in the rangetable. Since this information is required during sublink pull-up, we are moving the expansion of virtual generated columns to occur beforehand. To achieve this, if any EXISTS SubLinks can be pulled up, their rangetables are processed just before pulling them up. Author: Richard Guo <guofenglinux@gmail.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CAMbWs4-bFJ1At4btk5wqbezdu8PLtQ3zv-aiaY3ry9Ymm=jgFQ@mail.gmail.com
*	Fix failure for generated column with a not-null domain constraint.	Tom Lane	2025-04-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If a GENERATED column is declared to have a domain data type where the domain's constraints disallow null values, INSERT commands failed because we built a targetlist that included coercing a null constant to the domain's type. The failure occurred even when the generated value would have been perfectly OK. This is adjacent to the issues fixed in 0da39aa76, but we didn't notice for lack of testing a domain with such a constraint. We aren't going to use the result of the targetlist entry for the generated column --- ExecComputeStoredGenerated will overwrite it. So it's not really necessary that it have the exact datatype of the generated column. This patch fixes the problem by changing the targetlist entry to be a null Const of the domain's base type, which should be sufficiently legal. (We do have to tweak ExecCheckPlanOutput to accept the situation, though.) This has been broken since we implemented generated columns. However, this patch only applies easily as far back as v14, partly because I (tgl) only carried 0da39aa76 back that far, but mostly because v14 significantly refactored the handling of INSERT/UPDATE targetlists. Given the lack of field complaints and the short remaining support lifetime of v13, I judge the cost-benefit ratio not good for devising a version that would work in v13. Reported-by: jian he <jian.universality@gmail.com> Author: jian he <jian.universality@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CACJufxG59tip2+9h=rEv-ykOFjt0cbsPVchhi0RTij8bABBA0Q@mail.gmail.com Backpatch-through: 14
*	Convert 'x IN (VALUES ...)' to 'x = ANY ...' then appropriate	Alexander Korotkov	2025-04-04
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit implements the automatic conversion of 'x IN (VALUES ...)' into ScalarArrayOpExpr. That simplifies the query tree, eliminating the appearance of an unnecessary join. Since VALUES describes a relational table, and the value of such a list is a table row, the optimizer will likely face an underestimation problem due to the inability to estimate cardinality through MCV statistics. The cardinality evaluation mechanism can work with the array inclusion check operation. If the array is small enough (< 100 elements), it will perform a statistical evaluation element by element. We perform the transformation in the convert_ANY_sublink_to_join() if VALUES RTE is proper and the transformation is convertible. The conversion is only possible for operations on scalar values, not rows. Also, we currently support the transformation only when it ends up with a constant array. Otherwise, the evaluation of non-hashed SAOP might be slower than the corresponding Hash Join with VALUES. Discussion: https://postgr.es/m/0184212d-1248-4f1f-a42d-f5cb1c1976d2%40tantorlabs.com Author: Alena Rybakina <a.rybakina@postgrespro.ru> Author: Andrei Lepikhov <lepihov@gmail.com> Reviewed-by: Ivan Kush <ivan.kush@tantorlabs.com> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
*	Fix incorrect handling of subquery pullup	Richard Guo	2025-03-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When pulling up a subquery, if the subquery's target list items are used in grouping set columns, we need to wrap them in PlaceHolderVars. This ensures that expressions retain their separate identity so that they will match grouping set columns when appropriate. In 90947674f, we decided to wrap subquery outputs that are non-var expressions in PlaceHolderVars. This prevents const-simplification from merging them into the surrounding expressions after subquery pullup, which could otherwise lead to failing to match those subexpressions to grouping set columns, with the effect that they'd not go to null when expected. However, that left some loose ends. If the subquery's target list contains two or more identical Var expressions, we can still fail to match the Var expression to the expected grouping set expression. This is not related to const-simplification, but rather to how we match expressions to lower target items in setrefs.c. For sort/group expressions, we use ressortgroupref matching, which works well. For other expressions, we primarily rely on comparing the expressions to determine if they are the same. Therefore, we need a way to prevent setrefs.c from matching the expression to some other identical ones. To fix, wrap all subquery outputs in PlaceHolderVars if the parent query uses grouping sets, ensuring that they preserve their separate identity throughout the whole planning process. Reported-by: Dean Rasheed <dean.a.rasheed@gmail.com> Author: Richard Guo <guofenglinux@gmail.com> Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com> Discussion: https://postgr.es/m/CAMbWs4-meSahaanKskpBn0KKxdHAXC1_EJCVWHxEodqirrGJnw@mail.gmail.com
*	Remove code setting wrap_non_vars to true for UNION ALL subqueries	Richard Guo	2025-03-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In pull_up_simple_subquery and pull_up_constant_function, there is code that sets wrap_non_vars to true when dealing with an appendrel member. The goal is to wrap subquery outputs that are not simple Vars in PlaceHolderVars, ensuring that what we pull up doesn't get merged into a surrounding expression during later processing, which could cause it to fail to match the expression actually available from the appendrel. However, this is unnecessary. When pulling up an appendrel child subquery, the only part of the upper query that could reference the appendrel child yet is the translated_vars list of the associated AppendRelInfo that we just made for this child. Furthermore, we do not want to force use of PHVs in the AppendRelInfo, as there is no outer join between. In fact, perform_pullup_replace_vars always sets wrap_non_vars to false before performing pullup_replace_vars on the AppendRelInfo. This patch simply removes the code that sets wrap_non_vars to true for UNION ALL subqueries. Author: Richard Guo <guofenglinux@gmail.com> Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com> Discussion: https://postgr.es/m/CAMbWs4-VXDEi1v+hZYLxpOv0riJxHsCkCH1f46tLnhonEAyGCQ@mail.gmail.com
*	Build whole-row Vars the same way during parsing and planning.	Tom Lane	2025-03-12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	makeWholeRowVar() has different rules for constructing a whole-row Var depending on the kind of RTE it's representing. This turns out to be problematic because the rewriter and planner can convert view RTEs and set-returning-function RTEs into subquery RTEs; so a whole-row Var made during planning might look different from one made by the parser. In isolation this doesn't cause any problem, but if a query contains Vars made both ways for the same varno, there are cross-checks in the executor that will complain. This manifests for UPDATE, DELETE, and MERGE queries that use whole-row table references. To fix, we need makeWholeRowVar() to produce the same result from an inlined RTE as it would have for the original. For an inlined view, we can use RangeTblEntry.relid to detect that this had been a view RTE. For inlined SRFs, make a data structure definition change akin to commit 47bb9db75, and say that we won't clear RangeTblEntry.functions until the end of planning. That allows makeWholeRowVar() to repeat what it would have done with the unmodified RTE. Reported-by: Duncan Sands <duncan.sands@deepbluecap.com> Reported-by: Dean Rasheed <dean.a.rasheed@gmail.com> Diagnosed-by: Tender Wang <tndrwang@gmail.com> Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com> Discussion: https://postgr.es/m/3518c50a-ab18-482f-b916-a37263622501@deepbluecap.com Backpatch-through: 13
*	Eliminate code duplication in replace_rte_variables callbacks	Richard Guo	2025-02-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The callback functions ReplaceVarsFromTargetList_callback and pullup_replace_vars_callback are both used to replace Vars in an expression tree that reference a particular RTE with items from a targetlist, and they both need to expand whole-tuple references and deal with OLD/NEW RETURNING list Vars. As a result, currently there is significant code duplication between these two functions. This patch introduces a new function, ReplaceVarFromTargetList, to perform the replacement and calls it from both callback functions, thereby eliminating code duplication. Author: Dean Rasheed <dean.a.rasheed@gmail.com> Author: Richard Guo <guofenglinux@gmail.com> Reviewed-by: Jian He <jian.universality@gmail.com> Discussion: https://postgr.es/m/CAEZATCWhr=FM4X5kCPvVs-g2XEk+ceLsNtBK_zZMkqFn9vUjsw@mail.gmail.com
*	Expand virtual generated columns in the planner	Richard Guo	2025-02-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 83ea6c540 added support for virtual generated columns that are computed on read. All Var nodes in the query that reference virtual generated columns must be replaced with the corresponding generation expressions. Currently, this replacement occurs in the rewriter. However, this approach has several issues. If a Var referencing a virtual generated column has any varnullingrels, those varnullingrels need to be propagated into the generation expression. Failing to do so can lead to "wrong varnullingrels" errors and improper outer-join removal. Additionally, if such a Var comes from the nullable side of an outer join, we may need to wrap the generation expression in a PlaceHolderVar to ensure that it is evaluated at the right place and hence is forced to null when the outer join should do so. In certain cases, such as when the query uses grouping sets, we also need a PlaceHolderVar for anything that is not a simple Var to isolate subexpressions. Failure to do so can result in incorrect results. To fix these issues, this patch expands the virtual generated columns in the planner rather than in the rewriter, and leverages the pullup_replace_vars architecture to avoid code duplication. The generation expressions will be correctly marked with nullingrel bits and wrapped in PlaceHolderVars when needed by the pullup_replace_vars callback function. This requires handling the OLD/NEW RETURNING list Vars in pullup_replace_vars_callback, as it may now deal with Vars referencing the result relation instead of a subquery. The "wrong varnullingrels" error was reported by Alexander Lakhin. The incorrect result issue and the improper outer-join removal issue were reported by Richard Guo. Author: Richard Guo <guofenglinux@gmail.com> Author: Dean Rasheed <dean.a.rasheed@gmail.com> Reviewed-by: Jian He <jian.universality@gmail.com> Discussion: https://postgr.es/m/75eb1a6f-d59f-42e6-8a78-124ee808cda7@gmail.com
*	Implement Self-Join Elimination	Alexander Korotkov	2025-02-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The Self-Join Elimination (SJE) feature removes an inner join of a plain table to itself in the query tree if it is proven that the join can be replaced with a scan without impacting the query result. Self-join and inner relation get replaced with the outer in query, equivalence classes, and planner info structures. Also, the inner restrictlist moves to the outer one with the removal of duplicated clauses. Thus, this optimization reduces the length of the range table list (this especially makes sense for partitioned relations), reduces the number of restriction clauses and, in turn, selectivity estimations, and potentially improves total planner prediction for the query. This feature is dedicated to avoiding redundancy, which can appear after pull-up transformations or the creation of an EquivalenceClass-derived clause like the below. SELECT * FROM t1 WHERE x IN (SELECT t3.x FROM t1 t3); SELECT * FROM t1 WHERE EXISTS (SELECT t3.x FROM t1 t3 WHERE t3.x = t1.x); SELECT * FROM t1,t2, t1 t3 WHERE t1.x = t2.x AND t2.x = t3.x; In the future, we could also reduce redundancy caused by subquery pull-up after unnecessary outer join removal in cases like the one below. SELECT * FROM t1 WHERE x IN (SELECT t3.x FROM t1 t3 LEFT JOIN t2 ON t2.x = t1.x); Also, it can drastically help to join partitioned tables, removing entries even before their expansion. The SJE proof is based on innerrel_is_unique() machinery. We can remove a self-join when for each outer row: 1. At most, one inner row matches the join clause; 2. Each matched inner row must be (physically) the same as the outer one; 3. Inner and outer rows have the same row mark. In this patch, we use the next approach to identify a self-join: 1. Collect all merge-joinable join quals which look like a.x = b.x; 2. Add to the list above the baseretrictinfo of the inner table; 3. Check innerrel_is_unique() for the qual list. If it returns false, skip this pair of joining tables; 4. Check uniqueness, proved by the baserestrictinfo clauses. To prove the possibility of self-join elimination, the inner and outer clauses must match exactly. The relation replacement procedure is not trivial and is partly combined with the one used to remove useless left joins. Tests covering this feature were added to join.sql. Some of the existing regression tests changed due to self-join removal logic. Discussion: https://postgr.es/m/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru Author: Andrey Lepikhov <a.lepikhov@postgrespro.ru> Author: Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> Co-authored-by: Alexander Korotkov <aekorotkov@gmail.com> Co-authored-by: Alena Rybakina <lena.ribackina@yandex.ru> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Simon Riggs <simon@2ndquadrant.com> Reviewed-by: Jonathan S. Katz <jkatz@postgresql.org> Reviewed-by: David Rowley <david.rowley@2ndquadrant.com> Reviewed-by: Thomas Munro <thomas.munro@enterprisedb.com> Reviewed-by: Konstantin Knizhnik <k.knizhnik@postgrespro.ru> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Hywel Carver <hywel@skillerwhale.com> Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at> Reviewed-by: Ronan Dunklau <ronan.dunklau@aiven.io> Reviewed-by: vignesh C <vignesh21@gmail.com> Reviewed-by: Zhihong Yu <zyu@yugabyte.com> Reviewed-by: Greg Stark <stark@mit.edu> Reviewed-by: Jaime Casanova <jcasanov@systemguards.com.ec> Reviewed-by: Michał Kłeczek <michal@kleczek.org> Reviewed-by: Alena Rybakina <lena.ribackina@yandex.ru> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
*	Handle default NULL insertion a little better.	Tom Lane	2025-01-29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If a column is omitted in an INSERT, and there's no column default, the code in preptlist.c generates a NULL Const to be inserted. Furthermore, if the column is of a domain type, we wrap the Const in CoerceToDomain, so as to throw a run-time error if the domain has a NOT NULL constraint. That's fine as far as it goes, but there are two problems: 1. We're being sloppy about the type/typmod that the Const is labeled with. It really should have the domain's base type/typmod, since it's the input to CoerceToDomain not the output. This can result in coerce_to_domain inserting a useless length-coercion function (useless because it's being applied to a null). The coercion would typically get const-folded away later, but it'd be better not to create it in the first place. 2. We're not applying expression preprocessing (specifically, eval_const_expressions) to the resulting expression tree. The planner's primary expression-preprocessing pass already happened, so that means the length coercion step and CoerceToDomain node miss preprocessing altogether. This is at the least inefficient, since it means the length coercion and CoerceToDomain will actually be executed for each inserted row, though they could be const-folded away in most cases. Worse, it seems possible that missing preprocessing for the length coercion could result in an invalid plan (for example, due to failing to perform default-function-argument insertion). I'm not aware of any live bug of that sort with core datatypes, and it might be unreachable for extension types as well because of restrictions of CREATE CAST, but I'm not entirely convinced that it's unreachable. Hence, it seems worth back-patching the fix (although I only went back to v14, as the patch doesn't apply cleanly at all in v13). There are several places in the rewriter that are building null domain constants the same way as preptlist.c. While those are before the planner and hence don't have any reachable bug, they're still applying a length coercion that will be const-folded away later, uselessly wasting cycles. Hence, make a utility routine that all of these places can call to do it right. Making this code more careful about the typmod assigned to the generated NULL constant has visible but cosmetic effects on some of the plans shown in contrib/postgres_fdw's regression tests. Discussion: https://postgr.es/m/1865579.1738113656@sss.pgh.pa.us Backpatch-through: 14
*	Add OLD/NEW support to RETURNING in DML queries.	Dean Rasheed	2025-01-16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This allows the RETURNING list of INSERT/UPDATE/DELETE/MERGE queries to explicitly return old and new values by using the special aliases "old" and "new", which are automatically added to the query (if not already defined) while parsing its RETURNING list, allowing things like: RETURNING old.colname, new.colname, ... RETURNING old., new. Additionally, a new syntax is supported, allowing the names "old" and "new" to be changed to user-supplied alias names, e.g.: RETURNING WITH (OLD AS o, NEW AS n) o.colname, n.colname, ... This is useful when the names "old" and "new" are already defined, such as inside trigger functions, allowing backwards compatibility to be maintained -- the interpretation of any existing queries that happen to already refer to relations called "old" or "new", or use those as aliases for other relations, is not changed. For an INSERT, old values will generally be NULL, and for a DELETE, new values will generally be NULL, but that may change for an INSERT with an ON CONFLICT ... DO UPDATE clause, or if a query rewrite rule changes the command type. Therefore, we put no restrictions on the use of old and new in any DML queries. Dean Rasheed, reviewed by Jian He and Jeff Davis. Discussion: https://postgr.es/m/CAEZATCWx0J0-v=Qjc6gXzR=KtsdvAE7Ow=D=mu50AgOe+pvisQ@mail.gmail.com
*	Update copyright for 2025	Bruce Momjian	2025-01-01
\| \| \| \|	Backpatch-through: 13
*	Improve planner's handling of SetOp plans.	Tom Lane	2024-12-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Remove the code for inserting flag columns in the inputs of a SetOp. That was the only reason why there would be resjunk columns in a set-operations plan tree, so we can get rid of some code that supported that, too. Get rid of choose_hashed_setop() in favor of building Paths for the hashed and sorted alternatives, and letting them fight it out within add_path(). Remove set_operation_ordered_results_useful(), which was giving wrong answers due to examining the wrong ancestor node: we need to examine the immediate SetOperationStmt parent not the topmost node. Instead make each caller of recurse_set_operations() pass down the relevant parent node. (This thinko seems to have led only to wasted planning cycles and possibly-inferior plans, not wrong query answers. Perhaps we should back-patch it, but I'm not doing so right now.) Teach generate_nonunion_paths() to consider pre-sorted inputs for sorted SetOps, rather than always generating a Sort node. Patch by me; thanks to Richard Guo and David Rowley for review. Discussion: https://postgr.es/m/1850138.1731549611@sss.pgh.pa.us
*	Convert SetOp to read its inputs as outerPlan and innerPlan.	Tom Lane	2024-12-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The original design for set operations involved appending the two input relations into one and adding a flag column that allows distinguishing which side each row came from. Then the SetOp node pries them apart again based on the flag. This is bizarre. The only apparent reason to do it is that when sorting, we'd only need one Sort node not two. But since sorting is at least O(N log N), sorting all the data is actually worse than sorting each side separately --- plus, we have no chance of taking advantage of presorted input. On top of that, adding the flag column frequently requires an additional projection step that adds cycles, and then the Append node isn't free either. Let's get rid of all of that and make the SetOp node have two separate children, using the existing outerPlan/innerPlan infrastructure. This initial patch re-implements nodeSetop.c and does a bare minimum of work on the planner side to generate correctly-shaped plans. In particular, I've tried not to change the cost estimates here, so that the visible changes in the regression test results will only involve removal of useless projection steps and not any changes in whether to use sorted vs hashed mode. For SORTED mode, we combine successive identical tuples from each input into groups, and then merge-join the groups. The tuple comparisons now use SortSupport instead of simple equality, but the group-formation part should involve roughly the same number of tuple comparisons as before. The cross-comparisons between left and right groups probably add to that, but I'm not sure to quantify how many more comparisons we might need. For HASHED mode, nodeSetop's logic is almost the same as before, just refactored into two separate loops instead of one loop that has an assumption that it will see all the left-hand inputs first. In both modes, I added early-exit logic to not bother reading the right-hand relation if the left-hand input is empty, since neither INTERSECT nor EXCEPT modes can produce any output if the left input is empty. This could have been done before in the hashed mode, but not in sorted mode. Sorted mode can also stop as soon as it exhausts the left input; any remaining right-hand tuples cannot have matches. Also, this patch adds some infrastructure for detecting whether child plan nodes all output the same type of tuple table slot. If they do, the hash table logic can use slightly more efficient code based on assuming that that's the input slot type it will see. We'll make use of that infrastructure in other plan node types later. Patch by me; thanks to Richard Guo and David Rowley for review. Discussion: https://postgr.es/m/1850138.1731549611@sss.pgh.pa.us
*	Avoid unnecessary wrapping for more complex expressions	Richard Guo	2024-12-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When pulling up a subquery that is under an outer join, if the subquery's target list contains a strict expression that uses a subquery variable, it's okay to pull up the expression without wrapping it in a PlaceHolderVar: if the subquery variable is forced to NULL by the outer join, the expression result will come out as NULL too. If the strict expression does not contain any subquery variables, the current code always wraps it in a PlaceHolderVar. While this is not incorrect, the analysis could be tighter: if the strict expression contains any variables of rels that are under the same lowest nulling outer join as the subquery, we can also avoid wrapping it. This is safe because if the subquery variable is forced to NULL by the outer join, the variables of rels that are under the same lowest nulling outer join will also be forced to NULL, resulting in the expression evaluating to NULL as well. Therefore, it's not necessary to force the expression to be evaluated below the outer join. It could be beneficial to get rid of such PHVs because they could imply lateral dependencies, which force us to resort to nestloop joins. This patch checks if the lateral references in the strict expression contain any variables of rels under the same lowest nulling outer join as the subquery, and avoids wrapping the expression in that case. This is fundamentally a generalization of the optimizations for bare Vars and PHVs introduced in commit f64ec81a8. No backpatch as this could result in plan changes. Author: Richard Guo Discussion: https://postgr.es/m/CAMbWs4_ENtfRdLaM_bXAxiKRYO7DmwDBDG4_2=VTDi0mJP-jAw@mail.gmail.com
*	Avoid unnecessary wrapping for Vars and PHVs	Richard Guo	2024-12-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When pulling up a lateral subquery that is under an outer join, the current code always wraps a Var or PHV in the subquery's targetlist into a new PlaceHolderVar if it is a lateral reference to something outside the subquery. This is necessary when the Var/PHV references the non-nullable side of the outer join from the nullable side: we need to ensure that it is evaluated at the right place and hence is forced to null when the outer join should do so. However, if the referenced rel is under the same lowest nulling outer join, we can actually omit the wrapping. That's safe because if the subquery variable is forced to NULL by the outer join, the lateral reference variable will come out as NULL too. It could be beneficial to get rid of such PHVs because they imply lateral dependencies, which force us to resort to nestloop joins. This patch leverages the newly introduced nullingrel_info to check if the nullingrels of the subquery RTE are a subset of those of the laterally referenced rel, in order to determine if the referenced rel is under the same lowest nulling outer join. No backpatch as this could result in plan changes. Author: Richard Guo Reviewed-by: James Coleman, Dmitry Dolgov, Andrei Lepikhov Discussion: https://postgr.es/m/CAMbWs48uk6C7Z9m_FNT8_21CMCk68hrgAsz=z6zpP1PNZMkeoQ@mail.gmail.com
*	Avoid mislabeling of lateral references, redux.	Tom Lane	2024-11-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As I'd feared, commit 5c9d8636d was still a few bricks shy of a load. We can't just leave pulled-up lateral-reference Vars with no new nullingrels: we have to carefully compute what subset of the to-be-replaced Var's nullingrels apply to them, else we still get "wrong varnullingrels" errors. This is a bit tedious, but it looks like we can use the nullingrel data this patch computes for other purposes, enabling better optimization. We don't want to inject unnecessary plan changes into stable branches though, so leave that idea for a later HEAD-only patch. Patch by me, but thanks to Richard Guo for devising a test case that broke 5c9d8636d, and for preliminary investigation about how to fix it. As before, back-patch to v16. Discussion: https://postgr.es/m/E1tGn4j-0003zi-MP@gemulon.postgresql.org
*	Fix typo in header comment for set_operation_ordered_results_useful	David Rowley	2024-11-29
\| \| \| \| \|	Reported-by: Richard Guo Discussion: https://postgr.es/m/CAMbWs492vMy3XNjDZRtqtHfFTK6HVeDwhrEQH7eXGgF_h5Jnzw@mail.gmail.com
*	Avoid mislabeling of lateral references when pulling up a subquery.	Tom Lane	2024-11-28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If we are pulling up a subquery that's under an outer join, and the subquery's target list contains a strict expression that uses both a subquery variable and a lateral-reference variable, it's okay to pull up the expression without wrapping it in a PlaceHolderVar. That's safe because if the subquery variable is forced to NULL by the outer join, the expression result will come out as NULL too, so we don't have to force that outcome by evaluating the expression below the outer join. It'd be correct to wrap in a PHV, but that can lead to very significantly worse plans, since we'd then have to use a nestloop plan to pass down the lateral reference to where the expression will be evaluated. However, when we do that, we should not mark the lateral reference variable as being nulled by the outer join, because it isn't after we pull up the expression in this way. So the marking logic added by cb8e50a4a was incorrect in this detail, leading to "wrong varnullingrels" errors from the consistency-checking logic in setrefs.c. It seems to be sufficient to just not mark lateral references at all in this case. (I have a nagging feeling that more complexity may be needed in cases where there are several levels of outer join, but some attempts to break it with that didn't succeed.) Per report from Bertrand Mamasam. Back-patch to v16, as the previous patch was. Discussion: https://postgr.es/m/CACZ67_UA_EVrqiFXJu9XK50baEpH=ofEPJswa2kFxg6xuSw-ww@mail.gmail.com
*	Remove useless casts to (void *)	Peter Eisentraut	2024-11-28
\| \| \| \| \| \| \| \|	Many of them just seem to have been copied around for no real reason. Their presence causes (small) risks of hiding actual type mismatches or silently discarding qualifiers Discussion: https://www.postgresql.org/message-id/flat/461ea37c-8b58-43b4-9736-52884e862820@eisentraut.org
*	Compare collations before merging UNION operations.	Tom Lane	2024-11-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In the dim past we figured it was okay to ignore collations when combining UNION set-operation nodes into a single N-way UNION operation. I believe that was fine at the time, but it stopped being fine when we added nondeterministic collations: the semantics of distinct-ness are affected by those. v17 made it even less fine by allowing per-child sorting operations to be merged via MergeAppend, although I think we accidentally avoided any live bug from that. Add a check that collations match before deciding that two UNION nodes are equivalent. I also failed to resist the temptation to comment plan_union_children() a little better. Back-patch to all supported branches (v13 now), since they all have nondeterministic collations. Discussion: https://postgr.es/m/3605568.1731970579@sss.pgh.pa.us
*	Remove duplicate words in comments	Daniel Gustafsson	2024-10-31
\| \| \| \| \| \| \| \|	A few comments contained duplicate "the" in sentences, fix by removing one occurrence. Author: Vignesh C <vignesh21@gmail.com> Discussion: https://postgr.es/m/CALDaNm2aEEiPwGJmPdzBxROVvs8n75yCjKz4K1f1B2TdWpzxTA@mail.gmail.com
*	Fix wrong varnullingrels error for MERGE WHEN NOT MATCHED BY SOURCE.	Dean Rasheed	2024-10-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If a MERGE command contains WHEN NOT MATCHED BY SOURCE actions, the source relation appears on the outer side of the join. Thus, any Vars referring to the source in the merge join condition, actions, and RETURNING list should be marked as nullable by the join, since they are used in the ModifyTable node above the join. Note that this only applies to the copy of join condition used in the executor to distinguish MATCHED from NOT MATCHED BY SOURCE cases. Vars in the original join condition, inside the join node itself, should not be marked. Failure to correctly mark these Vars led to a "wrong varnullingrels" error in the final stage of query planning, in some circumstances. We happened to get away without this in all previous tests, since they all involved a ModifyTable node directly on top of the join node, so that the top plan targetlist coincided with the output of the join, and the varnullingrels check was more lax. However, if another plan node, such as a one-time filter Result node, gets inserted between the ModifyTable node and the join node, then a stricter check is applied, which fails. Per bug #18634 from Alexander Lakhin. Thanks to Tom Lane and Richard Guo for review and analysis. Back-patch to v17, where WHEN NOT MATCHED BY SOURCE support was added to MERGE. Discussion: https://postgr.es/m/18634-db5299c937877f2b%40postgresql.org
*	Fix incorrect non-strict join recheck in MERGE WHEN NOT MATCHED BY SOURCE.	Dean Rasheed	2024-10-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If a MERGE command contains WHEN NOT MATCHED BY SOURCE actions, the merge join condition is used by the executor to distinguish MATCHED from NOT MATCHED BY SOURCE cases. However, this qual is executed using the output from the join subplan node, which nulls the output from the source relation in the not matched case, and so the result may be incorrect if the join condition is "non-strict" -- for example, something like "src.col IS NOT DISTINCT FROM tgt.col". Fix this by enhancing the join recheck condition with an additional "src IS NOT NULL" check, so that it does the right thing when evaluated using the output from the join subplan. Noted by Tom Lane while investigating bug #18634 from Alexander Lakhin. Back-patch to v17, where WHEN NOT MATCHED BY SOURCE support was added to MERGE. Discussion: https://postgr.es/m/18634-db5299c937877f2b%40postgresql.org
*	Introduce an RTE for the grouping step	Richard Guo	2024-09-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If there are subqueries in the grouping expressions, each of these subqueries in the targetlist and HAVING clause is expanded into distinct SubPlan nodes. As a result, only one of these SubPlan nodes would be converted to reference to the grouping key column output by the Agg node; others would have to get evaluated afresh. This is not efficient, and with grouping sets this can cause wrong results issues in cases where they should go to NULL because they are from the wrong grouping set. Furthermore, during re-evaluation, these SubPlan nodes might use nulled column values from grouping sets, which is not correct. This issue is not limited to subqueries. For other types of expressions that are part of grouping items, if they are transformed into another form during preprocessing, they may fail to match lower target items. This can also lead to wrong results with grouping sets. To fix this issue, we introduce a new kind of RTE representing the output of the grouping step, with columns that are the Vars or expressions being grouped on. In the parser, we replace the grouping expressions in the targetlist and HAVING clause with Vars referencing this new RTE, so that the output of the parser directly expresses the semantic requirement that the grouping expressions be gotten from the grouping output rather than computed some other way. In the planner, we first preprocess all the columns of this new RTE and then replace any Vars in the targetlist and HAVING clause that reference this new RTE with the underlying grouping expressions, so that we will have only one instance of a SubPlan node for each subquery contained in the grouping expressions. Bump catversion because this changes the querytree produced by the parser. Thanks to Tom Lane for the idea to invent a new kind of RTE. Per reports from Geoff Winkless, Tobias Wendorff, Richard Guo from various threads. Author: Richard Guo Reviewed-by: Ashutosh Bapat, Sutou Kouhei Discussion: https://postgr.es/m/CAMbWs4_dp7e7oTwaiZeBX8+P1rXw4ThkZxh1QG81rhu9Z47VsQ@mail.gmail.com
*	Avoid inserting PlaceHolderVars in cases where pre-v16 PG did not.	Tom Lane	2024-08-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 2489d76c4 removed some logic from pullup_replace_vars() that avoided wrapping a PlaceHolderVar around a pulled-up subquery output expression if the expression could be proven to go to NULL anyway (because it contained Vars or PHVs of the pulled-up relation and did not contain non-strict constructs). But removing that logic turns out to cause performance regressions in some cases, because the extra PHV blocks subexpression folding, and will do so even if outer-join reduction later turns it into a no-op with no phnullingrels bits. This can for example prevent an expression from being matched to an index. The reason for always adding a PHV was to ensure we had someplace to put the varnullingrels marker bits of the Var being replaced. However, it turns out we can optimize in exactly the same cases that the previous code did, because we can instead attach the needed varnullingrels bits to the contained Var(s)/PHV(s). This is not a complete solution --- it would be even better if we could remove PHVs after reducing them to no-ops. It doesn't look practical to back-patch such an improvement, but this change seems safe and at least gets rid of the performance-regression cases. Per complaint from Nikhil Raj. Back-patch to v16 where the problem appeared. Discussion: https://postgr.es/m/CAG1ps1xvnTZceKK24OUfMKLPvDP2vjT-d+F2AOCWbw_v3KeEgg@mail.gmail.com
*	Treat number of disabled nodes in a path as a separate cost metric.	Robert Haas	2024-08-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously, when a path type was disabled by e.g. enable_seqscan=false, we either avoided generating that path type in the first place, or more commonly, we added a large constant, called disable_cost, to the estimated startup cost of that path. This latter approach can distort planning. For instance, an extremely expensive non-disabled path could seem to be worse than a disabled path, especially if the full cost of that path node need not be paid (e.g. due to a Limit). Or, as in the regression test whose expected output changes with this commit, the addition of disable_cost can make two paths that would normally be distinguishible in cost seem to have fuzzily the same cost. To fix that, we now count the number of disabled path nodes and consider that a high-order component of both the startup cost and the total cost. Hence, the path list is now sorted by disabled_nodes and then by total_cost, instead of just by the latter, and likewise for the partial path list. It is important that this number is a count and not simply a Boolean; else, as soon as we're unable to respect disabled path types in all portions of the path, we stop trying to avoid them where we can. Because the path list is now sorted by the number of disabled nodes, the join prechecks must compute the count of disabled nodes during the initial cost phase instead of postponing it to final cost time. Counts of disabled nodes do not cross subquery levels; at present, there is no reason for them to do so, since the we do not postpone path selection across subquery boundaries (see make_subplan). Reviewed by Andres Freund, Heikki Linnakangas, and David Rowley. Discussion: http://postgr.es/m/CA+TgmoZ_+MS+o6NeGK2xyBv-xM+w1AfFVuHE4f_aq6ekHv7YSQ@mail.gmail.com
*	Support "Right Semi Join" plan shapes	Richard Guo	2024-07-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Hash joins can support semijoin with the LHS input on the right, using the existing logic for inner join, combined with the assurance that only the first match for each inner tuple is considered, which can be achieved by leveraging the HEAP_TUPLE_HAS_MATCH flag. This can be very useful in some cases since we may now have the option to hash the smaller table instead of the larger. Merge join could likely support "Right Semi Join" too. However, the benefit of swapping inputs tends to be small here, so we do not address that in this patch. Note that this patch also modifies a test query in join.sql to ensure it continues testing as intended. With this patch the original query would result in a right-semi-join rather than semi-join, compromising its original purpose of testing the fix for neqjoinsel's behavior for semi-joins. Author: Richard Guo Reviewed-by: wenhui qiu, Alena Rybakina, Japin Li Discussion: https://postgr.es/m/CAMbWs4_X1mN=ic+SxcyymUqFx9bB8pqSLTGJ-F=MHy4PW3eRXw@mail.gmail.com
*	Fix generate_union_paths for non-sortable types.REL_17_BETA1	Robert Haas	2024-05-21
\| \| \| \| \| \| \| \| \| \| \| \|	The previous logic would fail to set groupList when grouping_is_sortable() returned false, which could cause queries to return wrong answers when some columns were not sortable. David Rowley, reviewed by Heikki Linnakangas and Álvaro Herrera. Reported by Hubert "depesz" Lubaczewski. Discussion: http://postgr.es/m/Zktzf926vslR35Fv@depesz.com Discussion: http://postgr.es/m/CAApHDvra=c8_zZT0J-TftByWN2Y+OJfnjNJFg4Dfdi2s+QzmqA@mail.gmail.com
*	Re-allow planner to use Merge Append to efficiently implement UNION.	Robert Haas	2024-05-21
\| \| \| \| \| \| \| \| \| \| \|	This reverts commit 7204f35919b7e021e8d1bc9f2d76fd6bfcdd2070, thus restoring 66c0185a3 (Allow planner to use Merge Append to efficiently implement UNION) as well as the follow-on commits d5d2205c8, 3b1a7eb28, 7487044d6. Per further discussion on pgsql-release, we wish to ship beta1 with this feature, and patch the bug that was found just before wrap, rather than shipping beta1 with the feature reverted.
*	Revert commit 66c0185a3 and follow-on patches.	Tom Lane	2024-05-20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts 66c0185a3 (Allow planner to use Merge Append to efficiently implement UNION) as well as the follow-on commits d5d2205c8, 3b1a7eb28, 7487044d6. In addition to those, 07746a8ef had to be removed then re-applied in a different place, because 66c0185a3 moved the relevant code. The reason for this last-minute thrashing is that depesz found a case in which the patched code creates a completely wrong plan that silently gives incorrect query results. It's unclear what the cause is or how many cases are affected, but with beta1 wrap staring us in the face, there's no time for closer investigation. After we figure that out, we can decide whether to un-revert this for beta2 or hold it for v18. Discussion: https://postgr.es/m/Zktzf926vslR35Fv@depesz.com (also some private discussion among pgsql-release)
*	Finish incomplete revert of ec63622c0.	Tom Lane	2024-05-06
\| \| \| \| \| \| \| \| \| \|	The code change this made might well be fine to keep, but the comment justifying it by reference to self-join removal isn't. Let's just go back to the status quo ante, pending a more thorough review/redesign of SJE. (I found this by grepping to see if any references to self-join removal remained in the tree.)
*	Fix query pullup issue with WindowClause runCondition	David Rowley	2024-05-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	94985c210 added code to detect when WindowFuncs were monotonic and allowed additional quals to be "pushed down" into the subquery to be used as WindowClause runConditions in order to short-circuit execution in nodeWindowAgg.c. The Node representation of runConditions wasn't well selected and because we do qual pushdown before planning the subquery, the planning of the subquery could perform subquery pull-up of nested subqueries. For WindowFuncs with args, the arguments could be changed after pushing the qual down to the subquery. This was made more difficult by the fact that the code duplicated the WindowFunc inside an OpExpr to include in the WindowClauses runCondition field. This could result in duplication of subqueries and a pull-up of such a subquery could result in another initplan parameter being issued for the 2nd version of the subplan. This could result in errors such as: ERROR: WindowFunc not found in subplan target lists To fix this, we change the node representation of these run conditions and instead of storing an OpExpr containing the WindowFunc in a list inside WindowClause, we now store a new node type named WindowFuncRunCondition within a new field in the WindowFunc. These get transformed into OpExprs later in planning once subquery pull-up has been performed. This problem did exist in v15 and v16, but that was fixed by 9d36b883b and e5d20bbd. Cat version bump due to new node type and modifying WindowFunc struct. Bug: #18305 Reported-by: Zuming Jiang Discussion: https://postgr.es/m/18305-33c49b4c830b37b3%40postgresql.org
*	Use macro NUM_MERGE_MATCH_KINDS instead of '3' in MERGE code.	Dean Rasheed	2024-04-19
\| \| \| \| \| \| \| \|	Code quality improvement for 0294df2f1f84. Aleksander Alekseev, reviewed by Richard Guo. Discussion: https://postgr.es/m/CAJ7c6TMsiaV5urU_Pq6zJ2tXPDwk69-NKVh4AMN5XrRiM7N%2BGA%40mail.gmail.com
*	Fix typos and duplicate words	Daniel Gustafsson	2024-04-18
\| \| \| \| \| \| \| \| \| \| \| \|	This fixes various typos, duplicated words, and tiny bits of whitespace mainly in code comments but also in docs. Author: Daniel Gustafsson <daniel@yesql.se> Author: Heikki Linnakangas <hlinnaka@iki.fi> Author: Alexander Lakhin <exclusion@gmail.com> Author: David Rowley <dgrowleyml@gmail.com> Author: Nazir Bilal Yavuz <byavuz81@gmail.com> Discussion: https://postgr.es/m/3F577953-A29E-4722-98AD-2DA9EFF2CBB8@yesql.se
*	Fix type-checking of RECORD-returning functions in FROM, redux.	Tom Lane	2024-04-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 2ed8f9a01 intended to institute a policy that if a RangeTblFunction has a coldeflist, then the function return type is certainly RECORD, and we should use the coldeflist as the source of truth about what the columns of the record type are. When the original function has been folded to a constant, inspection of the constant might give a different answer. This situation will lead to a tuple-type-mismatch error at execution, but up until that point we need to consistently believe the coldeflist, or we'll have problems from different bits of code reaching different conclusions. expandRTE didn't get that memo though, and would try to produce a tupdesc based on the constant in this situation, leading to an assertion failure. (Desultory testing suggests that non-assert builds often manage to give the expected error, although I also saw a "cache lookup failed for type 0" error, and it seems at least possible that a crash could happen.) Some other callers of get_expr_result_type and get_expr_result_tupdesc were also being incautious about this. While none of them seem to have actual bugs, they're working harder than necessary in this case, besides which it seems safest to have an explicit policy of not using those functions on an RTE with a coldeflist. Adjust the code accordingly, and add commentary to funcapi.c about this policy. Also fix an obsolete comment that claimed "get_expr_result_type() doesn't know how to extract type info from a RECORD constant". That hasn't been true since commit d57534740. Per bug #18422 from Alexander Lakhin. As with the previous commit, back-patch to all supported branches. Discussion: https://postgr.es/m/18422-89ca86c8eac5246d@postgresql.org
*	revert: Transform OR clauses to ANY expression	Alexander Korotkov	2024-04-10
\| \| \| \| \| \| \|	This commit reverts 72bd38cc99 due to implementation and design issues. Reported-by: Tom Lane Discussion: https://postgr.es/m/3604469.1712628736%40sss.pgh.pa.us
*	Fix usage of same ListCell transform_or_to_any()'s in nested loops	Alexander Korotkov	2024-04-08
\| \| \| \| \|	Discussion: https://postgr.es/m/CAAKRu_b4SXNW4GAM0bv3e6wcL5ODSXg1ZdRCn6uyLLjSPbveBg%40mail.gmail.com Author: Melanie Plageman
*	Transform OR clauses to ANY expression	Alexander Korotkov	2024-04-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Replace (expr op C1) OR (expr op C2) ... with expr op ANY(ARRAY[C1, C2, ...]) on the preliminary stage of optimization when we are still working with the expression tree. Here Cn is a n-th constant expression, 'expr' is non-constant expression, 'op' is an operator which returns boolean result and has a commuter (for the case of reverse order of constant and non-constant parts of the expression, like 'Cn op expr'). Sometimes it can lead to not optimal plan. This is why there is a or_to_any_transform_limit GUC. It specifies a threshold value of length of arguments in an OR expression that triggers the OR-to-ANY transformation. Generally, more groupable OR arguments mean that transformation will be more likely to win than to lose. Discussion: https://postgr.es/m/567ED6CA.2040504%40sigaev.ru Author: Alena Rybakina <lena.ribackina@yandex.ru> Author: Andrey Lepikhov <a.lepikhov@postgrespro.ru> Reviewed-by: Peter Geoghegan <pg@bowt.ie> Reviewed-by: Ranier Vilela <ranier.vf@gmail.com> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Jian He <jian.universality@gmail.com>
*	Fix assert failure when planning setop subqueries with CTEs	David Rowley	2024-04-02
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	66c0185a3 adjusted the UNION planner to request that union child queries produce Paths correctly ordered to implement the UNION by way of MergeAppend followed by Unique. The code there made a bad assumption that if the root->parent_root->parse had setOperations set that the query must be the child subquery of a set operation. That's not true when it comes to planning a non-inlined CTE which is parented by a set operation. This causes issues as the CTE's targetlist has no requirement to match up to the SetOperationStmt's groupClauses Fix this by adding a new parameter to both subquery_planner() and grouping_planner() to explicitly pass the SetOperationStmt only when planning set operation child subqueries. Thank you to Tom Lane for helping to rationalize the decision on the best function signature for subquery_planner(). Reported-by: Alexander Lakhin Discussion: https://postgr.es/m/242fc7c6-a8aa-2daf-ac4c-0a231e2619c1@gmail.com
*	Add support for MERGE ... WHEN NOT MATCHED BY SOURCE.	Dean Rasheed	2024-03-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This allows MERGE commands to include WHEN NOT MATCHED BY SOURCE actions, which operate on rows that exist in the target relation, but not in the data source. These actions can execute UPDATE, DELETE, or DO NOTHING sub-commands. This is in contrast to already-supported WHEN NOT MATCHED actions, which operate on rows that exist in the data source, but not in the target relation. To make this distinction clearer, such actions may now be written as WHEN NOT MATCHED BY TARGET. Writing WHEN NOT MATCHED without specifying BY SOURCE or BY TARGET is equivalent to writing WHEN NOT MATCHED BY TARGET. Dean Rasheed, reviewed by Alvaro Herrera, Ted Yu and Vik Fearing. Discussion: https://postgr.es/m/CAEZATCWqnKGc57Y_JanUBHQXNKcXd7r=0R4NEZUVwP+syRkWbA@mail.gmail.com
*	Allow planner to use Merge Append to efficiently implement UNION	David Rowley	2024-03-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Until now, UNION queries have often been suboptimal as the planner has only ever considered using an Append node and making the results unique by either using a Hash Aggregate, or by Sorting the entire Append result and running it through the Unique operator. Both of these methods always require reading all rows from the union subqueries. Here we adjust the union planner so that it can request that each subquery produce results in target list order so that these can be Merge Appended together and made unique with a Unique node. This can improve performance significantly as the union child can make use of the likes of btree indexes and/or Merge Joins to provide the top-level UNION with presorted input. This is especially good if the top-level UNION contains a LIMIT node that limits the output rows to a small subset of the unioned rows as cheap startup plans can be used. Author: David Rowley Reviewed-by: Richard Guo, Andy Fan Discussion: https://postgr.es/m/CAApHDvpb_63XQodmxKUF8vb9M7CxyUyT4sWvEgqeQU-GB7QFoQ@mail.gmail.com
*	Remove unused #include's from backend .c files	Peter Eisentraut	2024-03-04
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	as determined by include-what-you-use (IWYU) While IWYU also suggests to add a bunch of #include's (which is its main purpose), this patch does not do that. In some cases, a more specific #include replaces another less specific one. Some manual adjustments of the automatic result: - IWYU currently doesn't know about includes that provide global variable declarations (like -Wmissing-variable-declarations), so those includes are being kept manually. - All includes for port(ability) headers are being kept for now, to play it safe. - No changes of catalog/pg_foo.h to catalog/pg_foo_d.h, to keep the patch from exploding in size. Note that this patch touches just *.c files, so nothing declared in header files changes in hidden ways. As a small example, in src/backend/access/transam/rmgr.c, some IWYU pragma annotations are added to handle a special case there. Discussion: https://www.postgresql.org/message-id/flat/af837490-6b2f-46df-ba05-37ea6a6653fc%40eisentraut.org
*	Support MERGE into updatable views.	Dean Rasheed	2024-02-29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This allows the target relation of MERGE to be an auto-updatable or trigger-updatable view, and includes support for WITH CHECK OPTION, security barrier views, and security invoker views. A trigger-updatable view must have INSTEAD OF triggers for every type of action (INSERT, UPDATE, and DELETE) mentioned in the MERGE command. An auto-updatable view must not have any INSTEAD OF triggers. Mixing auto-update and trigger-update actions (i.e., having a partial set of INSTEAD OF triggers) is not supported. Rule-updatable views are also not supported, since there is no rewriter support for non-SELECT rules with MERGE operations. Dean Rasheed, reviewed by Jian He and Alvaro Herrera. Discussion: https://postgr.es/m/CAEZATCVcB1g0nmxuEc-A+gGB0HnfcGQNGYH7gS=7rq0u0zOBXA@mail.gmail.com
*	Allow subquery pullup to wrap a PlaceHolderVar in another one.	Tom Lane	2024-01-11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The code for wrapping subquery output expressions in PlaceHolderVars believed that if the expression already was a PlaceHolderVar, it was never necessary to wrap that in another one. That's wrong if the expression is underneath an outer join and involves a lateral reference to outside that scope: failing to add an additional PHV risks evaluating the expression at the wrong place and hence not forcing it to null when the outer join should do so. This is an oversight in commit 9e7e29c75, which added logic to forcibly wrap lateral-reference Vars in PlaceHolderVars, but didn't see that the adjacent case for PlaceHolderVars needed the same treatment. The test case we have for this doesn't fail before 4be058fe9, but now that I see the problem I wonder if it is possible to demonstrate related errors before that. That's moot though, since all such branches are out of support. Per bug #18284 from Holger Reise. Back-patch to all supported branches. Discussion: https://postgr.es/m/18284-47505a20c23647f8@postgresql.org
*	Update copyright for 2024	Bruce Momjian	2024-01-03
\| \| \| \| \| \| \| \|	Reported-by: Michael Paquier Discussion: https://postgr.es/m/ZZKTDPxBBMt3C0J9@paquier.xyz Backpatch-through: 12
*	Fix usage of the parse tree for estimate_num_groups() in set operations	Alexander Korotkov	2023-11-04
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	recurse_set_operations() uses the parse tree for the group number estimation, because of the "varno 0" hack. At the same time 2489d76c49 made root->parse and corresponding parent_root->simple_rte_array[]->subquery distinct copies of the parse tree, while d3d55ce571 introduced self-join removal replacing relid of removed relation only in one of the copies. The present commit fixes this bug by making recurse_set_operations() call estimate_num_groups() with the copy of the parse tree processed by self-join removal. In future, we may think about maintaining just one copy of the parse tree and/or keeping removed relids as aliases. Reported-by: Zuming Jiang Bug: #18170 Discussion: https://postgr.es/m/flat/18170-f1d17bf9a0d58b24%40postgresql.org Author: Richard Guo, Alexander Korotkov Reviewed-by: Andrei Lepikhov
*	Fix another cause of "wrong varnullingrels" planner failures.	Tom Lane	2023-06-20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I removed the delay_upper_joins mechanism in commit b448f1c8d, reasoning that it was only needed when we have a single-table (SELECT ... WHERE) as the immediate RHS child of a left join, and we could get rid of that by hoisting the WHERE condition into the parent join's quals. However that new code missed a case: we could have "foo LEFT JOIN ((SELECT ... WHERE) LEFT JOIN bar)", and if the two left joins can be commuted then we now have the problematic query shape. We can fix this too easily enough, by allowing the syntactically-lower left join to pass through its parent qual location pointer recursively. That lets prepjointree.c discard the SELECT by temporarily hoisting the WHERE condition into the ancestor join's qual. Per bug #17978 from Zuming Jiang. Discussion: https://postgr.es/m/17978-12f3d93a55297266@postgresql.org