| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
fc069a3a6319 implemented Self-Join Elimination (SJE) and put related logic
to ChangeVarNodes_walker(). This commit provides refactoring to remove the
SJE-related logic from ChangeVarNodes_walker() but adds a custom callback to
ChangeVarNodesExtended(), which has a chance to process a node before
ChangeVarNodes_walker(). Passing this callback to ChangeVarNodesExtended()
allows SJE-related node handling to be kept within the analyzejoins.c.
Reported-by: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/CAMbWs49PE3CvnV8vrQ0Dr%3DHqgZZmX0tdNbzVNJxqc8yg-8kDQQ%40mail.gmail.com
Author: Andrei Lepikhov <lepihov@gmail.com>
Author: Alexander Korotkov <aekorotkov@gmail.com>
|
|
|
|
|
|
|
|
| |
This reverts commit 250a718aadad68793e82103282247556a46a3cfc.
It shouldn't be pushed during the release freeze.
Reported-by: Tom Lane
Discussion: https://postgr.es/m/E1uBIbY-000owH-0O%40gemulon.postgresql.org
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
fc069a3a6319 implemented Self-Join Elimination (SJE) and put related logic
to ChangeVarNodes_walker(). This commit provides refactoring to remove the
SJE-related logic from ChangeVarNodes_walker() but adds a custom callback to
ChangeVarNodesExtended(), which has a chance to process a node before
ChangeVarNodes_walker(). Passing this callback to ChangeVarNodesExtended()
allows SJE-related node handling to be kept within the analyzejoins.c.
Reported-by: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/CAMbWs49PE3CvnV8vrQ0Dr%3DHqgZZmX0tdNbzVNJxqc8yg-8kDQQ%40mail.gmail.com
Author: Andrei Lepikhov <lepihov@gmail.com>
Author: Alexander Korotkov <aekorotkov@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
fc069a3a6319 implements Self-Join Elimination (SJE), which can remove base
relations when appropriate. However, regressions tests for SJE only cover
the case when placeholder variables (PHVs) are evaluated and needed only
in a single base rel. If this baserel is removed due to SJE, its clauses,
including PHVs, will be transferred to the keeping relation. Removing these
PHVs may trigger an error on plan creation -- thanks to the b3ff6c742f6c for
detecting that.
This commit skips removal of PHVs during SJE. This might also happen that
we skip the removal of some PHVs that could be removed. However, the overhead
of extra PHVs is small compared to the complexity of analysis needed to remove
them.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Alena Rybakina <a.rybakina@postgrespro.ru>
Author: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When planning queries to partitioned tables, we clone all
EquivalenceMembers belonging to the partitioned table into em_is_child
EquivalenceMembers for each non-pruned partition. For partitioned tables
with large numbers of partitions, this meant the ec_members list could
become large and code searching that list would become slow. Effectively,
the more partitions which were present, the more searches needed to be
performed for operations such as find_ec_member_matching_expr() during
create_plan() and the more partitions present, the longer these searches
would take, i.e., a quadratic slowdown.
To fix this, here we adjust how we store EquivalenceMembers for
em_is_child members. Instead of storing these directly in ec_members,
these are now stored in a new array of Lists in the EquivalenceClass,
which is indexed by the relid. When we want to find EquivalenceMembers
belonging to a certain child relation, we can narrow the search to the
array element for that relation.
To make EquivalenceMember lookup easier and to reduce the amount of code
change, this commit provides a pair of functions to allow iteration over
the EquivalenceMembers of an EC which also handles finding the child
members, if required. Callers that never need to look at child members
can remain using the foreach loop over ec_members, which will now often
be faster due to only parent-level members being stored there.
The actual performance increases here are highly dependent on the number
of partitions and the query being planned. Performance increases can be
visible with as few as 8 partitions, but the speedup is marginal for
such low numbers of partitions. The speedups become much more visible
with a few dozen to hundreds of partitions. With some tested queries
using 56 partitions, the planner was around 3x faster than before. For
use cases with thousands of partitions, these are likely to become
significantly faster. Some testing has shown planner speedups of 60x or
more with 8192 partitions.
Author: Yuya Watari <watari.yuya@gmail.com>
Co-authored-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andrey Lepikhov <a.lepikhov@postgrespro.ru>
Reviewed-by: Alena Rybakina <lena.ribackina@yandex.ru>
Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Tested-by: Thom Brown <thom@linux.com>
Tested-by: newtglobal postgresql_contributors <postgresql_contributors@newtglobalcorp.com>
Discussion: https://postgr.es/m/CAJ2pMkZNCgoUKSE%2B_5LthD%2BKbXKvq6h2hQN8Esxpxd%2Bcxmgomg%40mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Derived clauses are stored in ec_derives, a List of RestrictInfos.
These clauses are later looked up by matching the left and right
EquivalenceMembers along with the clause's parent EC.
This linear search becomes expensive in queries with many joins or
partitions, where ec_derives may contain thousands of entries. In
particular, create_join_clause() can spend significant time scanning
this list.
To improve performance, introduce a hash table (ec_derives_hash) that
is built when the list reaches 32 entries -- the same threshold used
for join_rel_hash. The original list is retained alongside the hash
table to support EC merging and serialization
(_outEquivalenceClass()).
Each clause is stored in the hash table using a canonicalized key: the
EquivalenceMember with the lower memory address is placed in the key
before the one with the higher memory address. This avoids storing or
searching for both permutations of the same clause. For clauses
involving a constant EM, the key places NULL in the first slot and the
non-constant EM in the second.
The hash table is initialized using list_length(ec_derives_list) as
the size hint. simplehash internally adjusts this to the next power of
two after dividing by the fillfactor, so this typically results in at
least 64 buckets near the threshold -- avoiding immediate resizing
while adapting to the actual number of entries.
The lookup logic for derived clauses is now centralized in
ec_search_derived_clause_for_ems(), which consults the hash table when
available and falls back to the list otherwise.
The new ec_clear_derived_clauses() always frees ec_derives_list, even
though some of the original code paths that cleared the old
ec_derives field did not. This ensures consistent cleanup and avoids
leaking memory when large lists are discarded.
An assertion originally placed in find_derived_clause_for_ec_member()
is moved into ec_search_derived_clause_for_ems() so that it is
enforced consistently, regardless of whether the hash table or list is
used for lookup.
This design incorporates suggestions by David Rowley, who proposed
both the key canonicalization and the initial sizing approach to
balance memory usage and CPU efficiency.
Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Tested-by: Dmitry Dolgov <9erthalion6@gmail.com>
Tested-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Tested-by: Amit Langote <amitlangote09@gmail.com>
Tested-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAExHW5vZiQtWU6moszLP5iZ8gLX_ZAUbgEX0DxGLx9PGWCtqUg@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
| |
After more discussion about commit ce62f2f2a0a, rename the index AM
property amcancrosscompare to two separate properties
amconsistentequality and amconsistentordering. Also improve the
documentation and update some comments that were previously missed.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/E1tngY6-0000UL-2n%40gemulon.postgresql.org
|
|
|
|
|
|
|
|
|
|
|
|
| |
As spotted by Coverity, the calculation of ojrelid mixes signed and unsigned
types causes possible overflow and undefined behavior. Instead of trying to
fix the expression, this commit eliminates the relied local variable. The
explicit branching is used to replace the -1 value. That, in turn, requires
changing the signature of the remove_rel_from_eclass() function.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/914330.1740330169%40sss.pgh.pa.us
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The Self-Join Elimination (SJE) feature removes an inner join of a plain
table to itself in the query tree if it is proven that the join can be
replaced with a scan without impacting the query result. Self-join and
inner relation get replaced with the outer in query, equivalence classes,
and planner info structures. Also, the inner restrictlist moves to the
outer one with the removal of duplicated clauses. Thus, this optimization
reduces the length of the range table list (this especially makes sense for
partitioned relations), reduces the number of restriction clauses and,
in turn, selectivity estimations, and potentially improves total planner
prediction for the query.
This feature is dedicated to avoiding redundancy, which can appear after
pull-up transformations or the creation of an EquivalenceClass-derived clause
like the below.
SELECT * FROM t1 WHERE x IN (SELECT t3.x FROM t1 t3);
SELECT * FROM t1 WHERE EXISTS (SELECT t3.x FROM t1 t3 WHERE t3.x = t1.x);
SELECT * FROM t1,t2, t1 t3 WHERE t1.x = t2.x AND t2.x = t3.x;
In the future, we could also reduce redundancy caused by subquery pull-up
after unnecessary outer join removal in cases like the one below.
SELECT * FROM t1 WHERE x IN
(SELECT t3.x FROM t1 t3 LEFT JOIN t2 ON t2.x = t1.x);
Also, it can drastically help to join partitioned tables, removing entries
even before their expansion.
The SJE proof is based on innerrel_is_unique() machinery.
We can remove a self-join when for each outer row:
1. At most, one inner row matches the join clause;
2. Each matched inner row must be (physically) the same as the outer one;
3. Inner and outer rows have the same row mark.
In this patch, we use the next approach to identify a self-join:
1. Collect all merge-joinable join quals which look like a.x = b.x;
2. Add to the list above the baseretrictinfo of the inner table;
3. Check innerrel_is_unique() for the qual list. If it returns false, skip
this pair of joining tables;
4. Check uniqueness, proved by the baserestrictinfo clauses. To prove the
possibility of self-join elimination, the inner and outer clauses must
match exactly.
The relation replacement procedure is not trivial and is partly combined
with the one used to remove useless left joins. Tests covering this feature
were added to join.sql. Some of the existing regression tests changed due
to self-join removal logic.
Discussion: https://postgr.es/m/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru
Author: Andrey Lepikhov <a.lepikhov@postgrespro.ru>
Author: Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru>
Co-authored-by: Alexander Korotkov <aekorotkov@gmail.com>
Co-authored-by: Alena Rybakina <lena.ribackina@yandex.ru>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Simon Riggs <simon@2ndquadrant.com>
Reviewed-by: Jonathan S. Katz <jkatz@postgresql.org>
Reviewed-by: David Rowley <david.rowley@2ndquadrant.com>
Reviewed-by: Thomas Munro <thomas.munro@enterprisedb.com>
Reviewed-by: Konstantin Knizhnik <k.knizhnik@postgrespro.ru>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Hywel Carver <hywel@skillerwhale.com>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Ronan Dunklau <ronan.dunklau@aiven.io>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: Zhihong Yu <zyu@yugabyte.com>
Reviewed-by: Greg Stark <stark@mit.edu>
Reviewed-by: Jaime Casanova <jcasanov@systemguards.com.ec>
Reviewed-by: Michał Kłeczek <michal@kleczek.org>
Reviewed-by: Alena Rybakina <lena.ribackina@yandex.ru>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
|
|
|
|
| |
Backpatch-through: 13
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Two near-identical copies of clause_sides_match_join() existed in
joinpath.c and analyzejoins.c. Deduplicate this by moving the function
into restrictinfo.h.
It isn't quite clear that keeping the inline property of this function
is worthwhile, but this commit is just an exercise in code
deduplication. More effort would be required to determine if the inline
property is worth keeping.
Author: James Hunter <james.hunter.pg@gmail.com>
Discussion: https://postgr.es/m/CAJVSvF7Nm_9kgMLOch4c-5fbh3MYg%3D9BdnDx3Dv7Fcb64zr64Q%40mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Up to now, remove_rel_from_query() has done a pretty shoddy job
of updating our where-needed bitmaps (per-Var attr_needed and
per-PlaceHolderVar ph_needed relid sets). It removed direct mentions
of the to-be-removed baserel and outer join, which is the minimum
amount of effort needed to keep the data structures self-consistent.
But it didn't account for the fact that the removed join ON clause
probably mentioned Vars of other relations, and those Vars might now
not be needed as high up in the join tree as before. It's easy to
show cases where this results in failing to remove a lower outer join
that could also have been removed.
To fix, recalculate the where-needed bitmaps from scratch after
each successful join removal. This sounds expensive, but it seems
to add only negligible planner runtime. (We cheat a little bit
by preserving "relation 0" entries in the bitmaps, allowing us to
skip re-scanning the targetlist and HAVING qual.)
The submitted test case drew attention because we had successfully
optimized away the lower join prior to v16. I suspect that that's
somewhat accidental and there are related cases that were never
optimized before and now can be. I've not tried to come up with
one, though.
Perhaps we should back-patch this into v16 and v17 to repair the
performance regression. However, since it took a year for anyone
to notice the problem, it can't be affecting too many people. Let's
let the patch bake awhile in HEAD, and see if we get more complaints.
Per bug #18627 from Mikaël Gourlaouen. No back-patch for now.
Discussion: https://postgr.es/m/18627-44f950eb6a8416c2@postgresql.org
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The initial building of RestrictInfos and SpecialJoinInfos tends to
create structures that share relid sets (such as syn_lefthand and
outer_relids). There's nothing wrong with that in itself, but when
we modify those relid sets during join removal, we have to be sure
not to corrupt the values that other structures are pointing at.
remove_rel_from_query() was careless about this. It accidentally
worked anyway because (1) we'd never be reducing the sets to empty,
so they wouldn't get pfree'd; and (2) the in-place modification is the
same one that we did or will apply to the other struct's relid set,
so that there wasn't visible corruption at the end of the process.
While there's no live bug in a standard build, of course this is way
too fragile to accept going forward. (Maybe we should back-patch
this change too for safety, but I've refrained for now.)
This bug was exposed by the recent invention of REALLOCATE_BITMAPSETS.
Commit e0477837c had installed a fix, but that went away with the
reversion of SJE, so now we need to fix it again.
David Rowley and Tom Lane
Discussion: https://postgr.es/m/CACJufxFVQmr4=JWHAOSLuKA5Zy9H26nY6tVrRFBOekHoALyCkQ@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit reverts d3d55ce5713 and subsequent fixes 2b26a694554, 93c85db3b5b,
b44a1708abe, b7f315c9d7d, 8a8ed916f73, b5fb6736ed3, 0a93f803f45, e0477837ce4,
a7928a57b9f, 5ef34a8fc38, 30b4955a466, 8c441c08279, 028b15405b4, fe093994db4,
489072ab7a9, and 466979ef031.
We are quite late in the release cycle and new bugs continue to appear. Even
though we have fixes for all known bugs, there is a risk of throwing many
bugs to end users.
The plan for self-join elimination would be to do more review and testing,
then re-commit in the early v18 cycle.
Reported-by: Tom Lane
Discussion: https://postgr.es/m/2422119.1714691974%40sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
as determined by include-what-you-use (IWYU)
While IWYU also suggests to *add* a bunch of #include's (which is its
main purpose), this patch does not do that. In some cases, a more
specific #include replaces another less specific one.
Some manual adjustments of the automatic result:
- IWYU currently doesn't know about includes that provide global
variable declarations (like -Wmissing-variable-declarations), so
those includes are being kept manually.
- All includes for port(ability) headers are being kept for now, to
play it safe.
- No changes of catalog/pg_foo.h to catalog/pg_foo_d.h, to keep the
patch from exploding in size.
Note that this patch touches just *.c files, so nothing declared in
header files changes in hidden ways.
As a small example, in src/backend/access/transam/rmgr.c, some IWYU
pragma annotations are added to handle a special case there.
Discussion: https://www.postgresql.org/message-id/flat/af837490-6b2f-46df-ba05-37ea6a6653fc%40eisentraut.org
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit introduces a new field 'sublevels_up' in ReplaceVarnoContext,
and enhances replace_varno_walker() to:
1) recurse into subselects with sublevels_up increased, and
2) perform the replacement only when varlevelsup is equal to sublevels_up.
This commit also fixes some outdated comments. And besides adding relevant
test cases, it makes some unification over existing SJE test cases.
Discussion: https://postgr.es/m/CAMbWs4-%3DPO6Mm9gNnySbx0VHyXjgnnYYwbN9dth%3DTLQweZ-M%2Bg%40mail.gmail.com
Author: Richard Guo
Reviewed-by: Andrei Lepikhov, Alexander Korotkov
|
|
|
|
| |
for commit 489072ab7a
|
|
|
|
|
|
| |
Reported-by: Alexander Lakhin
Discussion: https://postgr.es/m/56ee4520-e9d1-d519-54fe-c8bff880ce9b%40gmail.com
Author: Alexander Korotkov, Andrei Lepikhov
|
|
|
|
|
|
|
|
|
| |
Calls to this function might give the impression that pg_qsort()
is somehow different than qsort(), when in fact there is a qsort()
macro in port.h that expands all in-tree uses to pg_qsort().
Reviewed-by: Mats Kindahl
Discussion: https://postgr.es/m/CA%2B14426g2Wa9QuUpmakwPxXFWG_1FaY0AsApkvcTBy-YfS6uaw%40mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When removing a useless join, we'd remove PHVs that are not used at join
partner rels or above the join. A PHV that references the join's relid
in ph_eval_at is logically "above" the join and thus should not be
removed. We have the following check for that:
!bms_is_member(ojrelid, phinfo->ph_eval_at)
However, in the case of SJE removing a useless inner join, 'ojrelid' is
set to -1, which would trigger the "negative bitmapset member not
allowed" error in bms_is_member().
Fix it by skipping examining ojrelid for inner joins in this check.
Reported-by: Zuming Jiang
Bug: #18260
Discussion: https://postgr.es/m/18260-1b6a0c4ae311b837%40postgresql.org
Author: Richard Guo
Reviewed-by: Andrei Lepikhov
|
|
|
|
|
| |
Given that now SJE doesn't work with result relation, turn a code dealing with
that into an assert that it shouldn't happen.
|
|
|
|
|
|
|
|
|
|
| |
The target relation for INSERT/UPDATE/DELETE/MERGE has a different behavior
than other relations in EvalPlanQual() and RETURNING clause. This is why we
forbid target relation to be either source or target relation in SJE.
It's not clear if we could ever support this.
Reported-by: Alexander Lakhin
Discussion: https://postgr.es/m/b9e8f460-f9a6-0e9b-e8ba-60d59f0bc22c%40gmail.com
|
|
|
|
|
|
|
|
|
|
|
| |
When SJE uses RelOptInfo.unique_for_rels cache, it passes filtered quals to
innerrel_is_unique_ext(). That might lead to an invalid match to cache entries
made by previous non self-join checking calls. Add UniqueRelInfo.self_join
flag to prevent such cases. Also, fix that SJE should require a strict match
of outerrelids to make sure UniqueRelInfo.extra_clauses are valid.
Reported-by: Alexander Lakhin
Discussion: https://postgr.es/m/4788f781-31bd-9796-d7d6-588a751c8787%40gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When the SJE code handles the transfer of qual clauses from the removed
relation to the remaining one, it replaces the Vars of the removed
relation with the Vars of the remaining relation for each clause, and
then reintegrates these clauses into the appropriate restriction or join
clause lists, while attempting to avoid duplicates.
However, the code compares RestrictInfo->clause to determine if two
clauses are duplicates. This is just flat wrong. Two RestrictInfos
with the same clause can have different required_relids,
incompatible_relids, is_pushed_down, and so on. This can cause qual
clauses to be mistakenly omitted, leading to wrong results.
This patch fixes it by comparing the entire RestrictInfos not just their
clauses ignoring 'rinfo_serial' field (otherwise almost all RestrictInfos will
be unique). Making 'rinfo_serial' equal_ignore would break other code. This
is why this commit implements our own comparison function for checking the
equality of RestrictInfos.
Reported-by: Zuming Jiang
Bug: #18261
Discussion: https://postgr.es/m/18261-2a75d748c928609b%40postgresql.org
Author: Richard Guo
|
|
|
|
|
|
|
|
| |
Reported-by: Michael Paquier
Discussion: https://postgr.es/m/ZZKTDPxBBMt3C0J9@paquier.xyz
Backpatch-through: 12
|
|
|
|
|
|
| |
Reported-by: Alexander Lakhin
Discussion: https://postgr.es/m/a89f480f-8143-0965-f22d-0a892777f501%40gmail.com
Author: Andrei Lepikhov
|
|
|
|
|
|
|
|
|
|
| |
There are a lot of situations when we share the same pointer to a Bitmapset
structure across different places. In order to evade undesirable side effects
replace_relid() function should always return a copy.
Reported-by: Richard Guo
Discussion: https://postgr.es/m/CAMbWs4_wJthNtYBL%2BSsebpgF-5L2r5zFFk6xYbS0A78GKOTFHw%40mail.gmail.com
Reviewed-by: Richard Guo, Andres Freund, Ashutosh Bapat, Andrei Lepikhov
|
|
|
|
|
|
| |
Discussion: https://postgr.es/m/18187-831da249cbd2ff8e%40postgresql.org
Author: Richard Guo
Reviewed-by: Andrei Lepikhov
|
|
|
|
|
|
|
|
|
|
| |
Self-join removal appears to be safe to apply with placeholder variables
as long as we handle PlaceHolderVar in replace_varno_walker() and replace
relid in phinfo->ph_lateral.
Discussion: https://postgr.es/m/18187-831da249cbd2ff8e%40postgresql.org
Author: Richard Guo
Reviewed-by: Andrei Lepikhov
|
|
|
|
|
|
|
|
|
| |
This commit also retires sje_walker. This increases the generalty of replacing
varno in the parse tree and simplifies the code.
Discussion: https://postgr.es/m/18187-831da249cbd2ff8e%40postgresql.org
Author: Richard Guo
Reviewed-by: Andrei Lepikhov
|
|
|
|
|
|
|
|
|
|
| |
It seems that a PHV evaluated/needed at or below the self join should not have
a problem if we remove the self join. But this requires further investigation.
For now, we just do not remove self joins if the rel to be removed is laterally
referenced by PHVs.
Discussion: https://postgr.es/m/CAMbWs4-ns73VF9gi37q61G3dS6Xuos+HtryMaBh37WQn=BsaJw@mail.gmail.com
Author: Richard Guo
|
|
|
|
|
|
|
|
|
|
|
| |
Add missing replacement of relids in phv->phexpr. Also, remove extra
replace_relid() over phv->phrels.
Reported-by: Zuming Jiang
Bug: #18187
Discussion: https://postgr.es/m/flat/18187-831da249cbd2ff8e%40postgresql.org
Author: Richard Guo
Reviewed-by: Andrei Lepikhov
|
|
|
|
|
| |
Reported-by: Richard Guo
Discussion: https://postgr.es/m/CAMbWs4_STsG1PKQBuvQC8W4sPo3KvML3=jOTjKLUYQuK3g8cpQ@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
| |
d3d55ce571 changed RelOptInfo.unique_for_rels from the list of Relid sets to
the list of UniqueRelInfo's. But it didn't make UniqueRelInfo a node.
This commit makes UniqueRelInfo a node. Also this commit revises some
comments related to RelOptInfo.unique_for_rels.
Reported-by: Tom Lane
Discussion: https://postgr.es/m/flat/1189851.1698340331%40sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The Self Join Elimination (SJE) feature removes an inner join of a plain table
to itself in the query tree if is proved that the join can be replaced with
a scan without impacting the query result. Self join and inner relation are
replaced with the outer in query, equivalence classes, and planner info
structures. Also, inner restrictlist moves to the outer one with removing
duplicated clauses. Thus, this optimization reduces the length of the range
table list (this especially makes sense for partitioned relations), reduces
the number of restriction clauses === selectivity estimations, and potentially
can improve total planner prediction for the query.
The SJE proof is based on innerrel_is_unique machinery.
We can remove a self-join when for each outer row:
1. At most one inner row matches the join clause.
2. Each matched inner row must be (physically) the same row as the outer one.
In this patch we use the next approach to identify a self-join:
1. Collect all merge-joinable join quals which look like a.x = b.x
2. Add to the list above the baseretrictinfo of the inner table.
3. Check innerrel_is_unique() for the qual list. If it returns false, skip
this pair of joining tables.
4. Check uniqueness, proved by the baserestrictinfo clauses. To prove
the possibility of self-join elimination inner and outer clauses must have
an exact match.
The relation replacement procedure is not trivial and it is partly combined
with the one, used to remove useless left joins. Tests, covering this feature,
were added to join.sql. Some regression tests changed due to self-join removal
logic.
Discussion: https://postgr.es/m/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru
Author: Andrey Lepikhov, Alexander Kuzmenkov
Reviewed-by: Tom Lane, Robert Haas, Andres Freund, Simon Riggs, Jonathan S. Katz
Reviewed-by: David Rowley, Thomas Munro, Konstantin Knizhnik, Heikki Linnakangas
Reviewed-by: Hywel Carver, Laurenz Albe, Ronan Dunklau, vignesh C, Zhihong Yu
Reviewed-by: Greg Stark, Jaime Casanova, Michał Kłeczek, Alena Rybakina
Reviewed-by: Alexander Korotkov
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Here we adjust relation_has_unique_index_for() so that it no longer makes
use of partial unique indexes as uniqueness proofs. It is incorrect to
use these as the predicates used by check_index_predicates() to set
predOK makes use of not only baserestrictinfo quals as proofs, but also
qual from join conditions. For relation_has_unique_index_for()'s case, we
need to know the relation is unique for a given set of columns before any
joins are evaluated, so if predOK was only set to true due to some join
qual, then it's unsafe to use such indexes in
relation_has_unique_index_for(). The final plan may not even make use
of that index, which could result in reading tuples that are not as
unique as the planner previously expected them to be.
Bug: #17975
Reported-by: Tor Erik Linnerud
Backpatch-through: 11, all supported versions
Discussion: https://postgr.es/m/17975-98a90c156f25c952%40postgresql.org
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since commit b448f1c8d, we've been able to remove left joins
(that are otherwise removable) even when they are underneath
other left joins, a case that was previously prevented by a
delay_upper_joins check. This is a clear improvement, but
it has a surprising side-effect: it's now possible that there
are EquivalenceClasses whose relid sets mention the removed
baserel and/or outer join. If we fail to clean those up,
we may drop essential join quals due to not having any join
level that appears to satisfy their relid sets.
(It's not quite 100% clear that this was impossible before.
But the lack of complaints since we added join removal a dozen
years ago strongly suggests that it was impossible.)
Richard Guo and Tom Lane, per bug #17976 from Zuming Jiang
Discussion: https://postgr.es/m/17976-4b638b525e9a983b@postgresql.org
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A placeholder that references the outer join's relid in ph_eval_at
is logically "above" the join, and therefore we can't remove its
PlaceHolderInfo: it might still be used somewhere in the query.
This was not an issue pre-v16 because we failed to remove the join
at all in such cases. The new outer-join-aware-Var infrastructure
permits deducing that it's okay to remove the join, but then we
have to clean up correctly afterwards.
Report and fix by Richard Guo
Discussion: https://postgr.es/m/CAMbWs4_tuVn9EwwMcggGiZJWWstdXX_ci8FeEU17vs+4nLgw3w@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When we're deleting a no-op LEFT JOIN from the query, we must remove
the join's joinclauses from surviving relations' joininfo lists.
The invention of "cloned" clauses in 2489d76c4 broke the logic for
that; it'd fail to remove clones that include OJ relids outside the
doomed join's min relid sets, which could happen if that join was
previously discovered to commute with some other join.
This accidentally failed to cause problems in the majority of cases,
because we'd never decide that such a cloned clause was evaluatable at
any surviving join. However, Richard Guo discovered a case where that
did happen, leading to "no relation entry for relid" errors later.
Also, adding assertions that a non-removed clause contains no Vars from
the doomed join exposes that there are quite a few existing regression
test cases where the problem happens but is accidentally not exposed.
The fix for this is just to include the target join's commute_above_r
and commute_below_l sets in the relid set we test against when
deciding whether a join clause is "pushed down" and thus not
removable.
While at it, do a little refactoring: the join's relid set can be
computed inside remove_rel_from_query rather than in the caller.
Patch by me; thanks to Richard Guo for review.
Discussion: https://postgr.es/m/CAMbWs4_PHrRqTKDNnTRsxxQy6BtYCVKsgXm1_gdN2yQ=kmcO5g@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In commit 9df8f903e I (tgl) switched join_is_removable() from
using the min relid sets of the join under consideration to
using its full syntactic relid sets. This was a mistake,
as it allowed join removal in cases where a reference to the
join output would survive in some syntactically-lower join
condition. Revert to the former coding.
Richard Guo
Discussion: https://postgr.es/m/CAMbWs4-EU9uBGSP7G-iTwLBhRQ=rnZKvFDhD+n+xhajokyPCKg@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
After applying outer-join identity 3 in the forward direction,
it was possible for the planner to mistakenly apply a qual clause
from above the two outer joins at the now-lower join level.
This can give the wrong answer, since a value that would get nulled
by the now-upper join might not yet be null.
To fix, when we perform such a transformation, consider that the
now-lower join hasn't really completed the outer join it's nominally
responsible for and thus its relid set should not include that OJ's
relid (nor should its output Vars have that nullingrel bit set).
Instead we add those bits when the now-upper join is performed.
The existing rules for qual placement then suffice to prevent
higher qual clauses from dropping below the now-upper join.
There are a few complications from needing to consider transitive
closures in case multiple pushdowns have happened, but all in all
it's not a very complex patch.
This is all new logic (from 2489d76c4) so no need to back-patch.
The added test cases all have the same results as in v15.
Tom Lane and Richard Guo
Discussion: https://postgr.es/m/0b819232-4b50-f245-1c7d-c8c61bf41827@postgrespro.ru
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I've had a bee in my bonnet for some time about getting rid of
RestrictInfo.is_pushed_down, because it's squishily defined and
requires not-inexpensive extra tests to use (cf RINFO_IS_PUSHED_DOWN).
In commit 2489d76c4, I tried to make remove_rel_from_query() not
depend on that macro; but the replacement test is buggy,
as exposed by a report from Rushabh Lathia and Robert Haas.
That change was pretty incidental to the main goal of 2489d76c4,
so let's just revert it for now. (Getting rid of is_pushed_down
is still far away, anyway.)
Discussion: https://postgr.es/m/CA+TgmoYco=hmg+iX1CW9Y1_CzNoSL81J03wUG-d2_3=rue+L2A@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It's possible, in admittedly-rather-contrived cases, for an eclass
to generate a derived "join" qual that constrains the post-outer-join
value(s) of some RHS variable(s) without mentioning the LHS at all.
While the mechanisms were set up to work for this, we fell foul of
the "get_common_eclass_indexes" filter installed by commit 3373c7155:
it could decide that such an eclass wasn't relevant to the join, so
that the required qual clause wouldn't get emitted there or anywhere
else.
To fix, apply get_common_eclass_indexes only at inner joins, where
its rule is still valid. At an outer join, fall back to examining all
eclasses that mention either input (or the OJ relid, though it should
be impossible for an eclass to mention that without mentioning either
input). Perhaps we can improve on that later, but the cost/benefit of
adding more complexity to skip some irrelevant eclasses is dubious.
To allow cheaply distinguishing outer from inner joins, pass the
ojrelid to generate_join_implied_equalities as a separate argument.
This also allows cleaning up some sloppiness that had crept into
the definition of its join_relids argument, and it allows accurate
calculation of nominal_join_relids for a child outer join. (The
latter oversight seems not to have been a live bug, but it certainly
could have caused problems in future.)
Also fix what might be a live bug in check_index_predicates: it was
being sloppy about what it passed to generate_join_implied_equalities.
Per report from Richard Guo.
Discussion: https://postgr.es/m/CAMbWs4-DsTBfOvXuw64GdFss2=M5cwtEhY=0DCS7t2gT7P6hSA@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was not something that required consideration before MERGE
was invented; but MERGE builds a join tree that left-joins to the
result relation, meaning that remove_useless_joins will consider
removing it. That should generally be stopped by the query's use
of output variables from the result relation. However, if the
result relation is inherited (e.g. a partitioned table) then
we don't add any row identity variables to the query until
expand_inherited_rtentry, which happens after join removal.
This was exposed as of commit 3c569049b, which made it possible
to deduce that a partitioned table could contain at most one row
matching a join key, enabling removal of the not-yet-expanded
result relation. Ooops.
To fix, let's just teach join_is_removable that the query result
rel is never removable. It's a cheap enough test in any case,
and it'll save some cycles that we'd otherwise expend in proving
that it's not removable, even in the cases we got right.
Back-patch to v15 where MERGE was added. Although I think the
case cannot be reached in v15, this seems like cheap insurance.
Per investigation of a report from Alexander Lakhin.
Discussion: https://postgr.es/m/36bee393-b351-16ac-93b2-d46d83637e45@gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In commit b78f6264e I opined that it was "too risky" to delete a
relation's RelOptInfo from the planner's data structures when we have
realized that we don't need to join to it; so instead we just marked
it as a dead relation. In hindsight that judgment seems flawed: any
subsequent access to such a dead relation is arguably a bug in
itself, so leaving the RelOptInfo present just helps to mask bugs.
Let's delete it instead, allowing removal of the whole notion of a
"dead relation". So far as the regression tests can find, this
requires no other code changes, except for one Assert in equivclass.c
that was very dubiously not complaining about access to a dead rel.
Discussion: https://postgr.es/m/229905.1676062220@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
analyzejoins.c took care to clean out removed relids from the
clause_relids and required_relids of RestrictInfos associated with
the doomed rel ... but it paid no attention to the fact that if such a
RestrictInfo contains an OR clause, there will be sub-RestrictInfos
containing similar fields.
I'm more than a bit surprised that this oversight hasn't caused
visible problems before. In any case, it's certainly broken now,
so add logic to clean out the sub-RestrictInfos recursively.
We might need to back-patch this someday.
Per bug #17786 from Robins Tharakan.
Discussion: https://postgr.es/m/17786-f1ea7fbdab97daec@postgresql.org
|
|
|
|
|
|
|
|
|
|
| |
While we got away with this sloppiness before, it's not okay now
that fee7b77b9 caused build_joinrel_tlist() to make use of phrels.
Per report from Robins Tharakan.
Richard Guo (some cosmetic tweaks by me)
Discussion: https://postgr.es/m/CAMbWs4_ngw9sKxpTE8hqk=-ooVX_CQP3DarA4HzkRMz_JKpTrA@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The portion of join_is_removable() that checks PlaceHolderVars
can be made a little more accurate and intelligible than it was.
The key point is that we can allow join removal even if a PHV
mentions the target rel in ph_eval_at, if that mention was only
added as a consequence of forcing the PHV up to a join level
that's at/above the outer join we're trying to get rid of.
We can check that by testing for the OJ's relid appearing in
ph_eval_at, indicating that it's supposed to be evaluated after
the outer join, plus the existing test that the contained
expression doesn't actually mention the target rel.
While here, add an explicit check that there'll be something left
in ph_eval_at after we remove the target rel and OJ relid. There
is an Assert later on about that, and I'm not too sure that the
case could happen for a PHV satisfying the other constraints,
but let's just check. (There was previously a bms_is_subset test
that meant to cover this risk, but it's broken now because it
doesn't account for the fact that we'll also remove the OJ relid.)
The real reason for revisiting this code though is that the
Assert I left behind in 8538519db turns out to be easily
reachable, because if a PHV of this sort appears in an upper-level
qual clause then that clause's clause_relids will include the
PHV's ph_eval_at relids. This is a mirage though: we have or soon
will remove these relids from the PHV's ph_eval_at, and therefore
they no longer belong in qual clauses' clause_relids either.
Remove that Assert in join_is_removable, and replace the similar
one in remove_rel_from_query with code to remove the deleted relids
from clause_relids.
Per bug #17773 from Robins Tharakan.
Discussion: https://postgr.es/m/17773-a592e6cedbc7bac5@postgresql.org
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If we have a RestrictInfo that mentions both the removal-candidate
relation and the outer join's relid, then that is a pushed-down
condition not a join condition, so it should be grounds for deciding
that we can't remove the outer join. In commit 2489d76c4, I'd blindly
included the OJ's relid into "joinrelids" as per the new standard
convention, but the checks of attr_needed and ph_needed should only
allow the join's input rels to be mentioned.
Having done that, the check for references in pushed-down quals
a few lines further down should be redundant. I left it in place
as an Assert, though.
While researching this I happened across a couple of comments that
worried about the effects of update_placeholder_eval_levels.
That's gone as of b448f1c8d, so we can remove some worry.
Per bug #17769 from Robins Tharakan. The submitted test case
triggers this more or less accidentally because we flatten out
a LATERAL sub-select after we've done join strength reduction;
if we did that in the other order, this problem would be masked
because the outer join would get simplified to an inner join.
To ensure that the committed test case will continue to test
what it means to even if we make that happen someday, use a
test clause involving COALESCE(), which will prevent us from
using it to do join strength reduction.
Patch by me, but thanks to Richard Guo for initial investigation.
Discussion: https://postgr.es/m/17769-e4f7a5c9d84a80a7@postgresql.org
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Remove RestrictInfo.nullable_relids, along with a good deal of
infrastructure that calculated it. One use-case for it was in
join_clause_is_movable_to, but we can now replace that usage with
a check to see if the clause's relids include any outer join
that can null the target relation. The other use-case was in
join_clause_is_movable_into, but that test can just be dropped
entirely now that the clause's relids include outer joins.
Furthermore, join_clause_is_movable_into should now be
accurate enough that it will accept anything returned by
generate_join_implied_equalities, so we can restore the Assert
that was diked out in commit 95f4e59c3.
Remove the outerjoin_delayed mechanism. We needed this before to
prevent quals from getting evaluated below outer joins that should
null some of their vars. Now that we consider varnullingrels while
placing quals, that's taken care of automatically, so throw the
whole thing away.
Teach remove_useless_result_rtes to also remove useless FromExprs.
Having done that, the delay_upper_joins flag serves no purpose any
more and we can remove it, largely reverting 11086f2f2.
Use constant TRUE for "dummy" clauses when throwing back outer joins.
This improves on a hack I introduced in commit 6a6522529. If we
have a left-join clause l.x = r.y, and a WHERE clause l.x = constant,
we generate r.y = constant and then don't really have a need for the
join clause. But we must throw the join clause back anyway after
marking it redundant, so that the join search heuristics won't think
this is a clauseless join and avoid it. That was a kluge introduced
under time pressure, and after looking at it I thought of a better
way: let's just introduce constant-TRUE "join clauses" instead,
and get rid of them at the end. This improves the generated plans for
such cases by not having to test a redundant join clause. We can also
get rid of the ugly hack used to mark such clauses as redundant for
selectivity estimation.
Patch by me; thanks to Richard Guo for review.
Discussion: https://postgr.es/m/830269.1656693747@sss.pgh.pa.us
|