| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit implements the automatic conversion of 'x IN (VALUES ...)' into
ScalarArrayOpExpr. That simplifies the query tree, eliminating the appearance
of an unnecessary join.
Since VALUES describes a relational table, and the value of such a list is
a table row, the optimizer will likely face an underestimation problem due to
the inability to estimate cardinality through MCV statistics. The cardinality
evaluation mechanism can work with the array inclusion check operation.
If the array is small enough (< 100 elements), it will perform a statistical
evaluation element by element.
We perform the transformation in the convert_ANY_sublink_to_join() if VALUES
RTE is proper and the transformation is convertible. The conversion is only
possible for operations on scalar values, not rows. Also, we currently
support the transformation only when it ends up with a constant array.
Otherwise, the evaluation of non-hashed SAOP might be slower than the
corresponding Hash Join with VALUES.
Discussion: https://postgr.es/m/0184212d-1248-4f1f-a42d-f5cb1c1976d2%40tantorlabs.com
Author: Alena Rybakina <a.rybakina@postgrespro.ru>
Author: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Ivan Kush <ivan.kush@tantorlabs.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
|
|
|
|
| |
Backpatch-through: 13
|
|
|
|
|
|
|
|
| |
Reported-by: Michael Paquier
Discussion: https://postgr.es/m/ZZKTDPxBBMt3C0J9@paquier.xyz
Backpatch-through: 12
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the plan itself is parallel-safe, and the initPlans are too,
there's no reason anymore to prevent the plan from being marked
parallel-safe. That restriction (dating to commit ab77a5a45) was
really a special case of the fact that we couldn't transmit subplans
to parallel workers at all. We fixed that in commit 5e6d8d2bb and
follow-ons, but this case never got addressed.
We still forbid attaching initPlans to a Gather node that's
inserted pursuant to debug_parallel_query = regress. That's because,
when we hide the Gather from EXPLAIN output, we'd hide the initPlans
too, causing cosmetic regression diffs. It seems inadvisable to
kluge EXPLAIN to the extent required to make the output look the
same, so just don't do it in that case.
Along the way, this also takes care of some sloppiness about updating
path costs to match when we move initplans from one place to another
during createplan.c and setrefs.c. Since all the planning decisions
are already made by that point, this is just cosmetic; but it seems
good to keep EXPLAIN output consistent with where the initplans are.
The diff in query_planner() might be worth remarking on. I found that
one because after fixing things to allow parallel-safe initplans, one
partition_prune test case changed plans (as shown in the patch) ---
but only when debug_parallel_query was active. The reason proved to
be that we only bothered to mark Result nodes as potentially
parallel-safe when debug_parallel_query is on. This neglects the fact
that parallel-safety may be of interest for a sub-query even though
the Result itself doesn't parallelize.
Discussion: https://postgr.es/m/1129530.1681317832@sss.pgh.pa.us
|
|
|
|
| |
Backpatch-through: 11
|
|
|
|
| |
Backpatch-through: 10
|
|
|
|
| |
Backpatch-through: 9.5
|
|
|
|
| |
Backpatch-through: update all files in master, backpatch legal files through 9.4
|
|
|
|
|
|
|
|
|
| |
Switch to 2.1 version of pg_bsd_indent. This formats
multiline function declarations "correctly", that is with
additional lines of parameter declarations indented to match
where the first line's left parenthesis is.
Discussion: https://postgr.es/m/CAEepm=0P3FeTXRcU5B2W3jv3PgRVZ-kGUXLGfd42FFhUROO3ug@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The old name of this file was never a very good indication of what it
was for. Now that there's also access/relation.h, we have a potential
confusion hazard as well, so let's rename it to something more apropos.
Per discussion, "pathnodes.h" is reasonable, since a good fraction of
the file is Path node definitions.
While at it, tweak a couple of other headers that were gratuitously
importing relation.h into modules that don't need it.
Discussion: https://postgr.es/m/7719.1548688728@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Up to now, createplan.c attempted to share PARAM_EXEC slots for
NestLoopParams across different plan levels, if the same underlying Var
was being fed down to different righthand-side subplan trees by different
NestLoops. This was, I think, more of an artifact of using subselect.c's
PlannerParamItem infrastructure than an explicit design goal, but anyway
that was the end result.
This works well enough as long as the plan tree is executing synchronously,
but the feature whereby Gather can execute the parallelized subplan locally
breaks it. An upper NestLoop node might execute for a row retrieved from
a parallel worker, and assign a value for a PARAM_EXEC slot from that row,
while the leader's copy of the parallelized subplan is suspended with a
different active value of the row the Var comes from. When control
eventually returns to the leader's subplan, it gets the wrong answers if
the same PARAM_EXEC slot is being used within the subplan, as reported
in bug #15577 from Bartosz Polnik.
This is pretty reminiscent of the problem fixed in commit 46c508fbc, and
the proper fix seems to be the same: don't try to share PARAM_EXEC slots
across different levels of controlling NestLoop nodes.
This requires decoupling NestLoopParam handling from PlannerParamItem
handling, although the logic remains somewhat similar. To avoid bizarre
division of labor between subselect.c and createplan.c, I decided to move
all the param-slot-assignment logic for both cases out of those files
and put it into a new file paramassign.c. Hopefully it's a bit better
documented now, too.
A regression test case for this might be nice, but we don't know a
test case that triggers the problem with a suitably small amount
of data.
Back-patch to 9.6 where we added Gather nodes. It's conceivable that
related problems exist in older branches; but without some evidence
for that, I'll leave the older branches alone.
Discussion: https://postgr.es/m/15577-ca61ab18904af852@postgresql.org
|
|
|
|
| |
Backpatch-through: certain files through 9.4
|
|
|
|
| |
Backpatch-through: certain files through 9.3
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Change pg_bsd_indent to follow upstream rules for placement of comments
to the right of code, and remove pgindent hack that caused comments
following #endif to not obey the general rule.
Commit e3860ffa4dd0dad0dd9eea4be9cc1412373a8c89 wasn't actually using
the published version of pg_bsd_indent, but a hacked-up version that
tried to minimize the amount of movement of comments to the right of
code. The situation of interest is where such a comment has to be
moved to the right of its default placement at column 33 because there's
code there. BSD indent has always moved right in units of tab stops
in such cases --- but in the previous incarnation, indent was working
in 8-space tab stops, while now it knows we use 4-space tabs. So the
net result is that in about half the cases, such comments are placed
one tab stop left of before. This is better all around: it leaves
more room on the line for comment text, and it means that in such
cases the comment uniformly starts at the next 4-space tab stop after
the code, rather than sometimes one and sometimes two tabs after.
Also, ensure that comments following #endif are indented the same
as comments following other preprocessor commands such as #else.
That inconsistency turns out to have been self-inflicted damage
from a poorly-thought-through post-indent "fixup" in pgindent.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I've been saying we needed to do this for more than five years, and here it
finally is. This patch removes the ever-growing tangle of spaghetti logic
that grouping_planner() used to use to try to identify the best plan for
post-scan/join query steps. Now, there is (nearly) independent
consideration of each execution step, and entirely separate construction of
Paths to represent each of the possible ways to do that step. We choose
the best Path or set of Paths using the same add_path() logic that's been
used inside query_planner() for years.
In addition, this patch removes the old restriction that subquery_planner()
could return only a single Plan. It now returns a RelOptInfo containing a
set of Paths, just as query_planner() does, and the parent query level can
use each of those Paths as the basis of a SubqueryScanPath at its level.
This allows finding some optimizations that we missed before, wherein a
subquery was capable of returning presorted data and thereby avoiding a
sort in the parent level, making the overall cost cheaper even though
delivering sorted output was not the cheapest plan for the subquery in
isolation. (A couple of regression test outputs change in consequence of
that. However, there is very little change in visible planner behavior
overall, because the point of this patch is not to get immediate planning
benefits but to create the infrastructure for future improvements.)
There is a great deal left to do here. This patch unblocks a lot of
planner work that was basically impractical in the old code structure,
such as allowing FDWs to implement remote aggregation, or rewriting
plan_set_operations() to allow consideration of multiple implementation
orders for set operations. (The latter will likely require a full
rewrite of plan_set_operations(); what I've done here is only to fix it
to return Paths not Plans.) I have also left unfinished some localized
refactoring in createplan.c and planner.c, because it was not necessary
to get this patch to a working state.
Thanks to Robert Haas, David Rowley, and Amit Kapila for review.
|
|
|
|
| |
Backpatch certain files through 9.1
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Until now we computed these Param ID sets at the end of subquery_planner,
but that approach depends on subquery_planner returning a concrete Plan
tree. We would like to switch over to returning one or more Paths for a
subquery, and in that representation the necessary details aren't fully
fleshed out (not to mention that we don't really want to do this work for
Paths that end up getting discarded). Hence, refactor so that we can
compute the param ID sets at the end of planning, just before
set_plan_references is run.
The main change necessary to make this work is that we need to capture
the set of outer-level Param IDs available to the current query level
before exiting subquery_planner, since the outer levels' plan_params lists
are transient. (That's not going to pose a problem for returning Paths,
since all the work involved in producing that data is part of expression
preprocessing, which will continue to happen before Paths are produced.)
On the plus side, this change gets rid of several existing kluges.
Eventually I'd like to get rid of SS_finalize_plan altogether in favor of
doing this work during set_plan_references, but that will require some
complex rejiggering because SS_finalize_plan needs to visit subplans and
initplans before the main plan. So leave that idea for another day.
|
|
|
|
| |
Backpatch certain files through 9.0
|
|
|
|
|
| |
Update all files in head, and files COPYRIGHT and legal.sgml in all back
branches.
|
|
|
|
|
| |
Fully update git head, and update back branches in ./COPYRIGHT and
legal.sgml files.
|
|
|
|
| |
commit-fest.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If we use a PlaceHolderVar from the outer relation in an inner indexscan,
we need to reference the PlaceHolderVar as such as the value to be passed
in from the outer relation. The previous code effectively tried to
reconstruct the PHV from its component expression, which doesn't work since
(a) the Vars therein aren't necessarily bubbled up far enough, and (b) it
would be the wrong semantics anyway because of the possibility that the PHV
is supposed to have gone to null at some point before the current join.
Point (a) led to "variable not found in subplan target list" planner
errors, but point (b) would have led to silently wrong answers.
Per report from Roger Niederland.
|
| |
|
|
|
|
|
|
|
|
| |
This adds collation support for columns and domains, a COLLATE clause
to override it per expression, and B-tree index support.
Peter Eisentraut
reviewed by Pavel Stehule, Itagaki Takahiro, Robert Haas, Noah Misch
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
relation using the general PARAM_EXEC executor parameter mechanism, rather
than the ad-hoc kluge of passing the outer tuple down through ExecReScan.
The previous method was hard to understand and could never be extended to
handle parameters coming from multiple join levels. This patch doesn't
change the set of possible plans nor have any significant performance effect,
but it's necessary infrastructure for future generalization of the concept
of an inner indexscan plan.
ExecReScan's second parameter is now unused, so it's removed.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
a lot of strange behaviors that occurred in join cases. We now identify the
"current" row for every joined relation in UPDATE, DELETE, and SELECT FOR
UPDATE/SHARE queries. If an EvalPlanQual recheck is necessary, we jam the
appropriate row into each scan node in the rechecking plan, forcing it to emit
only that one row. The former behavior could rescan the whole of each joined
relation for each recheck, which was terrible for performance, and what's much
worse could result in duplicated output tuples.
Also, the original implementation of EvalPlanQual could not re-use the recheck
execution tree --- it had to go through a full executor init and shutdown for
every row to be tested. To avoid this overhead, I've associated a special
runtime Param with each LockRows or ModifyTable plan node, and arranged to
make every scan node below such a node depend on that Param. Thus, by
signaling a change in that Param, the EPQ machinery can just rescan the
already-built test plan.
This patch also adds a prohibition on set-returning functions in the
targetlist of SELECT FOR UPDATE/SHARE. This is needed to avoid the
duplicate-output-tuple problem. It seems fairly reasonable since the
other restrictions on SELECT FOR UPDATE are meant to ensure that there
is a unique correspondence between source tuples and result tuples,
which an output SRF destroys as much as anything else does.
|
|
|
|
| |
provided by Andrew.
|
|
|
|
|
|
|
|
|
| |
making pull_up_sublinks() construct a full-blown JoinExpr tree representation
of IN/EXISTS SubLinks that it is able to convert to semi or anti joins.
This makes pull_up_sublinks() a shade more complex, but the gain in semantic
clarity is worth it. I still have more to do in this area to address the
previously-discussed problems, but this commit in itself fixes at least one
bug in HEAD, as shown by added regression test case.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are some unimplemented aspects: recursive queries must use UNION ALL
(should allow UNION too), and we don't have SEARCH or CYCLE clauses.
These might or might not get done for 8.4, but even without them it's a
pretty useful feature.
There are also a couple of small loose ends and definitional quibbles,
which I'll send a memo about to pgsql-hackers shortly. But let's land
the patch now so we can get on with other development.
Yoshiyuki Asaba, with lots of help from Tatsuo Ishii and Tom Lane
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
level of a JOIN/ON clause, not only at top level of WHERE. (However, we
can't do this in an outer join's ON clause, unless the ANY/EXISTS refers
only to the nullable side of the outer join, so that it can effectively
be pushed down into the nullable side.) Per request from Kevin Grittner.
In passing, fix a bug in the initial implementation of EXISTS pullup:
it would Assert if the EXIST's WHERE clause used a join alias variable.
Since we haven't yet flattened join aliases when this transformation
happens, it's necessary to include join relids in the computed set of
RHS relids.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
the old JOIN_IN code, but antijoins are new functionality.) Teach the planner
to convert appropriate EXISTS and NOT EXISTS subqueries into semi and anti
joins respectively. Also, LEFT JOINs with suitable upper-level IS NULL
filters are recognized as being anti joins. Unify the InClauseInfo and
OuterJoinInfo infrastructure into "SpecialJoinInfo". With that change,
it becomes possible to associate a SpecialJoinInfo with every join attempt,
which permits some cleanup of join selectivity estimation. That needs to be
taken much further than this patch does, but the next step is to change the
API for oprjoin selectivity functions, which seems like material for a
separate patch. So for the moment the output size estimates for semi and
especially anti joins are quite bogus.
|
|
|
|
|
|
|
|
|
|
|
| |
the current query level that aren't in fact output parameters of the current
initPlans. (This means, for example, output parameters of regular subplans.)
To make this work correctly for output parameters coming from sibling
initplans requires rejiggering the API of SS_finalize_plan just a bit:
we need the siblings to be visible to it, rather than hidden as
SS_make_initplan_from_plan had been doing. This is really part of my response
to bug #4290, but I concluded this part probably shouldn't be back-patched,
since all that it's doing is to make a debugging cross-check tighter.
|
| |
|
|
|
|
|
|
|
|
|
| |
this code was last gone over, there wasn't really any alternative to
globals because we didn't have the PlannerInfo struct being passed all
through the planner code. Now that we do, we can restructure things
to avoid non-reentrancy. I'm fooling with this because otherwise I'd
have had to add another global variable for the planned compact
range table list.
|
|
|
|
| |
back-stamped for this.
|
| |
|
| |
|
|
|
|
|
|
|
|
| |
a new PlannerInfo struct, which is passed around instead of the bare
Query in all the planning code. This commit is essentially just a
code-beautification exercise, but it does open the door to making
larger changes to the planner data structures without having to muck
with the widely-known Query struct.
|
|
|
|
|
|
| |
into indexscans on matching indexes. For the moment, it only handles
int4 and text datatypes; next step is to add a column to pg_aggregate
so that all MIN/MAX aggregates can be handled. Per my recent proposal.
|
|
|
|
|
|
|
|
| |
Also performed an initial run through of upgrading our Copyright date to
extend to 2005 ... first run here was very simple ... change everything
where: grep 1996-2004 && the word 'Copyright' ... scanned through the
generated list with 'less' first, and after, to make sure that I only
picked up the right entries ...
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
| |
extensions to support our historical behavior. An aggregate belongs
to the closest query level of any of the variables in its argument,
or the current query level if there are no variables (e.g., COUNT(*)).
The implementation involves adding an agglevelsup field to Aggref,
and treating outer aggregates like outer variables at planning time.
|