| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It's unusual to have any resjunk columns in an ON CONFLICT ... UPDATE
list, but it can happen when MULTIEXPR_SUBLINK SubPlans are present.
If it happens, the ON CONFLICT UPDATE code path would end up storing
tuples that include the values of the extra resjunk columns. That's
fairly harmless in the short run, but if new columns are added to
the table then the values would become accessible, possibly leading
to malfunctions if they don't match the datatypes of the new columns.
This had escaped notice through a confluence of missing sanity checks,
including
* There's no cross-check that a tuple presented to heap_insert or
heap_update matches the table rowtype. While it's difficult to
check that fully at reasonable cost, we can easily add assertions
that there aren't too many columns.
* The output-column-assignment cases in execExprInterp.c lacked
any sanity checks on the output column numbers, which seems like
an oversight considering there are plenty of assertion checks on
input column numbers. Add assertions there too.
* We failed to apply nodeModifyTable's ExecCheckPlanOutput() to
the ON CONFLICT UPDATE tlist. That wouldn't have caught this
specific error, since that function is chartered to ignore resjunk
columns; but it sure seems like a bad omission now that we've seen
this bug.
In HEAD, the right way to fix this is to make the processing of
ON CONFLICT UPDATE tlists work the same as regular UPDATE tlists
now do, that is don't add "SET x = x" entries, and use
ExecBuildUpdateProjection to evaluate the tlist and combine it with
old values of the not-set columns. This adds a little complication
to ExecBuildUpdateProjection, but allows removal of a comparable
amount of now-dead code from the planner.
In the back branches, the most expedient solution seems to be to
(a) use an output slot for the ON CONFLICT UPDATE projection that
actually matches the target table, and then (b) invent a variant of
ExecBuildProjectionInfo that can be told to not store values resulting
from resjunk columns, so it doesn't try to store into nonexistent
columns of the output slot. (We can't simply ignore the resjunk columns
altogether; they have to be evaluated for MULTIEXPR_SUBLINK to work.)
This works back to v10. In 9.6, projections work much differently and
we can't cheaply give them such an option. The 9.6 version of this
patch works by inserting a JunkFilter when it's necessary to get rid
of resjunk columns.
In addition, v11 and up have the reverse problem when trying to
perform ON CONFLICT UPDATE on a partitioned table. Through a
further oversight, adjust_partition_tlist() discarded resjunk columns
when re-ordering the ON CONFLICT UPDATE tlist to match a partition.
This accidentally prevented the storing-bogus-tuples problem, but
at the cost that MULTIEXPR_SUBLINK cases didn't work, typically
crashing if more than one row has to be updated. Fix by preserving
resjunk columns in that routine. (I failed to resist the temptation
to add more assertions there too, and to do some minor code
beautification.)
Per report from Andres Freund. Back-patch to all supported branches.
Security: CVE-2021-32028
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a cross-partition UPDATE violates a constraint on the target partition,
and the columns in the new partition are in different physical order than
in the parent, the error message can reveal columns that the user does not
have SELECT permission on. A similar bug was fixed earlier in commit
804b6b6db4.
The cause of the bug is that the callers of the
ExecBuildSlotValueDescription() function got confused when constructing
the list of modified columns. If the tuple was routed from a parent, we
converted the tuple to the parent's format, but the list of modified
columns was grabbed directly from the child's RTE entry.
ExecUpdateLockMode() had a similar issue. That lead to confusion on which
columns are key columns, leading to wrong tuple lock being taken on tables
referenced by foreign keys, when a row is updated with INSERT ON CONFLICT
UPDATE. A new isolation test is added for that corner case.
With this patch, the ri_RangeTableIndex field is no longer set for
partitions that don't have an entry in the range table. Previously, it was
set to the RTE entry of the parent relation, but that was confusing.
NOTE: This modifies the ResultRelInfo struct, replacing the
ri_PartitionRoot field with ri_RootResultRelInfo. That's a bit risky to
backpatch, because it breaks any extensions accessing the field. The
change that ri_RangeTableIndex is not set for partitions could potentially
break extensions, too. The ResultRelInfos are visible to FDWs at least,
and this patch required small changes to postgres_fdw. Nevertheless, this
seem like the least bad option. I don't think these fields widely used in
extensions; I don't think there are FDWs out there that uses the FDW
"direct update" API, other than postgres_fdw. If there is, you will get a
compilation error, so hopefully it is caught quickly.
Backpatch to 11, where support for both cross-partition UPDATEs, and unique
indexes on partitioned tables, were added.
Reviewed-by: Amit Langote
Security: CVE-2021-3393
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
perform_pruning_combine_step() was not taught about the number of
partition indexes used in hash partitioning; more embarrassingly,
get_matching_hash_bounds() also had it wrong. These errors are masked
in the common case where all the partitions have the same modulus
and no partition is missing. However, with missing or unequal-size
partitions, we could erroneously prune some partitions that need
to be scanned, leading to silently wrong query answers.
While a minimal-footprint fix for this could be to export
get_partition_bound_num_indexes and make the incorrect functions use it,
I'm of the opinion that that function should never have existed in the
first place. It's not reasonable data structure design that
PartitionBoundInfoData lacks any explicit record of the length of
its indexes[] array. Perhaps that was all right when it could always
be assumed equal to ndatums, but something should have been done about
it as soon as that stopped being true. Putting in an explicit
"nindexes" field makes both partition_bounds_equal() and
partition_bounds_copy() simpler, safer, and faster than before,
and removes explicit knowledge of the number-of-partition-indexes
rules from some other places too.
This change also makes get_hash_partition_greatest_modulus obsolete.
I left that in place in case any external code uses it, but no core
code does anymore.
Per bug #16840 from Michał Albrycht. Back-patch to v11 where the
hash partitioning code came in. (In the back branches, add the new
field at the end of PartitionBoundInfoData to minimize ABI risks.)
Discussion: https://postgr.es/m/16840-571a22976f829ad4@postgresql.org
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, gen_partprune_steps() always built executor pruning steps
using all suitable clauses, including those containing PARAM_EXEC
Params. This meant that the pruning steps were only completely safe
for executor run-time (scan start) pruning. To prune at executor
startup, we had to ignore the steps involving exec Params. But this
doesn't really work in general, since there may be logic changes
needed as well --- for example, pruning according to the last operator's
btree strategy is the wrong thing if we're not applying that operator.
The rules embodied in gen_partprune_steps() and its minions are
sufficiently complicated that tracking their incremental effects in
other logic seems quite impractical.
Short of a complete redesign, the only safe fix seems to be to run
gen_partprune_steps() twice, once to create executor startup pruning
steps and then again for run-time pruning steps. We can save a few
cycles however by noting during the first scan whether we rejected
any clauses because they involved exec Params --- if not, we don't
need to do the second scan.
In support of this, refactor the internal APIs in partprune.c to make
more use of passing information in the GeneratePruningStepsContext
struct, rather than as separate arguments.
This is, I hope, the last piece of our response to a bug report from
Alan Jackson. Back-patch to v11 where this code came in.
Discussion: https://postgr.es/m/FAD28A83-AC73-489E-A058-2681FA31D648@tvsquared.com
|
|
|
|
|
|
|
|
|
|
|
|
| |
With correctly crafted DDLs, this could lead to disclosure of arbitrary
backend memory a user may have no right to access. This impacts only
REL_11_STABLE, as the issue has been introduced by 34295b8.
On HEAD, add regression tests to cover this issue in the future.
Author: Michael Paquier
Reviewed-by: Noah Misch
Security: CVE-2019-10129
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When trying to insert a tuple into a partitioned table, the routing to
the correct partition has been messed up by mixing when a tuple needs to
be stored in an intermediate parent's slot and when a tuple needs to be
converted because of attribute changes between the immediate parent
relation and the parent relation one level above that (the grandparent).
This could trigger errors like the following:
ERROR: cannot extract attribute from empty tuple slot SQL state: XX000
This was not detected because regression tests with dropped attributes
only included tests with two levels of partitioning, and this can be
triggered with three levels or more.
This fixes bug #15733, which has been introduced by 34295b8. The bug
happens only on REL_11_STABLE and HEAD gains the regression tests added
for this bug.
Reported-by: Petr Fedorov
Author: Amit Langote, Michael Paquier
Discussion: https://postgr.es/m/15733-7692379e310b80ec@postgresql.org
|
|
|
|
|
|
|
| |
There's no reason to expose the struct definition, so don't.
Author: Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>
Discussion: https://postgr.es/m/d3fa24c1-bc65-7133-81df-6474387ccc4f@lab.ntt.co.jp
|
|
|
|
|
|
|
|
|
|
|
| |
In a multi-layer partitioning setup, if at plan time all the
sub-partitions are pruned but the intermediate one remains, the executor
later throws a spurious error that there's nothing to prune. That is
correct, but there's no reason to throw an error. Therefore, don't.
Reported-by: Andreas Seltenreich <seltenreich@gmx.de>
Author: David Rowley <david.rowley@2ndquadrant.com>
Discussion: https://postgr.es/m/87in4h98i0.fsf@ansel.ydns.eu
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The previous coding here supposed that if run-time partitioning applied to
a particular Append/MergeAppend plan, then all child plans of that node
must be members of a single partitioning hierarchy. This is totally wrong,
since an Append could be formed from a UNION ALL: we could have multiple
hierarchies sharing the same Append, or child plans that aren't part of any
hierarchy.
To fix, restructure the related plan-time and execution-time data
structures so that we can have a separate list or array for each
partitioning hierarchy. Also track subplans that are not part of any
hierarchy, and make sure they don't get pruned.
Per reports from Phil Florent and others. Back-patch to v11, since
the bug originated there.
David Rowley, with a lot of cosmetic adjustments by me; thanks also
to Amit Langote for review.
Discussion: https://postgr.es/m/HE1PR03MB17068BB27404C90B5B788BCABA7B0@HE1PR03MB1706.eurprd03.prod.outlook.com
|
|
|
|
|
|
|
|
|
|
|
| |
Some operations were being done in a longer-lived memory context,
causing intra-query leaks. It's not noticeable unless you're doing a
large COPY, but if you are, it eats enough memory to cause a problem.
Co-authored-by: Kohei KaiGai <kaigai@heterodb.com>
Co-authored-by: Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>
Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/CAOP8fzYtVFWZADq4c=KoTAqgDrHWfng+AnEPEZccyxqxPVbbWQ@mail.gmail.com
|
|
|
|
|
| |
"compute_hash_value" is particularly gratuitously generic, but IMO
all of these ought to have names clearly related to partitioning.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The previous coding saved pointers into the partitioned table's relcache
entry, but then closed the relcache entry, causing those pointers to
nominally become dangling. Actual trouble would be seen in the field
only if a relcache flush occurred mid-query, but that's hardly out of
the question.
While we could fix this by copying all the data in question at query
start, it seems better to just hold the relcache entry open for the
whole query.
While at it, improve the handling of support-function lookups: do that
once per query not once per pruning test. There's still something to be
desired here, in that we fail to exploit the possibility of caching data
across queries in the fn_extra fields of the relcache's FmgrInfo structs,
which could happen if we just used those structs in-place rather than
copying them. However, combining that with the possibility of per-query
lookups of cross-type comparison functions seems to require changes in the
APIs of a lot of the pruning support functions, so it's too invasive to
consider as part of this patch. A win would ensue only for complex
partition key data types (e.g. arrays), so it may not be worth the
trouble.
David Rowley and Tom Lane
Discussion: https://postgr.es/m/17850.1528755844@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
|
|
|
| |
We don't need two passes if we scan child partitions before parents,
as that way the children's present_parts are up to date before they're
needed. I (tgl) think there's actually a bug being fixed here, for the
case of an intermediate partitioned table with no direct leaf children,
but haven't attempted to construct a test case to prove it.
David Rowley
Discussion: https://postgr.es/m/CAKJS1f-6GODRNgEtdPxCnAPme2h2hTztB6LmtfdmcYAAOE0kQg@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Starting with commit f0e44751d717, ExecConstraints was in charge of
running the partition constraint; commit 19c47e7c8202 modified that so
that caller could request to skip that checking depending on some
conditions, but that commit and 15ce775faa42 together introduced a small
bug there which caused ExecInsert to request skipping the constraint
check but have this not be honored -- in effect doing the check twice.
This could have been fixed in a very small patch, but on further
analysis of the involved function and its callsites, it turns out to be
simpler to give the responsibility of checking the partition constraint
fully to the caller, and return ExecConstraints to its original
(pre-partitioning) shape where it only checked tuple descriptor-related
constraints. Each caller must do partition constraint checking on its
own schedule, which is more convenient after commit 2f178441044 anyway.
Reported-by: David Rowley
Author: David Rowley, Álvaro Herrera
Reviewed-by: Amit Langote, Amit Khandekar, Simon Riggs
Discussion: https://postgr.es/m/CAKJS1f8w8+awsxgea8wt7_UX8qzOQ=Tm1LD+U1fHqBAkXxkW2w@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
| |
Use "subplan" rather than "subnode" to refer to the child plans of
a partitioning Append; this seems a bit more specific and hence
clearer. Improve assorted comments. No non-cosmetic changes.
David Rowley and Tom Lane
Discussion: https://postgr.es/m/CAFj8pRBjrufA3ocDm8o4LPGNye9Y+pm1b9kCwode4X04CULG3g@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These struct definitions were originally dropped into primnodes.h,
which is a poor choice since that's mainly intended for primitive
expression node types; these are not in that category. What they
are is auxiliary info in Plan trees, so move them to plannodes.h.
For consistency, also relocate some related code that was apparently
placed with the aid of a dartboard.
There's no interesting code changes in this commit, just reshuffling.
David Rowley and Tom Lane
Discussion: https://postgr.es/m/CAFj8pRBjrufA3ocDm8o4LPGNye9Y+pm1b9kCwode4X04CULG3g@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The initial coding of the run-time-pruning feature only coped with cases
where the partition key(s) are compared to Params. That is a bit silly;
we can allow it to work with any non-Var-containing stable expression, as
long as we take special care with expressions containing PARAM_EXEC Params.
The code is hardly any longer this way, and it's considerably clearer
(IMO at least). Per gripe from Pavel Stehule.
David Rowley, whacked around a bit by me
Discussion: https://postgr.es/m/CAFj8pRBjrufA3ocDm8o4LPGNye9Y+pm1b9kCwode4X04CULG3g@mail.gmail.com
|
|
|
|
|
|
|
|
|
| |
In editing 09b12d52db1cf1a4c72d876f3fb6c9d06919e51a I made it wrong;
fix that and try to more clearly explain the situation.
Patch by me, reviewed by David Rowley and Amit Langote
Discussion: http://postgr.es/m/CA+TgmobAq+mA5hzm0a5OS38qQY5758DDDGqa3sBJN4hvir-H9w@mail.gmail.com
|
|
|
|
|
|
| |
David Rowley, reviewed by Amit Langote, and revised a bit by me.
Discussion: http://postgr.es/m/CAKJS1f9yyimYyFzbHM4EwE+tkj4jvrHqSH0H4S4Kbas=UFpc9Q@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Without these fixes, changes to the inserted tuple made by remote
triggers are ignored when building local RETURNING tuples.
In the core code, call ExecInitRoutingInfo at a later point from
within ExecInitPartitionInfo so that the FDW callback gets invoked
after the returning list has been built. But move CheckValidResultRel
out of ExecInitRoutingInfo so that it can happen at an earlier stage.
In postgres_fdw, refactor assorted deparsing functions to work with
the RTE rather than the PlannerInfo, which saves us having to
construct a fake PlannerInfo in cases where we don't have a real one.
Then, we can pass down a constructed RTE that yields the correct
deparse result when no real one exists. Unfortunately, this
necessitates a hack that understands how the core code manages RT
indexes for update tuple routing, which is ugly, but we don't have a
better idea right now.
Original report, analysis, and patch by Etsuro Fujita. Heavily
refactored by me. Then worked over some more by Amit Langote.
Discussion: http://postgr.es/m/5AD4882B.10002@lab.ntt.co.jp
|
|
|
|
|
|
|
|
|
| |
Remove the words "if not already done." This obsolete wording
corresponds to an early development version of what became edd44738bc8.
Author: Etsuro Fujita
Reviewed-by: Amit Langote
Discussion: https://postgr.es/m/5ADF117B.5030606@lab.ntt.co.jp
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of doing ExecInitExpr every time a Param needs to be evaluated
in run-time partition pruning, do it once during run-time pruning
set-up and cache the exprstate in PartitionPruneContext, saving a lot of
work.
Author: David Rowley
Reviewed-by: Amit Langote, Álvaro Herrera
Discussion: https://postgr.es/m/CAKJS1f8-x+q-90QAPDu_okhQBV4DPEtPz8CJ=m0940GyT4DA4w@mail.gmail.com
|
|
|
|
| |
Per pademelon.
|
|
|
|
|
|
|
|
|
|
|
|
| |
I added this "optimization" on top of Amit Langote's 158b7bc6d779, but
the quick path is never taken because the partition uses a different
pg_type oid than its parent table (causing equalTupleDescs to return
false). Changing that requires more analysis and is too considered
dangerous at this point in the cycle, so revert it.
We might make it work someday, but not for pg11.
Discussion: https://postgr.es/m/825031be-942c-8c24-6163-13c27f217a3d@lab.ntt.co.jp
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We had an Assert() preventing whole-row expressions from being used in
the SET clause of INSERT ON CONFLICT, but it seems unnecessary, given
some tests, so remove it. Add a new test to exercise the case.
Still at ExecInitPartitionInfo, we used map_partition_varattnos (which
constructs an attribute map, then calls map_variable_attnos) using
the same two relations many times in different expressions and with
different parameters. Constructing the map over and over is a waste.
To avoid this repeated work, construct the map once, and use
map_variable_attnos() directly instead.
Author: Amit Langote, per comments by me (Álvaro)
Discussion: https://postgr.es/m/20180326142016.m4st5e34chrzrknk@alvherre.pgsql
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There's been a massive addition of partitioning code in PostgreSQL 11,
with little oversight on its placement, resulting in a
catalog/partition.c with poorly defined boundaries and responsibilities.
This commit tries to set a couple of distinct modules to separate things
a little bit. There are no code changes here, only code movement.
There are three new files:
src/backend/utils/cache/partcache.c
src/include/partitioning/partdefs.h
src/include/utils/partcache.h
The previous arrangement of #including catalog/partition.h almost
everywhere is no more.
Authors: Amit Langote and Álvaro Herrera
Discussion: https://postgr.es/m/98e8d509-790a-128c-be7f-e48a5b2d8d97@lab.ntt.co.jp
https://postgr.es/m/11aa0c50-316b-18bb-722d-c23814f39059@lab.ntt.co.jp
https://postgr.es/m/143ed9a4-6038-76d4-9a55-502035815e68@lab.ntt.co.jp
https://postgr.es/m/20180413193503.nynq7bnmgh6vs5vm@alvherre.pgsql
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commits d204ef63776b8a00ca220adec23979091564e465,
83454e3c2b28141c0db01c7d2027e01040df5249 and a few more commits thereafter
(complete list at the end) related to MERGE feature.
While the feature was fully functional, with sufficient test coverage and
necessary documentation, it was felt that some parts of the executor and
parse-analyzer can use a different design and it wasn't possible to do that in
the available time. So it was decided to revert the patch for PG11 and retry
again in the future.
Thanks again to all reviewers and bug reporters.
List of commits reverted, in reverse chronological order:
f1464c5380 Improve parse representation for MERGE
ddb4158579 MERGE syntax diagram correction
530e69e59b Allow cpluspluscheck to pass by renaming variable
01b88b4df5 MERGE minor errata
3af7b2b0d4 MERGE fix variable warning in non-assert builds
a5d86181ec MERGE INSERT allows only one VALUES clause
4b2d44031f MERGE post-commit review
4923550c20 Tab completion for MERGE
aa3faa3c7a WITH support in MERGE
83454e3c2b New files for MERGE
d204ef6377 MERGE SQL Command following SQL:2016
Author: Pavan Deolasee
Reviewed-by: Michael Paquier
|
|
|
|
|
|
|
|
| |
Fix a couple of typos, and update a comment about why we set a BMS to
NULL.
Author: David Rowley
Discussion: http://postgr.es/m/CAKJS1f-tux=KdUz6ENJ9GHM_V2qgxysadYiOyQS9Ko9PTteVhQ@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Traditionally, include/catalog/pg_foo.h contains extern declarations
for functions in backend/catalog/pg_foo.c, in addition to its function
as the authoritative definition of the pg_foo catalog's rowtype.
In some cases, we'd been forced to split out those extern declarations
into separate pg_foo_fn.h headers so that the catalog definitions
could be #include'd by frontend code. That problem is gone as of
commit 9c0a0de4c, so let's undo the splits to make things less
confusing.
Discussion: https://postgr.es/m/23690.1523031777@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Existing partition pruning is only able to work at plan time, for query
quals that appear in the parsed query. This is good but limiting, as
there can be parameters that appear later that can be usefully used to
further prune partitions.
This commit adds support for pruning subnodes of Append which cannot
possibly contain any matching tuples, during execution, by evaluating
Params to determine the minimum set of subnodes that can possibly match.
We support more than just simple Params in WHERE clauses. Support
additionally includes:
1. Parameterized Nested Loop Joins: The parameter from the outer side of the
join can be used to determine the minimum set of inner side partitions to
scan.
2. Initplans: Once an initplan has been executed we can then determine which
partitions match the value from the initplan.
Partition pruning is performed in two ways. When Params external to the plan
are found to match the partition key we attempt to prune away unneeded Append
subplans during the initialization of the executor. This allows us to bypass
the initialization of non-matching subplans meaning they won't appear in the
EXPLAIN or EXPLAIN ANALYZE output.
For parameters whose value is only known during the actual execution
then the pruning of these subplans must wait. Subplans which are
eliminated during this stage of pruning are still visible in the EXPLAIN
output. In order to determine if pruning has actually taken place, the
EXPLAIN ANALYZE must be viewed. If a certain Append subplan was never
executed due to the elimination of the partition then the execution
timing area will state "(never executed)". Whereas, if, for example in
the case of parameterized nested loops, the number of loops stated in
the EXPLAIN ANALYZE output for certain subplans may appear lower than
others due to the subplan having been scanned fewer times. This is due
to the list of matching subnodes having to be evaluated whenever a
parameter which was found to match the partition key changes.
This commit required some additional infrastructure that permits the
building of a data structure which is able to perform the translation of
the matching partition IDs, as returned by get_matching_partitions, into
the list index of a subpaths list, as exist in node types such as
Append, MergeAppend and ModifyTable. This allows us to translate a list
of clauses into a Bitmapset of all the subpath indexes which must be
included to satisfy the clause list.
Author: David Rowley, based on an earlier effort by Beena Emerson
Reviewers: Amit Langote, Robert Haas, Amul Sul, Rajkumar Raghuwanshi,
Jesper Pedersen
Discussion: https://postgr.es/m/CAOG9ApE16ac-_VVZVvv0gePSgkg_BwYEV1NBqZFqDR2bBE0X0A@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
| |
Also enable this for postgres_fdw.
Etsuro Fujita, based on an earlier patch by Amit Langote. The larger
patch series of which this is a part has been reviewed by Amit
Langote, David Fetter, Maksim Milyutin, Álvaro Herrera, Stephen Frost,
and me. Minor documentation changes to the final version by me.
Discussion: http://postgr.es/m/29906a26-da12-8c86-4fb9-d8f88442f2b9@lab.ntt.co.jp
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Review comments from Andres Freund
* Consolidate code into AfterTriggerGetTransitionTable()
* Rename nodeMerge.c to execMerge.c
* Rename nodeMerge.h to execMerge.h
* Move MERGE handling in ExecInitModifyTable()
into a execMerge.c ExecInitMerge()
* Move mt_merge_subcommands flags into execMerge.h
* Rename opt_and_condition to opt_merge_when_and_condition
* Wordsmith various comments
Author: Pavan Deolasee
Reviewer: Simon Riggs
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
MERGE performs actions that modify rows in the target table
using a source table or query. MERGE provides a single SQL
statement that can conditionally INSERT/UPDATE/DELETE rows
a task that would other require multiple PL statements.
e.g.
MERGE INTO target AS t
USING source AS s
ON t.tid = s.sid
WHEN MATCHED AND t.balance > s.delta THEN
UPDATE SET balance = t.balance - s.delta
WHEN MATCHED THEN
DELETE
WHEN NOT MATCHED AND s.delta > 0 THEN
INSERT VALUES (s.sid, s.delta)
WHEN NOT MATCHED THEN
DO NOTHING;
MERGE works with regular and partitioned tables, including
column and row security enforcement, as well as support for
row, statement and transition triggers.
MERGE is optimized for OLTP and is parameterizable, though
also useful for large scale ETL/ELT. MERGE is not intended
to be used in preference to existing single SQL commands
for INSERT, UPDATE or DELETE since there is some overhead.
MERGE can be used statically from PL/pgSQL.
MERGE does not yet support inheritance, write rules,
RETURNING clauses, updatable views or foreign tables.
MERGE follows SQL Standard per the most recent SQL:2016.
Includes full tests and documentation, including full
isolation tests to demonstrate the concurrent behavior.
This version written from scratch in 2017 by Simon Riggs,
using docs and tests originally written in 2009. Later work
from Pavan Deolasee has been both complex and deep, leaving
the lead author credit now in his hands.
Extensive discussion of concurrency from Peter Geoghegan,
with thanks for the time and effort contributed.
Various issues reported via sqlsmith by Andreas Seltenreich
Authors: Pavan Deolasee, Simon Riggs
Reviewer: Peter Geoghegan, Amit Langote, Tomas Vondra, Simon Riggs
Discussion:
https://postgr.es/m/CANP8+jKitBSrB7oTgT9CY2i1ObfOt36z0XMraQc+Xrz8QB0nXA@mail.gmail.com
https://postgr.es/m/CAH2-WzkJdBuxj9PO=2QaO9-3h3xGbQPZ34kJH=HukRekwM-GZg@mail.gmail.com
|
|
|
|
| |
This reverts commit 354f13855e6381d288dfaa52bcd4f2cb0fd4a5eb.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit eb7ed3f30634 enabled unique constraints on partitioned tables,
but one thing that was not working properly is INSERT/ON CONFLICT.
This commit introduces a new node keeps state related to the ON CONFLICT
clause per partition, and fills it when that partition is about to be
used for tuple routing.
Author: Amit Langote, Álvaro Herrera
Reviewed-by: Etsuro Fujita, Pavan Deolasee
Discussion: https://postgr.es/m/20180228004602.cwdyralmg5ejdqkq@alvherre.pgsql
|
|
|
|
|
|
|
|
|
| |
These values can be obtained from the ModifyTable node which is already
a part of both the ModifyTableState and ExecInsert.
Author: Álvaro Herrera, Amit Langote
Reviewed-by: Peter Geoghegan
Discussion: https://postgr.es/m/20180316151303.rml2p5wffn3o6qy6@alvherre.pgsql
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since edd44738bc8814 WCO expressions of partitioned tables are
initialized with the first subplan as parent. That's not correct, as
the correct context is the ModifyTableState node. That's also what is
used for RETURNING processing, initialized nearby.
This appears not to cause any visible problems for in core code, but
is problematic for in development patch.
Discussion: https://postgr.es/m/20180303043818.tnvlo243bgy7una3@alap3.anarazel.de
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It's not necessary to fully initialize the executor data structures
for partitions to which no tuples are ever routed. Consider, for
example, an INSERT statement that inserts only one row: it only cares
about the partition to which that one row is routed. The new function
ExecInitPartitionInfo performs the initialization in question only
when a particular partition is about to receive a tuple. This includes
creating, validating, and saving a pointer to the ResultRelInfo,
setting up for speculative insertions, translating WCOs and
initializing the resulting expressions, translating returning lists
and building the appropriate projection information, and setting up a
tuple conversion map.
One thing that's not deferred is locking the child partitions; that
seems desirable but would need more thought. Still, testing shows
that this makes single-row inserts significantly faster on a table
with many partitions without harming the bulk-insert case.
Amit Langote, reviewed by Etsuro Fujita, with a few changes by me
Discussion: http://postgr.es/m/8975331d-d961-cbdd-f862-fdd3d97dc2d0@lab.ntt.co.jp
|
|
|
|
|
|
| |
Etsuro Fujita
Discussion: http://postgr.es/m/5A8EAF74.5010905@lab.ntt.co.jp
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The reason for doing so is that it will allow expression evaluation to
optimize based on the underlying tupledesc. In particular it will
allow to JIT tuple deforming together with the expression itself.
For that expression initialization needs to be moved after the
relevant slots are initialized - mostly unproblematic, except in the
case of nodeWorktablescan.c.
After doing so there's no need for ExecAssignResultType() and
ExecAssignResultTypeFromTL() anymore, as all former callers have been
converted to create a slot with a fixed descriptor.
When creating a slot with a fixed descriptor, tts_values/isnull can be
allocated together with the main slot, reducing allocation overhead
and increasing cache density a bit.
Author: Andres Freund
Discussion: https://postgr.es/m/20171206093717.vqdxe5icqttpxs3p@alap3.anarazel.de
|
|
|
|
|
|
|
|
| |
Doing so causes EXPLAIN ANALYZE to show trigger statistics multiple
times. Commit 2f178441044be430f6b4d626e4dae68a9a6f6cec seems to
be to blame for this.
Amit Langote, revieed by Amit Khandekar, Etsuro Fujita, and me.
|
|
|
|
|
|
| |
Etsuro Fujita
Discussion: http://postgr.es/m/5A7981EA.8020201@lab.ntt.co.jp
|
|
|
|
|
|
|
| |
Report by buildfarm member skink and Tom Lane. Analysis by me.
Patch by Amit Khandekar.
Discussion: http://postgr.es/m/CAJ3gD9fVA1iXQYhfqHP5n_TEd4U9=V8TL_cc-oKRnRmxgdvJrQ@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When an UPDATE causes a row to no longer match the partition
constraint, try to move it to a different partition where it does
match the partition constraint. In essence, the UPDATE is split into
a DELETE from the old partition and an INSERT into the new one. This
can lead to surprising behavior in concurrency scenarios because
EvalPlanQual rechecks won't work as they normally did; the known
problems are documented. (There is a pending patch to improve the
situation further, but it needs more review.)
Amit Khandekar, reviewed and tested by Amit Langote, David Rowley,
Rajkumar Raghuwanshi, Dilip Kumar, Amul Sul, Thomas Munro, Álvaro
Herrera, Amit Kapila, and me. A few final revisions by me.
Discussion: http://postgr.es/m/CAJ3gD9do9o2ccQ7j7+tSgiE1REY65XRiMb=yJO3u3QhyP8EEPQ@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
| |
At present, we always raise an ERROR if the partition constraint
is violated, but a pending patch for UPDATE tuple routing will
consider instead moving the tuple to the correct partition.
Refactor to make that simpler.
Amit Khandekar, reviewed by Amit Langote, David Rowley, and me.
Discussion: http://postgr.es/m/CAJ3gD9cue54GbEzfV-61nyGpijvjZgCcghvLsB0_nL8Nm8HzCA@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of having ExecSetupPartitionTupleRouting return multiple out
parameters, have it return a pointer to a structure containing all of
those different things. Also, provide and use a cleanup function,
ExecCleanupTupleRouting, instead of cleaning up all of the resources
allocated by ExecSetupPartitionTupleRouting individually.
Amit Khandekar, reviewed by Amit Langote, David Rowley, and me
Discussion: http://postgr.es/m/CAJ3gD9fWfxgKC+PfJZF3hkgAcNOy-LpfPxVYitDEXKHjeieWQQ@mail.gmail.com
|
|
|
|
| |
Backpatch-through: certain files through 9.3
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 8355a011a0124bdf7ccbada206a967d427039553 was reverted in
f05230752d53c4aa74cffa9b699983bbb6bcb118, but this attempt is
hopefully better-considered: we now pass the correct value to
ExecOpenIndices, which should avoid the crash that we hit before.
Amit Langote, reviewed by Simon Riggs and by me. Some final
editing by me.
Discussion: http://postgr.es/m/7ff1e8ec-dc39-96b1-7f47-ff5965dceeac@lab.ntt.co.jp
|