| Commit message (Collapse) | Author | Age |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Two code paths of tablecmds.c (for relations with storage and without
storage) use the same logic to check if the move of a relation to a
new tablespace is allowed or not and to update pg_class.reltablespace
and pg_class.relfilenode.
A potential TABLESPACE clause for REINDEX, CLUSTER and VACUUM FULL needs
similar checks to make sure that nothing is moved around in illegal ways
(no mapped relations, shared relations only in pg_global, no move of
temp tables owned by other backends).
This reorganizes the existing code of ALTER TABLE so as all this logic
is controlled by two new routines that can be reused for the other
commands able to move relations across tablespaces, limiting the number
of code paths in need of the same protections. This also removes some
code that was duplicated for tables with and without storage for ALTER
TABLE.
Author: Alexey Kondratov, Michael Paquier
Discussion: https://postgr.es/m/YA+9mAMWYLXJMVPL@paquier.xyz
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SPI_execute_with_receiver and SPI_cursor_parse_open_with_paramlist are
new in v14 (cf. commit 2f48ede08). Before they can get out the door,
let's change their APIs to follow the practice recently established by
SPI_prepare_extended etc: shove all optional arguments into a struct
that callers are supposed to pre-zero. The hope is to allow future
addition of more options without either API breakage or a continuing
proliferation of new SPI entry points. With that in mind, choose
slightly more generic names for them: SPI_execute_extended and
SPI_cursor_parse_open respectively.
Discussion: https://postgr.es/m/CAFj8pRCLPdDAETvR7Po7gC5y_ibkn_-bOzbeJb39WHms01194Q@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch essentially is cleaning up technical debt left behind
by the original implementation of plpgsql procedures, particularly
commit d92bc83c4. That patch (or more precisely, follow-on patches
fixing its worst bugs) forced us to re-plan CALL and DO statements
each time through, if we're in a non-atomic context. That wasn't
for any fundamental reason, but just because use of a saved plan
requires having a ResourceOwner to hold a reference count for the
plan, and we had no suitable resowner at hand, nor would the
available APIs support using one if we did. While it's not that
expensive to create a "plan" for CALL/DO, the cycles do add up
in repeated executions.
This patch therefore makes the following API changes:
* GetCachedPlan/ReleaseCachedPlan are modified to let the caller
specify which resowner to use to pin the plan, rather than forcing
use of CurrentResourceOwner.
* spi.c gains a "SPI_execute_plan_extended" entry point that lets
callers say which resowner to use to pin the plan. This borrows the
idea of an options struct from the recently added SPI_prepare_extended,
hopefully allowing future options to be added without more API breaks.
This supersedes SPI_execute_plan_with_paramlist (which I've marked
deprecated) as well as SPI_execute_plan_with_receiver (which is new
in v14, so I just took it out altogether).
* I also took the opportunity to remove the crude hack of letting
plpgsql reach into SPI private data structures to mark SPI plans as
"no_snapshot". It's better to treat that as an option of
SPI_prepare_extended.
Now, when running a non-atomic procedure or DO block that contains
any CALL or DO commands, plpgsql creates a ResourceOwner that
will be used to pin the plans of the CALL/DO commands. (In an
atomic context, we just use CurrentResourceOwner, as before.)
Having done this, we can just save CALL/DO plans normally,
whether or not they are used across transaction boundaries.
This seems to be good for something like 2X speedup of a CALL
of a trivial procedure with a few simple argument expressions.
By restricting the creation of an extra ResourceOwner like this,
there's essentially zero penalty in cases that can't benefit.
Pavel Stehule, with some further hacking by me
Discussion: https://postgr.es/m/CAFj8pRCLPdDAETvR7Po7gC5y_ibkn_-bOzbeJb39WHms01194Q@mail.gmail.com
|
|
|
|
|
| |
Reported-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/c94be044-818f-15e3-1ad3-7a7ae2dfed0a@iki.fi
|
|
|
|
|
|
|
|
| |
I chanced to notice that this dumped core due to a faulty Assert.
To add insult to injury, the output has been misformatted since v11.
Obviously we need some regression testing here.
Discussion: https://postgr.es/m/d1cc628c-3953-4209-957b-29427acc38c8@www.fastmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Up until now, we've held this lock when performing a checkpoint or
restartpoint, but commit 076a055acf3c55314de267c62b03191586d79cf6 back
in 2004 and commit 7e48b77b1cebb9a43f9fdd6b17128a0ba36132f9 from 2009,
taken together, have removed all need for this. In the present code,
there's only ever one process entitled to attempt a checkpoint: either
the checkpointer, during normal operation, or the postmaster, during
single-user operation. So, we don't need the lock.
One possible concern in making this change is that it means that
a substantial amount of code where HOLD_INTERRUPTS() was previously
in effect due to the preceding LWLockAcquire() will now be
running without that. This could mean that ProcessInterrupts()
gets called in places from which it didn't before. However, this
seems unlikely to do very much, because the checkpointer doesn't
have any signal mapped to die(), so it's not clear how,
for example, ProcDiePending = true could happen in the first
place. Similarly with ClientConnectionLost and recovery conflicts.
Also, if there are any such problems, we might want to fix them
rather than reverting this, since running lots of code with
interrupt handling suspended is generally bad.
Patch by me, per an inquiry by Amul Sul. Review by Tom Lane
and Michael Paquier.
Discussion: http://postgr.es/m/CAAJ_b97XnBBfYeSREDJorFsyoD1sHgqnNuCi=02mNQBUMnA=FA@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Both heapgettup() and heapgettup_pagemode() incorrectly set the first page
to scan in a backward scan in which the number of pages to scan was
specified by heap_setscanlimits(). The code incorrectly started the scan
at the end of the relation when startBlk was 0, or otherwise at
startBlk - 1, neither of which is correct when only scanning a subset of
pages.
The fix here checks if heap_setscanlimits() has changed the number of
pages to scan and if so we set the first page to scan as the final page in
the specified range during backward scans.
Proper adjustment of this code was forgotten when heap_setscanlimits() was
added in 7516f5259 back in 9.5. However, practice, nowhere in core code
performs backward scans after having used heap_setscanlimits(), yet, it is
possible an extension uses the heap functions in this way, hence
backpatch.
An upcoming patch does use heap_setscanlimits() with backward scans, so
this must be fixed before that can go in.
Author: David Rowley
Discussion: https://postgr.es/m/CAApHDvpGc9h0_oVD2CtgBcxCS1N-qDYZSeBRnUh+0CWJA9cMaA@mail.gmail.com
Backpatch-through: 9.5, all supported versions
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 69bd60672 fixed the initialization of streamed transactions for
RelationSyncEntry. It forgot to initialize the publication actions while
invalidating the RelationSyncEntry due to which even though the relation
is dropped from a particular publication we still publish its changes. Fix
it by initializing pubactions when entry got invalidated.
Author: Japin Li and Bharath Rupireddy
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/CALj2ACV+0UFpcZs5czYgBpujM9p0Hg1qdOZai_43OU7bqHU_xw@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds code omitted from commit 7db0cd2145 by accident, which had
two consequences. Firstly, only rows inserted by heap_multi_insert were
frozen as expected when running COPY FREEZE, while heap_insert left
rows unfrozen. That however includes rows in TOAST tables, so a lot of
data might have been left unfrozen. Secondly, page might have been left
partially empty after relcache invalidation.
This addresses both of those issues.
Discussion: https://postgr.es/m/CABOikdN-ptGv0mZntrK2Q8OtfUuAjqaYMGmkdU1dCKFtUxVLrg@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I'd omitted this from the grammar in commit c9d529848, figuring that
it wasn't worth supporting. However we already have one complaint,
so it seems that judgment was wrong. It doesn't require a huge
amount of code, so add it back. (I'm still drawing the line at
UNION/INTERSECT/EXCEPT though: those'd require an unreasonable
amount of grammar refactoring, and the single-result-row restriction
makes them near useless anyway.)
Also rethink the documentation: this behavior is a property of
all pl/pgsql expressions, not just assignments.
Discussion: https://postgr.es/m/20210122134106.e94c5cd7@mail.verfriemelt.org
|
|
|
|
|
|
|
|
|
| |
Calls to LWLockWaitForVar() fired the TRACE_POSTGRESQL_LWLOCK_ACQUIRE
tracepoint, but LWLockWaitForVar() never actually acquires the LWLock.
(Probably a copy/paste bug in 68a2e52bbaf.) Remove it.
Author: Craig Ringer <craig.ringer@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/CAGRY4nxJo+-HCC2i5H93ttSZ4gZO-FSddCwvkb-qAfQ1zdXd1w@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The callback for retrieving state change information during connection
setup was only installed when the connection was mostly set up, and
thus didn't provide much information and missed all the details related
to the handshake.
This also extends the callback with SSL_state_string_long() to print
more information about the state change within the SSL object handled.
While there, fix some comments which were incorrectly referring to the
callback and its previous location in fe-secure.c.
Author: Daniel Gustafsson
Discussion: https://postgr.es/m/232CF476-94E1-42F1-9408-719E2AEC5491@yesql.se
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, pull_varnos() took the relids of a PlaceHolderVar as being
equal to the relids in its contents, but that fails to account for the
possibility that we have to postpone evaluation of the PHV due to outer
joins. This could result in a malformed plan. The known cases end up
triggering the "failed to assign all NestLoopParams to plan nodes"
sanity check in createplan.c, but other symptoms may be possible.
The right value to use is the join level we actually intend to evaluate
the PHV at. We can get that from the ph_eval_at field of the associated
PlaceHolderInfo. However, there are some places that call pull_varnos()
before the PlaceHolderInfos have been created; in that case, fall back
to the conservative assumption that the PHV will be evaluated at its
syntactic level. (In principle this might result in missing some legal
optimization, but I'm not aware of any cases where it's an issue in
practice.) Things are also a bit ticklish for calls occurring during
deconstruct_jointree(), but AFAICS the ph_eval_at fields should have
reached their final values by the time we need them.
The main problem in making this work is that pull_varnos() has no
way to get at the PlaceHolderInfos. We can fix that easily, if a
bit tediously, in HEAD by passing it the planner "root" pointer.
In the back branches that'd cause an unacceptable API/ABI break for
extensions, so leave the existing entry points alone and add new ones
with the additional parameter. (If an old entry point is called and
encounters a PHV, it'll fall back to using the syntactic level,
again possibly missing some valid optimization.)
Back-patch to v12. The computation is surely also wrong before that,
but it appears that we cannot reach a bad plan thanks to join order
restrictions imposed on the subquery that the PlaceHolderVar came from.
The error only became reachable when commit 4be058fe9 allowed trivial
subqueries to be collapsed out completely, eliminating their join order
restrictions.
Per report from Stephan Springl.
Discussion: https://postgr.es/m/171041.1610849523@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
|
| |
ExecInitModifyTable has to initialize batching for all result relations,
not just the first one. Furthermore, when junk filters were necessary,
the pointer pointed past the mtstate->resultRelInfo array.
Per reports from multiple non-x86 animals (florican, locust, ...).
Discussion: https://postgr.es/m/20200628151002.7x5laxwpgvkyiu3q@development
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Extends the FDW API to allow batching inserts into foreign tables. That
is usually much more efficient than inserting individual rows, due to
high latency for each round-trip to the foreign server.
It was possible to implement something similar in the regular FDW API,
but it was inconvenient and there were issues with reporting the number
of actually inserted rows etc. This extends the FDW API with two new
functions:
* GetForeignModifyBatchSize - allows the FDW picking optimal batch size
* ExecForeignBatchInsert - inserts a batch of rows at once
Currently, only INSERT queries support batching. Support for DELETE and
UPDATE may be added in the future.
This also implements batching for postgres_fdw. The batch size may be
specified using "batch_size" option both at the server and table level.
The initial patch version was written by me, but it was rewritten and
improved in many ways by Takayuki Tsunakawa.
Author: Takayuki Tsunakawa
Reviewed-by: Tomas Vondra, Amit Langote
Discussion: https://postgr.es/m/20200628151002.7x5laxwpgvkyiu3q@development
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In commit 9eb5607e699, I got the condition on checking for split or
deleted page wrong: I used && instead of ||. The comment correctly said
"concurrent split _or_ deletion".
As a result, GiST insertion could miss a concurrent split, and insert to
wrong page. Duncan Sands demonstrated this with a test script that did a
lot of concurrent inserts.
Backpatch to v12, where this was introduced. REINDEX is required to fix
indexes that were affected by this bug.
Backpatch-through: 12
Reported-by: Duncan Sands
Discussion: https://www.postgresql.org/message-id/a9690483-6c6c-3c82-c8ba-dc1a40848f11%40deepbluecap.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Specifying duplicated objects in this command would lead to unique
constraint violations in pg_default_acl or "tuple already updated by
self" errors. Similarly to GRANT/REVOKE, increment the command ID after
each subcommand processing to allow this case to work transparently.
A regression test is added by tweaking one of the existing queries of
privileges.sql to stress this case.
Reported-by: Andrus
Author: Michael Paquier
Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/ae2a7dc1-9d71-8cba-3bb9-e4cb7eb1f44e@hot.ee
Backpatch-through: 9.5
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Somebody extended search_plan_tree() to treat MergeAppend exactly
like Append, which is 100% wrong, because unlike Append we can't
assume that only one input node is actively returning tuples.
Hence a cursor using a MergeAppend across a UNION ALL or inheritance
tree could falsely match a WHERE CURRENT OF query at a row that
isn't actually the cursor's current output row, but coincidentally
has the same TID (in a different table) as the current output row.
Delete the faulty code; this means that such a case will now return
an error like 'cursor "foo" is not a simply updatable scan of table
"bar"', instead of silently misbehaving. Users should not find that
surprising though, as the same cursor query could have failed that way
already depending on the chosen plan. (It would fail like that if the
sort were done with an explicit Sort node instead of MergeAppend.)
Expand the clearly-inadequate commentary to be more explicit about
what this code is doing, in hopes of forestalling future mistakes.
It's been like this for awhile, so back-patch to all supported
branches.
Discussion: https://postgr.es/m/482865.1611075182@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
| |
This is a leftover from commit 0926e96c49. Changing this separately
because this file is being modified for upcoming patch logical replication
of 2PC.
Author: Peter Smith
Discussion: https://postgr.es/m/CAHut+Ps+EgG8KzcmAyAgBUi_vuTps6o9ZA8DG6SdnO0-YuOhPQ@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
execCurrent.c's search_plan_tree() assumed that ForeignScanStates
and CustomScanStates necessarily have a valid ss_currentRelation.
This is demonstrably untrue for postgres_fdw's remote join and
remote aggregation plans, and non-leaf custom scans might not have
an identifiable scan relation either. Avoid crashing by ignoring
such nodes when the field is null.
This solution will lead to errors like 'cursor "foo" is not a
simply updatable scan of table "bar"' in cases where maybe we
could have allowed WHERE CURRENT OF to work. That's not an issue
for postgres_fdw's usages, since joins or aggregations would render
WHERE CURRENT OF invalid anyway. But an otherwise-transparent
upper level custom scan node might find this annoying. When and if
someone cares to expend work on such a scenario, we could invent a
custom-scan-provider callback to determine what's safe.
Report and patch by David Geier, commentary by me. It's been like
this for awhile, so back-patch to all supported branches.
Discussion: https://postgr.es/m/0253344d-9bdd-11c4-7f0d-d88c02cd7991@swarm64.com
|
|
|
|
|
|
|
|
|
|
| |
This is better style and more symmetrical with the other if-branch.
This likely should have been included in 9de77b545 (which created
the opportunity), but it was overlooked.
Japin Li
Discussion: https://postgr.es/m/MEYP282MB16699FA4A7CD57EB250E871FB6A40@MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM
|
|
|
|
|
|
|
|
| |
We had bytea btrim() already, but for some reason not the other two.
Joel Jacobson
Discussion: https://postgr.es/m/d10cd5cd-a901-42f1-b832-763ac6f7ff3a@www.fastmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, the per-barrier-type functions tasked with absorbing
them were expected to always succeed and never throw an error.
However, that's a bit inconvenient. Further study has revealed that
there are realistic cases where it might not be possible to absorb
a ProcSignalBarrier without terminating the transaction, or even
the whole backend. Similarly, for some barrier types, there might
be other reasons where it's not reasonably possible to absorb the
barrier at certain points in the code, so provide a way for a
per-barrier-type function to reject absorbing the barrier.
Unfortunately, there's still no committed code making use of this
infrastructure; hopefully, we'll get there. :-(
Patch by me, reviewed by Andres Freund and Amul Sul.
Discussion: http://postgr.es/m/20200908182005.xya7wetdh3pndzim@alap3.anarazel.de
Discussion: http://postgr.es/m/CA+Tgmob56Pk1-5aTJdVPCWFHon7me4M96ENpGe9n_R4JUjjhZA@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When certain parameters are changed on a physical replication primary,
this is communicated to standbys using the XLOG_PARAMETER_CHANGE WAL
record. The standby then checks whether its own settings are at least
as big as the ones on the primary. If not, the standby shuts down
with a fatal error.
This patch changes this behavior for hot standbys to pause recovery at
that point instead. That allows read traffic on the standby to
continue while database administrators figure out next steps. When
recovery is unpaused, the server shuts down (as before). The idea is
to fix the parameters while recovery is paused and then restart when
there is a maintenance window.
Reviewed-by: Sergei Kornilov <sk@zsrv.org>
Discussion: https://www.postgresql.org/message-id/flat/4ad69a4c-cc9b-0dfe-0352-8b1b0cd36c7b@2ndquadrant.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This continues the work done in b5913f6. All the options of those
commands are changed to use hex values rather than enums to reduce the
risk of compatibility bugs when introducing new options. Each option
set is moved into a new structure that can be extended with more
non-boolean options (this was already the case of VACUUM). The code of
REINDEX is restructured so as manual REINDEX commands go through a
single routine from utility.c, like VACUUM, to ease the allocation
handling of option parameters when a command needs to go through
multiple transactions.
This can be used as a base infrastructure for future patches related to
those commands, including reindex filtering and tablespace support.
Per discussion with people mentioned below, as well as Alvaro Herrera
and Peter Eisentraut.
Author: Michael Paquier, Justin Pryzby
Reviewed-by: Alexey Kondratov, Justin Pryzby
Discussion: https://postgr.es/m/X8riynBLwxAD9uKk@paquier.xyz
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Make sure COPY FREEZE marks the pages as PD_ALL_VISIBLE and updates the
visibility map. Until now we only marked individual tuples as frozen,
but page-level flags were not updated, so the first VACUUM after the
COPY FREEZE had to rewrite the whole table.
This is a fairly old patch, and multiple people worked on it. The first
version was written by Jeff Janes, and then reworked by Pavan Deolasee
and Anastasia Lubennikova.
Author: Anastasia Lubennikova, Pavan Deolasee, Jeff Janes
Reviewed-by: Kuntal Ghosh, Jeff Janes, Tomas Vondra, Masahiko Sawada,
Andres Freund, Ibrar Ahmed, Robert Haas, Tatsuro Ishii,
Darafei Praliaskouski
Discussion: https://postgr.es/m/CABOikdN-ptGv0mZntrK2Q8OtfUuAjqaYMGmkdU1dCKFtUxVLrg@mail.gmail.com
Discussion: https://postgr.es/m/CAMkU%3D1w3osJJ2FneELhhNRLxfZitDgp9FPHee08NT2FQFmz_pQ%40mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This add counters for number of sessions, the different kind of session
termination types, and timers for how much time is spent in active vs
idle in a database to pg_stat_database.
Internally this also renames the parameter "force" to disconnect. This
was the only use-case for the parameter before, so repurposing it to
this mroe narrow usecase makes things cleaner than inventing something
new.
Author: Laurenz Albe
Reviewed-By: Magnus Hagander, Soumyadeep Chakraborty, Masahiro Ikeda
Discussion: https://postgr.es/m/b07e1f9953701b90c66ed368656f2aef40cac4fb.camel@cybertec.at
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Every core SLRU wraps around. With the exception of pg_notify, the wrap
point can fall in the middle of a page. Account for this in the
PagePrecedes callback specification and in SimpleLruTruncate()'s use of
said callback. Update each callback implementation to fit the new
specification. This changes SerialPagePrecedesLogically() from the
style of asyncQueuePagePrecedes() to the style of CLOGPagePrecedes().
(Whereas pg_clog and pg_serial share a key space, pg_serial is nothing
like pg_notify.) The bug fixed here has the same symptoms and user
followup steps as 592a589a04bd456410b853d86bd05faa9432cbbb. Back-patch
to 9.5 (all supported versions).
Reviewed by Andrey Borodin and (in earlier versions) by Tom Lane.
Discussion: https://postgr.es/m/20190202083822.GC32531@gust.leadboat.com
|
|
|
|
|
|
|
|
|
|
| |
The result of TextDatumGetCString is already palloc'ed so we don't need to
allocate memory for it again. We decide not to backpatch it as there
doesn't seem to be any case where it can create a meaningful leak.
Author: Zhijie Hou
Reviewed-by: Daniel Gustafsson
Discussion: https://postgr.es/m/229fed2eb8c54c71a96ccb99e516eb12@G08CNEXMBPEKD05.g08.fujitsu.local
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add a check that CREATE STATISTICS does not add extended statistics on
system catalogs, similarly to indexes etc. It can be overriden using
the allow_system_table_mods GUC.
This bug exists since 7b504eb282c, adding the extended statistics, so
backpatch all the way back to PostgreSQL 10.
Author: Tomas Vondra
Reported-by: Dean Rasheed
Backpatch-through: 10
Discussion: https://postgr.es/m/CAEZATCXAPrrOKwEsyZKQ4uzzJQWBCt6QAvOcgqRGdWwT1zb%2BrQ%40mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is like commit c98763bf51bf, but for REINDEX CONCURRENTLY. To wit:
this flags indicates that the current process is safe to ignore for the
purposes of waiting for other snapshots, when doing CREATE INDEX
CONCURRENTLY and REINDEX CONCURRENTLY. This helps two processes doing
either of those things not deadlock, and also avoids spurious waits.
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Hamid Akhtar <hamid.akhtar@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/20201130195439.GA24598@alvherre.pgsql
|
|
|
|
|
|
|
|
|
|
|
| |
Commit ac883ac453 refactored shm_toc_estimate() but changed its calculation
of shared memory size for TOC incorrectly. Previously this could cause too
large memory to be allocated.
Back-patch to v11 where the bug was introduced.
Author: Takayuki Tsunakawa
Discussion: https://postgr.es/m/TYAPR01MB2990BFB73170E2C4921E2C4DFEA80@TYAPR01MB2990.jpnprd01.prod.outlook.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The counter tracking the last segment number recycled was getting
initialized when recycling one single segment, while it should be used
across a full cycle of segments recycled to prevent useless checks
related to entries already recycled.
This performance issue has been introduced by b2a5545, and it was first
implemented in 61b86142.
No backpatch is done per the lack of field complaints.
Reported-by: Andres Freund, Thomas Munro
Author: Michael Paquier
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/20170621211016.eln6cxxp3jrv7m4m@alap3.anarazel.de
Discussion: https://postgr.es/m/CA+hUKG+DRiF9z1_MU4fWq+RfJMxP7zjoptfcmuCFPeO4JM2iVg@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a tablespace is used in a partitioned relation (per commits
ca4103025dfe in pg12 for tables and 33e6c34c3267 in pg11 for indexes),
it is possible to drop the tablespace, potentially causing various
problems. One such was reported in bug #16577, where a rewriting ALTER
TABLE causes a server crash.
Protect against this by using pg_shdepend to keep track of tablespaces
when used for relations that don't keep physical files; we now abort a
tablespace if we see that the tablespace is referenced from any
partitioned relations.
Backpatch this to 11, where this problem has been latent all along. We
don't try to create pg_shdepend entries for existing partitioned
indexes/tables, but any ones that are modified going forward will be
protected.
Note slight behavior change: when trying to drop a tablespace that
contains both regular tables as well as partitioned ones, you'd
previously get ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE and now you'll
get ERRCODE_DEPENDENT_OBJECTS_STILL_EXIST. Arguably, the latter is more
correct.
It is possible to add protecting pg_shdepend entries for existing
tables/indexes, by doing
ALTER TABLE ONLY some_partitioned_table SET TABLESPACE pg_default;
ALTER TABLE ONLY some_partitioned_table SET TABLESPACE original_tablespace;
for each partitioned table/index that is not in the database default
tablespace. Because these partitioned objects do not have storage, no
file needs to be actually moved, so it shouldn't take more time than
what's required to acquire locks.
This query can be used to search for such relations:
SELECT ... FROM pg_class WHERE relkind IN ('p', 'I') AND reltablespace <> 0
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/16577-881633a9f9894fd5@postgresql.org
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 709d003fbd refactored WAL-reading code, but accidentally caused
WalSndSegmentOpen() to fail to follow a timeline switch while reading from
a historic timeline. This issue caused a standby to fail to follow a primary
on a newer timeline when WAL archiving is enabled.
If there is a timeline switch within the segment, WalSndSegmentOpen() should
read from the WAL segment belonging to the new timeline. But previously
since it failed to follow a timeline switch, it tried to read the WAL segment
with old timeline. When WAL archiving is enabled, that WAL segment with
old timeline doesn't exist because it's renamed to .partial. This leads
a primary to have tried to read non-existent WAL segment, and which caused
replication to faill with the error "ERROR: requested WAL segment ... has
already been removed".
This commit fixes WalSndSegmentOpen() so that it's able to follow a timeline
switch, to ensure that a standby is able to follow a primary on a newer
timeline even when WAL archiving is enabled.
This commit also adds the regression test to check whether a standby is
able to follow a primary on a newer timeline when WAL archiving is enabled.
Back-patch to v13 where the bug was introduced.
Reported-by: Kyotaro Horiguchi
Author: Kyotaro Horiguchi, tweaked by Fujii Masao
Reviewed-by: Alvaro Herrera, Fujii Masao
Discussion: https://postgr.es/m/20201209.174314.282492377848029776.horikyota.ntt@gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit addresses some issues with c3826f83 that moved the hex
decoding routine to src/common/:
- The decoding function lacked overflow checks, so when used for
security-related features it was an open door to out-of-bound writes if
not carefully used that could remain undetected. Like the base64
routines already in src/common/ used by SCRAM, this routine is reworked
to check for overflows by having the size of the destination buffer
passed as argument, with overflows checked before doing any writes.
- The encoding routine was missing. This is moved to src/common/ and
it gains the same overflow checks as the decoding part.
On failure, the hex routines of src/common/ issue an error as per the
discussion done to make them usable by frontend tools, but not by shared
libraries. Note that this is why ECPG is left out of this commit, and
it still includes a duplicated logic doing hex encoding and decoding.
While on it, this commit uses better variable names for the source and
destination buffers in the existing escape and base64 routines in
encode.c and it makes them more robust to overflow detection. The
previous core code issued a FATAL after doing out-of-bound writes if
going through the SQL functions, which would be enough to detect
problems when working on changes that impacted this area of the
code. Instead, an error is issued before doing an out-of-bound write.
The hex routines were being directly called for bytea conversions and
backup manifests without such sanity checks. The current calls happen
to not have any problems, but careless uses of such APIs could easily
lead to CVE-class bugs.
Author: Bruce Momjian, Michael Paquier
Reviewed-by: Sehrope Sarkuni
Discussion: https://postgr.es/m/20201231003557.GB22199@momjian.us
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
macOS's ranlib issued a warning about an empty pread.o file with the
previous arrangement, on systems new enough to require no replacement
functions. Let's go back to using configure's AC_REPLACE_FUNCS system
to build and include each .o in the library only if it's needed, which
requires moving the *v() functions to their own files.
Also move the _with_retry() wrapper to a more permanent home.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1283127.1610554395%40sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Teach nbtree and heapam to cooperate in order to eagerly remove
duplicate tuples representing dead MVCC versions. This is "bottom-up
deletion". Each bottom-up deletion pass is triggered lazily in response
to a flood of versions on an nbtree leaf page. This usually involves a
"logically unchanged index" hint (these are produced by the executor
mechanism added by commit 9dc718bd).
The immediate goal of bottom-up index deletion is to avoid "unnecessary"
page splits caused entirely by version duplicates. It naturally has an
even more useful effect, though: it acts as a backstop against
accumulating an excessive number of index tuple versions for any given
_logical row_. Bottom-up index deletion complements what we might now
call "top-down index deletion": index vacuuming performed by VACUUM.
Bottom-up index deletion responds to the immediate local needs of
queries, while leaving it up to autovacuum to perform infrequent clean
sweeps of the index. The overall effect is to avoid certain
pathological performance issues related to "version churn" from UPDATEs.
The previous tableam interface used by index AMs to perform tuple
deletion (the table_compute_xid_horizon_for_tuples() function) has been
replaced with a new interface that supports certain new requirements.
Many (perhaps all) of the capabilities added to nbtree by this commit
could also be extended to other index AMs. That is left as work for a
later commit.
Extend deletion of LP_DEAD-marked index tuples in nbtree by adding logic
to consider extra index tuples (that are not LP_DEAD-marked) for
deletion in passing. This increases the number of index tuples deleted
significantly in many cases. The LP_DEAD deletion process (which is now
called "simple deletion" to clearly distinguish it from bottom-up
deletion) won't usually need to visit any extra table blocks to check
these extra tuples. We have to visit the same table blocks anyway to
generate a latestRemovedXid value (at least in the common case where the
index deletion operation's WAL record needs such a value).
Testing has shown that the "extra tuples" simple deletion enhancement
increases the number of index tuples deleted with almost any workload
that has LP_DEAD bits set in leaf pages. That is, it almost never fails
to delete at least a few extra index tuples. It helps most of all in
cases that happen to naturally have a lot of delete-safe tuples. It's
not uncommon for an individual deletion operation to end up deleting an
order of magnitude more index tuples compared to the old naive approach
(e.g., custom instrumentation of the patch shows that this happens
fairly often when the regression tests are run).
Add a further enhancement that augments simple deletion and bottom-up
deletion in indexes that make use of deduplication: Teach nbtree's
_bt_delitems_delete() function to support granular TID deletion in
posting list tuples. It is now possible to delete individual TIDs from
posting list tuples provided the TIDs have a tableam block number of a
table block that gets visited as part of the deletion process (visiting
the table block can be triggered directly or indirectly). Setting the
LP_DEAD bit of a posting list tuple is still an all-or-nothing thing,
but that matters much less now that deletion only needs to start out
with the right _general_ idea about which index tuples are deletable.
Bump XLOG_PAGE_MAGIC because xl_btree_delete changed.
No bump in BTREE_VERSION, since there are no changes to the on-disk
representation of nbtree indexes. Indexes built on PostgreSQL 12 or
PostgreSQL 13 will automatically benefit from bottom-up index deletion
(i.e. no reindexing required) following a pg_upgrade. The enhancement
to simple deletion is available with all B-Tree indexes following a
pg_upgrade, no matter what PostgreSQL version the user upgrades from.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-By: Victor Yegorov <vyegorov@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wzm+maE3apHB8NOtmM=p-DO65j2V5GzAWCOEEuy3JZgb2g@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add an executor aminsert() hint mechanism that informs index AMs that
the incoming index tuple (the tuple that accompanies the hint) is not
being inserted by execution of an SQL statement that logically modifies
any of the index's key columns.
The hint is received by indexes when an UPDATE takes place that does not
apply an optimization like heapam's HOT (though only for indexes where
all key columns are logically unchanged). Any index tuple that receives
the hint on insert is expected to be a duplicate of at least one
existing older version that is needed for the same logical row. Related
versions will typically be stored on the same index page, at least
within index AMs that apply the hint.
Recognizing the difference between MVCC version churn duplicates and
true logical row duplicates at the index AM level can help with cleanup
of garbage index tuples. Cleanup can intelligently target tuples that
are likely to be garbage, without wasting too many cycles on less
promising tuples/pages (index pages with little or no version churn).
This is infrastructure for an upcoming commit that will teach nbtree to
perform bottom-up index deletion. No index AM actually applies the hint
just yet.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Victor Yegorov <vyegorov@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wz=CEKFa74EScx_hFVshCOn6AA5T-ajFASTdzipdkLTNQQ@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a follow-up of the work done in commit 0650ff2303. This commit
extends log_recovery_conflict_waits so that a log message is produced
also when recovery conflict has already been resolved after deadlock_timeout
passes, i.e., when the startup process finishes waiting for recovery
conflict after deadlock_timeout. This is useful in investigating how long
recovery conflicts prevented the recovery from applying WAL.
Author: Fujii Masao
Reviewed-by: Kyotaro Horiguchi, Bertrand Drouvot
Discussion: https://postgr.es/m/9a60178c-a853-1440-2cdc-c3af916cff59@amazon.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The memory for the snapshot was leaked while serializing it to disk during
logical decoding. This memory will be freed only once walsender stops
streaming the changes. This can lead to a huge memory increase when master
logs Standby Snapshot too frequently say when the user is trying to create
many replication slots.
Reported-by: funnyxj.fxj@alibaba-inc.com
Diagnosed-by: funnyxj.fxj@alibaba-inc.com
Author: Amit Kapila
Backpatch-through: 9.5
Discussion: https://postgr.es/m/033ab54c-6393-42ee-8ec9-2b399b5d8cde.funnyxj.fxj@alibaba-inc.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Similar to commit d6ad34f341, this patch optimizes
DropRelFileNodesAllBuffers() by avoiding the complete buffer pool scan and
instead find the buffers to be invalidated by doing lookups in the
BufMapping table.
This optimization helps operations where the relation files need to be
removed like Truncate, Drop, Abort of Create Table, etc.
Author: Kirk Jamison
Reviewed-by: Kyotaro Horiguchi, Takayuki Tsunakawa, and Amit Kapila
Tested-By: Haiying Tang
Discussion: https://postgr.es/m/OSBPR01MB3207DCA7EC725FDD661B3EDAEF660@OSBPR01MB3207.jpnprd01.prod.outlook.com
|
|
|
|
|
| |
Author: Bharath Rupireddy
Discussion: https://postgr.es/m/CALj2ACUDXLAkf_XxQO9tAUtnTNGi3Lmd8fANd+vBJbcHn1HoWA@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
| |
This struct is used by ReindexRelationConcurrently to keep track of the
relations to process. This saves having to obtain some data repeatedly,
and has future uses as well.
Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Hamid Akhtar <hamid.akhtar@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/20201130195439.GA24598@alvherre.pgsql
|
|
|
|
|
|
|
|
| |
This comment has been wrong since its introduction in commit
2c03216d8311.
Author: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAD21AoAzz6qipFJBbGEaHmyWxvvNDp8httbwLR9tUQWaTjUs2Q@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
| |
We missed closing the relation descriptor while sending changes via the
root of partitioned relations during logical replication.
Author: Amit Langote and Mark Zhao
Reviewed-by: Amit Kapila and Ashutosh Bapat
Backpatch-through: 13, where it was introduced
Discussion: https://postgr.es/m/tencent_41FEA657C206F19AB4F406BE9252A0F69C06@qq.com
Discussion: https://postgr.es/m/tencent_6E296D2F7D70AFC90D83353B69187C3AA507@qq.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The recovery path of DropRelFileNodeBuffers() is optimized so that
scanning of the whole buffer pool can be avoided when the number of
blocks to be truncated in a relation is below a certain threshold. For
such cases, we find the buffers by doing lookups in BufMapping table.
This improves the performance by more than 100 times in many cases
when several small tables (tested with 1000 relations) are truncated
and where the server is configured with a large value of shared
buffers (greater than equal to 100GB).
This optimization helps cases (a) when vacuum or autovacuum truncated off
any of the empty pages at the end of a relation, or (b) when the relation is
truncated in the same transaction in which it was created.
This commit introduces a new API smgrnblocks_cached which returns a cached
value for the number of blocks in a relation fork. This helps us to determine
the exact size of relation which is required to apply this optimization. The
exact size is required to ensure that we don't leave any buffer for the
relation being dropped as otherwise the background writer or checkpointer
can lead to a PANIC error while flushing buffers corresponding to files that
don't exist.
Author: Kirk Jamison based on ideas by Amit Kapila
Reviewed-by: Kyotaro Horiguchi, Takayuki Tsunakawa, and Amit Kapila
Tested-By: Haiying Tang
Discussion: https://postgr.es/m/OSBPR01MB3207DCA7EC725FDD661B3EDAEF660@OSBPR01MB3207.jpnprd01.prod.outlook.com
|
|
|
|
|
|
|
|
|
|
|
|
| |
Move it to class 57 (Operator Intervention), which seems like a
better choice given that from the client's standpoint it behaves
a heck of a lot like, e.g., ERRCODE_ADMIN_SHUTDOWN.
In a green field I'd put ERRCODE_IDLE_IN_TRANSACTION_SESSION_TIMEOUT
here as well. But that's been around for a few years, so it's
probably too late to change its SQLSTATE code.
Discussion: https://postgr.es/m/763A0689-F189-459E-946F-F0EC4458980B@hotmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of making many block-sized write() calls to fill a new WAL file
with zeroes, make a smaller number of pwritev() calls (or various
emulations). The actual number depends on the OS's IOV_MAX, which
PG_IOV_MAX currently caps at 32. That means we'll write 256kB per call
on typical systems. We may want to tune the number later with more
experience.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKGJA%2Bu-220VONeoREBXJ9P3S94Y7J%2BkqCnTYmahvZJwM%3Dg%40mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
brenext(), when parsing a '*' quantifier, forgot to return any "value"
for the token; per the equivalent case in next(), it should return
value 1 to indicate that greedy rather than non-greedy behavior is
wanted. The result is that the compiled regexp could behave like 'x*?'
rather than the intended 'x*', if we were unlucky enough to have
a zero in v->nextvalue at this point. That seems to happen with some
reliability if we have '.*' at the beginning of a BRE-mode regexp,
although that depends on the initial contents of a stack-allocated
struct, so it's not guaranteed to fail.
Found by Alexander Lakhin using valgrind testing. This bug seems
to be aboriginal in Spencer's code, so back-patch all the way.
Discussion: https://postgr.es/m/16814-6c5e3edd2bdf0d50@postgresql.org
|