| Commit message | Author | Age |
| |
The previous coding here threw an error from assign_client_encoding
if it was invoked in a parallel worker. That's a very fundamental
violation of the GUC hook API: assign hooks must not throw errors.
The place to complain is in the check hook, so move the test there,
and use the regular check-hook API (i.e., return false) to report it.
The reason this coding is a problem is that it breaks GUC rollback,
which may occur after we leave the InitializingParallelWorker state.
That case does not seem to have been reachable before now, but commit
f5f30c22e made it reachable, so we need to fix this before that commit
can be un-reverted.
In passing, improve the commentary in ParallelWorkerMain, and
add a check for failure of SetClientEncoding. That's another
case that can't happen now but might become possible after
foreseeable code rearrangements (notably, if the shortcut of
skipping PrepareClientEncoding stops being OK).
Discussion: https://postgr.es/m/18545-feba138862f19aaa@postgresql.org
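A minimal sketch of the direction described, assuming a simplified hook
name and the standard GUC check-hook reporting helpers (the validation
details of the real client_encoding hooks are elided):
/* assumes utils/guc.h and access/parallel.h */
static bool
check_client_encoding_sketch(char **newval, void **extra, GucSource source)
{
    /* Complain here, by returning false, rather than in the assign hook. */
    if (IsParallelWorker() && !InitializingParallelWorker)
    {
        GUC_check_errcode(ERRCODE_INVALID_TRANSACTION_STATE);
        GUC_check_errmsg("cannot change client_encoding during a parallel operation");
        return false;
    }

    /* encoding validation and *extra setup elided */
    return true;
}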
| |
pg_wal_replay_wait() is to be used on a standby and waits for a specific
WAL location to be replayed. This is useful when the user makes some data
changes on the primary and needs a guarantee that these changes are
visible on the standby.
The queue of waiters is stored in shared memory as an LSN-ordered pairing
heap, where the waiter with the nearest LSN stays at the top. During
WAL replay, waiters whose LSNs have already been replayed are deleted
from the shared-memory pairing heap and woken up by setting their latches.
pg_wal_replay_wait() needs to wait without any snapshot held. Otherwise,
the snapshot could prevent the replay of WAL records, leading to a kind of
self-deadlock. This is why it can only be implemented as a procedure
working without an active snapshot, not as a function.
Catversion is bumped.
Discussion: https://postgr.es/m/eb12f9b03851bb2583adab5df9579b4b%40postgrespro.ru
Author: Kartyshov Ivan, Alexander Korotkov
Reviewed-by: Michael Paquier, Peter Eisentraut, Dilip Kumar, Amit Kapila
Reviewed-by: Alexander Lakhin, Bharath Rupireddy, Euler Taveira
Reviewed-by: Heikki Linnakangas, Kyotaro Horiguchi
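A minimal sketch of the wakeup side of such a pairing heap, using the
lib/pairingheap.h API; the WalWaiter struct and function names here are
illustrative, not the actual shared-memory layout:
#include "postgres.h"
#include "access/xlogdefs.h"
#include "lib/pairingheap.h"
#include "storage/latch.h"

typedef struct WalWaiter
{
    pairingheap_node heapNode;
    XLogRecPtr  waitLSN;        /* LSN this backend is waiting for */
    Latch      *latch;          /* its process latch, in shared memory */
} WalWaiter;

/*
 * Order the heap so that the smallest (nearest) LSN is at the top; the
 * heap itself would be built with pairingheap_allocate(waiter_cmp, NULL).
 */
static int
waiter_cmp(const pairingheap_node *a, const pairingheap_node *b, void *arg)
{
    const WalWaiter *wa = pairingheap_const_container(WalWaiter, heapNode, a);
    const WalWaiter *wb = pairingheap_const_container(WalWaiter, heapNode, b);

    if (wa->waitLSN < wb->waitLSN)
        return 1;
    if (wa->waitLSN > wb->waitLSN)
        return -1;
    return 0;
}

/* called during replay, once WAL up to 'replayedUpto' has been applied */
static void
WakeupWalWaiters(pairingheap *waiters, XLogRecPtr replayedUpto)
{
    while (!pairingheap_is_empty(waiters))
    {
        WalWaiter  *w = pairingheap_container(WalWaiter, heapNode,
                                              pairingheap_first(waiters));

        if (w->waitLSN > replayedUpto)
            break;              /* nearest waiter not satisfied yet */

        pairingheap_remove_first(waiters);
        SetLatch(w->latch);     /* wake the backend in pg_wal_replay_wait() */
    }
}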
| |
This is used in the startup process to check that the pgstats file we
are reading includes the redo LSN of the shutdown checkpoint at which it
was written. The redo LSN in the pgstats file needs to match what the
control file has.
This is intended for an upcoming change that will extend the write of
the stats file to happen during checkpoints, rather than only during
shutdown sequences.
Bump PGSTAT_FILE_FORMAT_ID.
Reviewed-by: Bertrand Drouvot
Discussion: https://postgr.es/m/Zp8o6_cl0KSgsnvS@paquier.xyz
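A hedged sketch of this kind of validity check; the helper variables are
hypothetical, with "expected_redo" standing in for the redo LSN taken
from the control file:
XLogRecPtr  file_redo;

if (fread(&file_redo, 1, sizeof(file_redo), fpin) != sizeof(file_redo))
    goto error;
if (file_redo != expected_redo)
{
    elog(WARNING,
         "stats file redo LSN %X/%X does not match control file %X/%X",
         LSN_FORMAT_ARGS(file_redo), LSN_FORMAT_ARGS(expected_redo));
    goto error;                 /* discard the stats and start from scratch */
}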
| |
This reverts commit f5f30c22ed69fb37b896c4d4546b2ab823c3fd61.
Some buildfarm animals are failing with "cannot change
"client_encoding" during a parallel operation". It looks like
assign_client_encoding is unhappy at being asked to roll back a
client_encoding setting after a parallel worker encounters a
failure. There must be more to it though: why didn't I see this
during local testing? In any case, it's clear that moving the
RestoreGUCState() call is not as side-effect-free as I thought.
Given that the bug f5f30c22e intended to fix has gone unreported
for years, it's not something that's urgent to fix; I'm not
willing to risk messing with it further with only days to our
next release wrap.
| |
Parallel workers failed after a sequence like
BEGIN;
CREATE USER foo;
SET SESSION AUTHORIZATION foo;
because check_session_authorization could not see the uncommitted
pg_authid row for "foo". This is because we ran RestoreGUCState()
in a separate transaction using an ordinary just-created snapshot.
The same disease afflicts any other GUC that requires catalog lookups
and isn't forgiving about the lookups failing.
To fix, postpone RestoreGUCState() into the worker's main transaction
after we've set up a snapshot duplicating the leader's. This affects
check_transaction_isolation and check_transaction_deferrable, which
think they should only run during transaction start. Make them
act like check_transaction_read_only, which already knows it should
silently accept the value when InitializingParallelWorker.
Per bug #18545 from Andrey Rachitskiy. Back-patch to all
supported branches, because this has been wrong for a while.
Discussion: https://postgr.es/m/18545-feba138862f19aaa@postgresql.org
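A minimal sketch of the check-hook behavior described, with a simplified
hook name and the usual GUC check-hook signature:
/* assumes utils/guc.h and access/parallel.h */
static bool
check_transaction_isolation_sketch(int *newval, void **extra, GucSource source)
{
    /*
     * While restoring the leader's GUC state in a parallel worker, accept
     * the value silently, as check_transaction_read_only already does.
     */
    if (InitializingParallelWorker)
        return true;

    /* normal "can only be set at transaction start" checks go here */
    return true;
}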
| |
It's only used very locally within the function.
Reviewed-by: Robert Haas
Discussion: https://www.postgresql.org/message-id/7f86e06a-98c5-4ce3-8ec9-3885c8de0358@iki.fi
| |
The function is used only once in the startup process, so the leak
into the current memory context is harmless.
This is a tiny step in making the server thread-safe.
Reviewed-by: Robert Haas
Discussion: https://www.postgresql.org/message-id/7f86e06a-98c5-4ce3-8ec9-3885c8de0358@iki.fi
| |
This is a continuation of 3937cadfd438, taking care of more areas I have
managed to miss previously.
Reported-by: Noah Misch
Reviewed-by: Noah Misch
Discussion: https://postgr.es/m/20240724130059.1f.nmisch@google.com
Backpatch-through: 17
| |
When a standby is promoted, CleanupAfterArchiveRecovery() may decide
to rename the final WAL file from the old timeline by adding ".partial"
to the name. If WAL summarization is enabled and this file is renamed
before its partial contents are summarized, WAL summarization breaks:
the summarizer gets stuck at that point in the WAL stream and just
errors out.
To fix that, first make the startup process wait for WAL summarization
to catch up before renaming the file. Generally, this should be quick,
and if it's not, the user can shut off summarize_wal and try again.
To make this fix work, also teach the WAL summarizer that after a
promotion has occurred, no more WAL can appear on the previous
timeline: previously, the WAL summarizer wouldn't switch to the new
timeline until we actually started writing WAL there, but that meant
that when the startup process was waiting for the WAL summarizer, it
was waiting for an action that the summarizer wasn't yet prepared to
take.
In the process of fixing these bugs, I realized that the logic to wait
for WAL summarization to catch up was spread out in a way that made
it difficult to reuse properly, so this commit refactors things to make
that easier.
Finally, add a test case that would have caught this bug and the
previously-fixed bug that WAL summarization sometimes needs to back up
when the timeline changes.
Discussion: https://postgr.es/m/CA+TgmoZGEsZodXC4f=XZNkAeyuDmWTSkpkjCEOcF19Am0mt_OA@mail.gmail.com
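A hedged sketch of the ordering described, assuming a
WaitForWalSummarization()-style entry point and the usual durable_rename()
helper; the variable names are illustrative:
/* in CleanupAfterArchiveRecovery(), before renaming the final file of the
 * old timeline to "*.partial" */
if (summarize_wal)
    WaitForWalSummarization(EndOfLog);  /* let the summarizer read it first */

durable_rename(origpath, partialpath, ERROR);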
| |
The two_phase option is controlled by both the publisher (as a slot
option) and the subscriber (as a subscription option), so the slot option
must also be modified.
Changing the 'two_phase' option for a subscription from 'true' to 'false'
is permitted only when there are no pending prepared transactions
corresponding to that subscription. Otherwise, the changes of
already-prepared transactions could be replicated again along with their
corresponding commit, leading to duplicate data or errors.
To avoid data loss, the 'two_phase' option for a subscription can only be
changed from 'false' to 'true' once the initial data synchronization is
completed. Therefore, this is performed later by the logical replication
worker.
Author: Hayato Kuroda, Ajin Cherian, Amit Kapila
Reviewed-by: Peter Smith, Hou Zhijie, Amit Kapila, Vitaly Davydov, Vignesh C
Discussion: https://postgr.es/m/8fab8-65d74c80-1-2f28e880@39088166
| |
clog.c, async.c and predicate.c still handled some SLRU page numbers as
4-byte integers, while int64 should be used for this purpose. These
oversights were introduced in 4ed8f0913bfd, which switched SLRU page
numbers to 8-byte integers but missed the code paths updated by this
commit.
Reported-by: Noah Misch
Author: Aleksander Alekseev, Michael Paquier
Discussion: https://postgr.es/m/20240626002747.dc.nmisch@google.com
Backpatch-through: 17
| |
bootstrap_data_checksum_version can just as easily be passed to where
it is used via function arguments.
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/flat/e0a62134-83da-4ba4-8cdb-ceb0111c95ce@eisentraut.org
| |
slru.h incorrectly described how SLRU segment names are formatted
depending on the segment number and on whether long or short segment
names are used. This commit closes the gap with a better description
that matches reality.
Reported-by: Noah Misch
Author: Aleksander Alekseev
Discussion: https://postgr.es/m/20240626002747.dc.nmisch@google.com
Backpatch-through: 17
| |
As per Coverity and Tom Lane, commit 402b586d0 (back-patched to v17
as 2b5819e2b) forgot to initialize this new structure member in this
code path.
| |
To do this, we must include the wal_level in the first WAL record
covered by each summary file; so add wal_level to struct Checkpoint
and the payload of XLOG_CHECKPOINT_REDO and XLOG_END_OF_RECOVERY.
This, in turn, requires bumping XLOG_PAGE_MAGIC and, since the
Checkpoint is also stored in the control file, also
PG_CONTROL_VERSION. It's not great to do that so late in the release
cycle, but the alternative seems to be shipping v17 without robust
protections against this scenario, which could result in corrupted
incremental backups.
A side effect of this patch is that, when a server with
wal_level=replica is started with summarize_wal=on for the first time,
summarization will no longer begin with the oldest WAL that still
exists in pg_wal, but rather from the first checkpoint after that.
This change should be harmless, because a WAL summary for a partial
checkpoint cycle can never make an incremental backup possible when
it would otherwise not have been.
Report by Fujii Masao. Patch by me. Review and/or testing by Jakub
Wartak and Fujii Masao.
Discussion: http://postgr.es/m/6e30082e-041b-4e31-9633-95a66de76f5d@oss.nttdata.com
| |
All the errors triggered in the code paths patched here would cause the
backend to issue an internal_error errcode, which is a state that should
be used only for "can't happen" situations. However, these code paths
are reachable by the regression tests, and could be seen by users in
valid cases. Some regression tests expect internal errcodes because they
manipulate the backend state to cause corruption (like checksums) or use
elog() because it is more convenient (like injection points); these
have no need to change.
This reduces the number of internal failures triggered in a check-world
by more than half, while providing correct errcodes for these valid
cases.
Reviewed-by: Robert Haas
Discussion: https://postgr.es/m/Zic_GNgos5sMxKoa@paquier.xyz
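A hedged illustration of the pattern, with a hypothetical message: a
user-reachable failure gets a real SQLSTATE instead of defaulting to
XX000:
/* before: elog()/bare ereport() default to ERRCODE_INTERNAL_ERROR (XX000) */
elog(ERROR, "could not read block %u", blkno);

/* after: the reachable failure carries a meaningful errcode */
ereport(ERROR,
        (errcode(ERRCODE_DATA_CORRUPTED),
         errmsg("could not read block %u", blkno)));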
| |
Using memcpy with strlen as the size parameter does not copy the
NUL terminator, relying instead on the destination buffer being
properly initialized. Replace it with strlcpy, which is a safer
alternative and more in line with how we handle copying strings
elsewhere.
Author: Ranier Vilela <ranier.vf@gmail.com>
Discussion: https://postgr.es/m/CAEudQApAsbLsQ+gGiw-hT+JwGhgogFa_=5NUkgFO6kOPxyNidQ@mail.gmail.com
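A generic before/after illustration (the buffer and source names are
hypothetical):
char    buf[NAMEDATALEN];

/* before: no terminator copied; correctness depends on prior zeroing */
memcpy(buf, src, strlen(src));

/* after: always NUL-terminated, truncated to fit the destination */
strlcpy(buf, src, sizeof(buf));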
| |
Up to now, committing a transaction has caused CurrentMemoryContext to
get set to TopMemoryContext. Most callers did not pay any particular
heed to this, which is problematic because TopMemoryContext is a
long-lived context that never gets reset. If the caller assumes it
can leak memory because it's running in a limited-lifespan context,
that behavior translates into a session-lifespan memory leak.
The first-reported instance of this involved ProcessIncomingNotify,
which is called from the main processing loop that normally runs in
MessageContext. That outer-loop code assumes that whatever it
allocates will be cleaned up when we're done processing the current
client message --- but if we service a notify interrupt, then whatever
gets allocated before the next switch to MessageContext will be
permanently leaked in TopMemoryContext. sinval catchup interrupts
have a similar problem, and I strongly suspect that some places in
logical replication do too.
To fix this in a generic way, let's redefine the behavior as
"CommitTransactionCommand restores the memory context that was current
at entry to StartTransactionCommand". This clearly fixes the issue
for the notify and sinval cases, and it seems to match the mental
model that's in use in the logical replication code, to the extent
that anybody thought about it there at all.
For consistency, likewise make subtransaction exit restore the context
that was current at subtransaction start (rather than always selecting
the CurTransactionContext of the parent transaction level). This case
has less risk of resulting in a permanent leak than the outer-level
behavior has, but it would not meet the principle of least surprise
for some CommitTransactionCommand calls to restore the previous
context while others don't.
While we're here, also change xact.c so that we reset
TopTransactionContext at transaction exit and then re-use it in later
transactions, rather than dropping and recreating it in each cycle.
This probably doesn't save a lot given the context recycling mechanism
in aset.c, but it should save a little bit. Per suggestion from David
Rowley. (Parenthetically, the text in src/backend/utils/mmgr/README
implies that this is how I'd planned to implement it as far back as
commit 1aebc3618 --- but the code actually added in that commit just
drops and recreates it each time.)
This commit also cleans up a few places outside xact.c that were
needlessly making TopMemoryContext current, and thus risking more
leaks of the same kind. I don't think any of them represent
significant leak risks today, but let's deal with them while the
issue is top-of-mind.
Per bug #18512 from wizardbrony. Commit to HEAD only, as this change
seems to have some risk of breaking things for some callers. We'll
apply a narrower fix for the known-broken cases in the back branches.
Discussion: https://postgr.es/m/3478884.1718656625@sss.pgh.pa.us
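A hedged sketch of the caller-visible contract described above, from the
point of view of code running in a short-lived context such as
MessageContext:
MemoryContext oldcontext = CurrentMemoryContext;    /* e.g. MessageContext */

StartTransactionCommand();
/* ... catalog lookups etc.; allocations go to TopTransactionContext ... */
CommitTransactionCommand();

/*
 * Previously CurrentMemoryContext was TopMemoryContext here, so anything
 * palloc'd before the next context switch leaked for the session's life.
 * Now the context that was current at StartTransactionCommand() is
 * restored.
 */
Assert(CurrentMemoryContext == oldcontext);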
| |
Before this commit, when the WAL summarizer started up or recovered
from an error, it would resume summarization from wherever it left
off. That was OK normally, but wrong if summarize_wal had been turned
off temporarily, allowing some WAL to be removed, and then turned
back on again. In such cases, the WAL summarizer would simply hang
forever. This commit changes the reinitialization sequence for the WAL
summarizer to rederive the starting position in the way we were
already doing at initial startup, fixing the problem.
Per report from Israel Barth Rubio. Reviewed by Tom Lane.
Discussion: http://postgr.es/m/CA+TgmoYN6x=YS+FoFOS6=nr6=qkXZFWhdiL7k0oatGwug2hcuA@mail.gmail.com
| |
We did not recover the subtransaction IDs of prepared transactions
when starting a hot standby from a shutdown checkpoint. As a result,
such subtransactions were considered as aborted, rather than
in-progress. That would lead to hint bits being set incorrectly, and
the subtransactions suddenly becoming visible to old snapshots when
the prepared transaction was committed.
To fix, update pg_subtrans with prepared transactions' subxids when
starting hot standby from a shutdown checkpoint. The snapshots taken
from that state need to be marked as "suboverflowed", so that we also
check pg_subtrans.
Backport to all supported versions.
Discussion: https://www.postgresql.org/message-id/6b852e98-2d49-4ca1-9e95-db419a2696e0@iki.fi
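A hedged sketch of the recovery-side fix; the loop assumes the usual
two-phase state file header fields, but the names here are illustrative:
/* for each prepared transaction found when starting hot standby */
for (int i = 0; i < hdr->nsubxacts; i++)
    SubTransSetParent(subxids[i], hdr->xid);

/* snapshots built from this state must be treated as suboverflowed */
snapshot->suboverflowed = true;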
| |
1. TruncateMultiXact() performs the SLRU truncations in a critical
section. Deleting the SLRU segments calls ForwardSyncRequest(), which
will try to compact the request queue if it's full
(CompactCheckpointerRequestQueue()). That in turn allocates memory,
which is not allowed in a critical section. Backtrace:
TRAP: failed Assert("CritSectionCount == 0 || (context)->allowInCritSection"), File: "../src/backend/utils/mmgr/mcxt.c", Line: 1353, PID: 920981
postgres: autovacuum worker template0(ExceptionalCondition+0x6e)[0x560a501e866e]
postgres: autovacuum worker template0(+0x5dce3d)[0x560a50217e3d]
postgres: autovacuum worker template0(ForwardSyncRequest+0x8e)[0x560a4ffec95e]
postgres: autovacuum worker template0(RegisterSyncRequest+0x2b)[0x560a50091eeb]
postgres: autovacuum worker template0(+0x187b0a)[0x560a4fdc2b0a]
postgres: autovacuum worker template0(SlruDeleteSegment+0x101)[0x560a4fdc2ab1]
postgres: autovacuum worker template0(TruncateMultiXact+0x2fb)[0x560a4fdbde1b]
postgres: autovacuum worker template0(vac_update_datfrozenxid+0x4b3)[0x560a4febd2f3]
postgres: autovacuum worker template0(+0x3adf66)[0x560a4ffe8f66]
postgres: autovacuum worker template0(AutoVacWorkerMain+0x3ed)[0x560a4ffe7c2d]
postgres: autovacuum worker template0(+0x3b1ead)[0x560a4ffecead]
postgres: autovacuum worker template0(+0x3b620e)[0x560a4fff120e]
postgres: autovacuum worker template0(+0x3b3fbb)[0x560a4ffeefbb]
postgres: autovacuum worker template0(+0x2f724e)[0x560a4ff3224e]
/lib/x86_64-linux-gnu/libc.so.6(+0x27c8a)[0x7f62cc642c8a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85)[0x7f62cc642d45]
postgres: autovacuum worker template0(_start+0x21)[0x560a4fd16f31]
To fix, bail out in CompactCheckpointerRequestQueue() without doing
anything, if it's called in a critical section. That covers the above
call path, as well as any other similar cases where
RegisterSyncRequest might be called in a critical section.
2. After fixing that, another problem became apparent: the autovacuum
process doing that truncation can deadlock with the checkpointer
process. TruncateMultiXact() sets "MyProc->delayChkptFlags |=
DELAY_CHKPT_START". If the sync request queue is full and cannot be
compacted, the process will repeatedly sleep and retry, until there is
room in the queue. However, if the checkpointer is trying to start a
checkpoint at the same time, and is waiting for the DELAY_CHKPT_START
processes to finish, the queue will never shrink.
More concretely, the autovacuum process is stuck here:
#0 0x00007fc934926dc3 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x000056220b24348b in WaitEventSetWaitBlock (set=0x56220c2e4b50, occurred_events=0x7ffe7856d040, nevents=1, cur_timeout=<optimized out>) at ../src/backend/storage/ipc/latch.c:1570
#2 WaitEventSetWait (set=0x56220c2e4b50, timeout=timeout@entry=10, occurred_events=<optimized out>, occurred_events@entry=0x7ffe7856d040, nevents=nevents@entry=1,
wait_event_info=wait_event_info@entry=150994949) at ../src/backend/storage/ipc/latch.c:1516
#3 0x000056220b243224 in WaitLatch (latch=<optimized out>, latch@entry=0x0, wakeEvents=wakeEvents@entry=40, timeout=timeout@entry=10, wait_event_info=wait_event_info@entry=150994949)
at ../src/backend/storage/ipc/latch.c:538
#4 0x000056220b26cf46 in RegisterSyncRequest (ftag=ftag@entry=0x7ffe7856d0a0, type=type@entry=SYNC_FORGET_REQUEST, retryOnError=true) at ../src/backend/storage/sync/sync.c:614
#5 0x000056220af9db0a in SlruInternalDeleteSegment (ctl=ctl@entry=0x56220b7beb60 <MultiXactMemberCtlData>, segno=segno@entry=11350) at ../src/backend/access/transam/slru.c:1495
#6 0x000056220af9dab1 in SlruDeleteSegment (ctl=ctl@entry=0x56220b7beb60 <MultiXactMemberCtlData>, segno=segno@entry=11350) at ../src/backend/access/transam/slru.c:1566
#7 0x000056220af98e1b in PerformMembersTruncation (oldestOffset=<optimized out>, newOldestOffset=<optimized out>) at ../src/backend/access/transam/multixact.c:3006
#8 TruncateMultiXact (newOldestMulti=newOldestMulti@entry=3221225472, newOldestMultiDB=newOldestMultiDB@entry=4) at ../src/backend/access/transam/multixact.c:3201
#9 0x000056220b098303 in vac_truncate_clog (frozenXID=749, minMulti=<optimized out>, lastSaneFrozenXid=749, lastSaneMinMulti=3221225472) at ../src/backend/commands/vacuum.c:1917
#10 vac_update_datfrozenxid () at ../src/backend/commands/vacuum.c:1760
#11 0x000056220b1c3f76 in do_autovacuum () at ../src/backend/postmaster/autovacuum.c:2550
#12 0x000056220b1c2c3d in AutoVacWorkerMain (startup_data=<optimized out>, startup_data_len=<optimized out>) at ../src/backend/postmaster/autovacuum.c:1569
and the checkpointer is stuck here:
#0 0x00007fc9348ebf93 in clock_nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007fc9348fe353 in nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x000056220b40ecb4 in pg_usleep (microsec=microsec@entry=10000) at ../src/port/pgsleep.c:50
#3 0x000056220afb43c3 in CreateCheckPoint (flags=flags@entry=108) at ../src/backend/access/transam/xlog.c:7098
#4 0x000056220b1c6e86 in CheckpointerMain (startup_data=<optimized out>, startup_data_len=<optimized out>) at ../src/backend/postmaster/checkpointer.c:464
To fix, add AbsorbSyncRequests() to the loops where the checkpointer
waits for DELAY_CHKPT_START or DELAY_CHKPT_COMPLETE operations to
finish.
Backpatch to v14. Before that, SLRU deletion didn't call
RegisterSyncRequest, which avoided this failure. I'm not sure if there
are other similar scenarios on older versions, but we haven't had
any such reports.
Discussion: https://www.postgresql.org/message-id/ccc66933-31c1-4f6a-bf4b-45fef0d4f22e@iki.fi
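A hedged sketch of fix #1 as described; the early return's exact
placement inside CompactCheckpointerRequestQueue() is illustrative:
static bool
CompactCheckpointerRequestQueue(void)
{
    /*
     * Compaction needs to allocate memory, which is forbidden inside a
     * critical section, so just report "nothing compacted" in that case
     * and let the caller sleep and retry.
     */
    if (CritSectionCount > 0)
        return false;

    /* ... existing compaction logic ... */
}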
| |
The macros were confused about the argument data types. All the
arguments were called 'xid', and some of the macros included casts to
TransactionId, even though the arguments were actually either
MultiXactIds or MultiXactOffsets. It compiles to the same thing,
because TransactionId, MultiXactId and MultiXactOffset are all
typedefs of uint32, but it was highly misleading.
Author: Maxim Orlov <orlovmg@gmail.com>
Discussion: https://www.postgresql.org/message-id/CACG%3DezbLUG-OD1osAW3OchOMxZtdxHh2itYR9Zhh-a13wEBEQw%40mail.gmail.com
Discussion: https://www.postgresql.org/message-id/ff143b24-a093-40da-9833-d36b83726bdf%40iki.fi
| |
The purpose of the function is to reduce the effective
autovacuum_multixact_freeze_max_age if the multixact members SLRU is
approaching wraparound, to make multixid freezing more aggressive.
The returned value should therefore never be greater than plain
autovacuum_multixact_freeze_max_age.
Reviewed-by: Robert Haas
Discussion: https://www.postgresql.org/message-id/85fb354c-f89f-4d47-b3a2-3cbd461c90a3@iki.fi
Backpatch-through: 12, all supported versions
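A hedged sketch of the clamp described; the computation of the
members-based value is elided:
int
MultiXactMemberFreezeThreshold(void)
{
    int     result = autovacuum_multixact_freeze_max_age;

    /* ... possibly reduce "result" based on member space usage (elided) ... */

    /* never return more than the plain GUC value */
    return Min(result, autovacuum_multixact_freeze_max_age);
}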
| |
ReorderBufferImmediateInvalidation() executes invalidation messages in
an aborted transaction. However, RelationFlushRelation sometimes
required a valid resource owner, to temporarily increment the refcount
of the relcache entry. Commit b8bff07daa worked around that in the main
subtransaction abort function, AbortSubTransaction(), but missed this
similar case in ReorderBufferImmediateInvalidation().
To fix, introduce a separate function to invalidate a relcache
entry. It does the same thing as RelationClearRelation(rebuild==true)
does when outside a transaction, but can be called without
incrementing the refcount.
Add regression test. Before this fix, it failed with:
ERROR: ResourceOwnerEnlarge called after release started
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://www.postgresql.org/message-id/e56be7d9-14b1-664d-0bfc-00ce9772721c@gmail.com
| |
After further review, we want to move in the direction of always
quoting GUC names in error messages, rather than the previous (PG16)
wildly mixed practice or the intermittent (mid-PG17) idea of doing
this depending on how possibly confusing the GUC name is.
This commit applies appropriate quotes to (almost?) all mentions of
GUC names in error messages. It partially supersedes a243569bf65 and
8d9978a7176, which had moved things a bit in the opposite direction
but which then were abandoned in a partial state.
Author: Peter Smith <smithpb2250@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAHut%2BPv-kSN8SkxSdoHano_wPubqcg5789ejhCDZAcLFceBR-w%40mail.gmail.com
| |
Author: Alexander Lakhin
Discussion: https://postgr.es/m/ae9f2fcb-4b24-5bb0-4240-efbbbd944ca1@gmail.com
| |
fefd9a3fed turned the tail recursion of CommitTransactionCommand() and
AbortCurrentTransaction() into iteration. However, it split the handling
of the cases across different functions.
This commit puts the handling of all the cases into
AbortCurrentTransactionInternal() and CommitTransactionCommandInternal().
Now CommitTransactionCommand() and AbortCurrentTransaction() simply call
the internal functions repeatedly.
Reported-by: Andres Freund
Discussion: https://postgr.es/m/20240415224834.w6piwtefskoh32mv%40awork3.anarazel.de
Author: Andres Freund
| |
This commit reverts 06c418e163, e37662f221, bf1e650806, 25f42429e2,
ee79928441, and 74eaf66f98 per review by Heikki Linnakangas.
Discussion: https://postgr.es/m/b155606b-e744-4218-bda5-29379779da1a%40iki.fi
| |
In one multixact.c edge case, we need a mechanism to wait for one
multixact offset to be written before being allowed to read the next
one. We used to handle this case by sleeping for one millisecond and
retrying, but such sleeps have been reported as problematic in
production cases. We can avoid the problem by using a condition
variable: readers sleep on it and then every creator of multixacts
broadcasts into the CV when creation is sufficiently far along.
Author: Kyotaro Horiguchi <horikyotajntt@gmail.com>
Reviewed-by: Andrey Borodin <amborodin@acm.org>
Discussion: https://postgr.es/m/47A598F4-B4E7-4029-8FEC-A06A6C3CB4B5@yandex-team.ru
Discussion: https://postgr.es/m/20200515.090333.24867479329066911.horikyota.ntt
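A hedged sketch of the pattern described, using the standard
ConditionVariable API; the CV field, wait-event name, and predicate are
hypothetical:
/* reader: wait until the next multixact's offset is known */
ConditionVariablePrepareToSleep(&MultiXactState->nextoffset_cv);
while (!NextOffsetIsKnown(multi))       /* hypothetical predicate */
    ConditionVariableSleep(&MultiXactState->nextoffset_cv,
                           WAIT_EVENT_MULTIXACT_CREATION);
ConditionVariableCancelSleep();

/* creator: once the next offset is safely readable by others */
ConditionVariableBroadcast(&MultiXactState->nextoffset_cv);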
| |
This tracks the position of WAL that's been fully copied into WAL
buffers by all processes emitting WAL. (For some reason we call that
"WAL insertion".) It is updated using an atomic monotonic advance during
WaitXLogInsertionsToFinish, which is not when the insertions actually
occur, but it's the only place where we know that all the insertions
have completed.
This value is useful in WALReadFromBuffers, which can verify that
callers don't try to read past what has been inserted. (However, more
infrastructure is needed in order to actually use WAL after the flush
point, since it could be lost.)
The value is also useful in WaitXLogInsertionsToFinish() itself, since
we can now exit quickly when all WAL has been already inserted, without
even having to take any locks.
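A hedged sketch of an atomic monotonic advance, written out as a
compare-and-exchange loop over the standard pg_atomic_uint64 primitives;
the function and variable names are illustrative:
static void
AdvanceCopiedUpTo(pg_atomic_uint64 *copiedUpTo, XLogRecPtr upto)
{
    uint64      currval = pg_atomic_read_u64(copiedUpTo);

    /* only ever move the pointer forward */
    while (currval < upto)
    {
        if (pg_atomic_compare_exchange_u64(copiedUpTo, &currval, upto))
            break;              /* we advanced it */
        /* otherwise currval now holds the newer value; re-check and retry */
    }
}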
| |
Even though waiting for a replay LSN happens without an explicit
transaction, AbortTransaction() is responsible for the cleanup of the
shared memory state if an error is thrown in a stored procedure. So, we
need to do WaitLSNCleanup() there to clean up after an unexpected error
that happened while waiting for a replay LSN.
Discussion: https://postgr.es/m/202404051815.eri4u5q6oj26%40alvherre.pgsql
Author: Alvaro Herrera
| |
This removes the need to hold both the info_lck spinlock and
WALWriteLock to update them. We use stock atomic write instead, with
WALWriteLock held. Readers can use atomic read, without any locking.
This allows some code to be reordered: some places were a bit
contorted to avoid repeated spinlock acquisition, but that's no longer a
concern, so we can turn them into more natural coding. Some further
changes are possible (maybe even performance wins), but in this commit I
made rather minimal ones only, to avoid increasing the blast radius.
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: Jeff Davis <pgsql@j-davis.com>
Reviewed-by: Andres Freund <andres@anarazel.de> (earlier versions)
Discussion: https://postgr.es/m/20200831182156.GA3983@alvherre.pgsql
| |
After this change we have XLogCtl->logWriteResult and ->logFlushResult.
There's no functional change, other than the fact that the assignment
from shared memory to local is no longer done via struct assignment, but
instead using a macro that copies each member separately.
The current representation is inconvenient going forward; notably, we
would like to add a new member "Copy" (to keep track of the last
position copied into WAL buffers), so the symmetry between the values in
shared memory and those in local memory would be lost.
This also gives us freedom to later change the concurrency model for the
values in shared memory: we can make them use atomics instead of relying
on the info_lck spinlock.
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Discussion: https://postgr.es/m/202404031119.cd2kugjk2vho@alvherre.pgsql
| |
06c418e163 introduced the pg_wal_replay_wait() procedure, which allows
waiting for a particular LSN to be replayed on a standby. The waiters
were stored in a flat array. Even though scanning small arrays is fast,
that might be a problem at scale (a lot of waiting processes).
This commit replaces the flat shared memory array with a pairing heap,
which holds the waiter with the least LSN at the top. This gives us
O(log N) complexity for both inserting and removing waiters.
Reported-by: Alvaro Herrera
Discussion: https://postgr.es/m/202404030658.hhj3vfxeyhft%40alvherre.pgsql
| |
This adds errcodes to a set of PANIC and FATAL errors in xlog.c
and relcache.c, which previously had no errcode set at all, in
order to make fleet-wide analysis of error logs easier. There are
many more ereports/elogs left that could benefit from having an
errcode, but this at least makes a dent in the issue.
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/CAN55FZ1k8LgLEqncPGmz_fWnrobV6bjABOTH4tOWta6xNcPQig@mail.gmail.com
| |
pg_wal_replay_wait() is to be used on a standby and waits for a specific
WAL location to be replayed before starting the transaction. This is
useful when the user makes some data changes on the primary and needs a
guarantee that these changes are visible on the standby.
The queue of waiters is stored in a shared memory array sorted by LSN.
During WAL replay, waiters whose LSNs have already been replayed are
deleted from the shared memory array and woken up by setting their
latches.
pg_wal_replay_wait() needs to wait without any snapshot held. Otherwise,
the snapshot could prevent the replay of WAL records, leading to a kind
of self-deadlock. This is why it can only be implemented as a procedure
working in a non-atomic context, not as a function.
Catversion is bumped.
Discussion: https://postgr.es/m/eb12f9b03851bb2583adab5df9579b4b%40postgrespro.ru
Author: Kartyshov Ivan, Alexander Korotkov
Reviewed-by: Michael Paquier, Peter Eisentraut, Dilip Kumar, Amit Kapila
Reviewed-by: Alexander Lakhin, Bharath Rupireddy, Euler Taveira
| |
Allow use of BeginInternalSubTransaction() in parallel mode, so long
as the subtransaction doesn't attempt to acquire an XID or increment
the command counter. Given those restrictions, the other parallel
processes don't need to know about the subtransaction at all, so
this should be safe. The benefit is that it allows subtransactions
intended for error recovery, such as pl/pgsql exception blocks,
to be used in PARALLEL SAFE functions.
Another reason for doing this is that the API of
BeginInternalSubTransaction() doesn't allow reporting failure.
pl/python for one, and perhaps other PLs, copes very poorly with an
error longjmp out of BeginInternalSubTransaction(). The headline
feature of this patch removes the only easily-triggerable failure
case within that function. There remain some resource-exhaustion
and similar cases, which we now deal with by promoting them to FATAL
errors, so that callers need not try to clean up. (It is likely
that such errors would leave us with corrupted transaction state
inside xact.c, making recovery difficult if not impossible anyway.)
Although this work started because of a report of a pl/python crash,
we're not going to do anything about that in the back branches.
Back-patching this particular fix is obviously not very wise.
While we could contemplate some narrower band-aid, pl/python is
already an untrusted language, so it seems okay to classify this
as a "so don't do that" case.
Patch by me, per report from Hao Zhang. Thanks to Robert Haas for
review.
Discussion: https://postgr.es/m/CALY6Dr-2yLVeVPhNMhuBnRgOZo1UjoTETgtKBx1B2gUi8yy+3g@mail.gmail.com
| |
Similar to commit 7e735035f20.
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAMbWs4-WhpCFMbXCjtJ%2BFzmjfPrp7Hw1pk4p%2BZpU95Kh3ofZ1A%40mail.gmail.com
| |
Two FATALs and one PANIC gain details about the LSNs they fail at:
- When restoring from a backup_label, the FATAL log generated when not
finding the checkpoint record now reports its LSN.
- When restoring from a backup_label, the FATAL log generated when not
finding the redo record referenced by a checkpoint record now shows both
the redo and checkpoint record LSNs.
- When not restoring from a backup_label, the PANIC error generated when
not finding the checkpoint record now reports its LSN.
This information is useful when debugging corruption issues, and these
LSNs may not show up in the logs depending on the level of logging
configured in the backend.
Author: David Steele
Discussion: https://postgr.es/m/0e90da89-77ca-4ccf-872c-9626d755e288@pgmasters.net
| |
The function recurses, but didn't perform stack-depth checks. It's
just a debugging aid, so instead of the usual check_stack_depth()
call, stop the printing if we'd risk stack overflow.
Here's an example of how to test this:
(n=1000000; printf "BEGIN;"; for ((i=1;i<=$n;i++)); do printf "SAVEPOINT s$i;"; done; printf "SET log_min_messages = 'DEBUG5'; SAVEPOINT sp;") | psql >/dev/null
In passing, swap the order of building the list of child XIDs and
recursing to the parent. That saves memory while recursing, reducing the
risk of out-of-memory errors with lots of subtransactions. The saving is
not very significant in practice, but this order seems more logical
anyway.
Report by Egor Chindyaskin and Alexander Lakhin.
Discussion: https://www.postgresql.org/message-id/1672760457.940462079%40f306.i.mail.ru
Author: Heikki Linnakangas
Reviewed-by: Robert Haas, Andres Freund, Alexander Korotkov
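A hedged sketch of the approach described, using the stack_is_too_deep()
predicate rather than the error-throwing check_stack_depth(); the message
text is illustrative:
static void
ShowTransactionStateRec(const char *str, TransactionState s)
{
    /* this is a debugging aid, so stop printing rather than erroring out */
    if (stack_is_too_deep())
    {
        ereport(DEBUG5,
                (errmsg_internal("%s: (stack depth limit reached, output truncated)",
                                 str)));
        return;
    }

    /* ... print this level, then recurse to s->parent ... */
}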
| |
Usually the compiler will optimize away the tail recursion anyway, but
if it doesn't, you can drive the function into stack overflow. For
example:
(n=1000000; printf "BEGIN;"; for ((i=1;i<=$n;i++)); do printf "SAVEPOINT s$i;"; done; printf "ERROR; COMMIT;") | psql >/dev/null
In order to get better readability and fewer changes to the existing
code, the recursion-replacing loop is implemented as a wrapper function.
Report by Egor Chindyaskin and Alexander Lakhin.
Discussion: https://postgr.es/m/1672760457.940462079%40f306.i.mail.ru
Author: Alexander Korotkov, Heikki Linnakangas
| |
Remove an extra & operator, per Tom Lane. My bugs, introduced with
commit 53c2a97a9266.
Discussion: https://postgr.es/m/3885480.1709590472@sss.pgh.pa.us
| |
When this code was written, the duplication didn't matter, but with all
the SLRU-bank stuff we just added, it has become excessive. Turn it into a
simpler loop with no code duplication. Also add a test so that this
code becomes covered.
Discussion: https://postgr.es/m/202403041517.3a35jw53os65@alvherre.pgsql
| |
After commit 53c2a97a9266, the code flow around the "retry" goto label
in GetMultiXactIdMembers was confused about what was possible: we never
return there with a held lock, so there's no point in testing for one.
This realization lets us simplify the code a bit. While at it, make the
scope of a couple of local variables in the same function a bit tighter.
Per Coverity.
| |
New code in 53c2a97a9266 uses direct array access to
shared->bank_locks[bankno].lock, which can be made a little more
legible by using the SimpleLruGetBankLock() helper function.
Nothing terribly serious, but let's add some clarity.
Discussion: https://postgr.es/m/202403041517.3a35jw53os65@alvherre.pgsql
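A hedged before/after illustration of the change; the variable names are
illustrative:
/* before: open-coded bank lookup */
LWLock     *lock = &shared->bank_locks[bankno].lock;

/* after: let the helper map the page number to its bank lock */
LWLock     *lock = SimpleLruGetBankLock(ctl, pageno);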
| |
as determined by include-what-you-use (IWYU)
While IWYU also suggests *adding* a bunch of #include's (which is its
main purpose), this patch does not do that. In some cases, a more
specific #include replaces another less specific one.
Some manual adjustments of the automatic result:
- IWYU currently doesn't know about includes that provide global
variable declarations (like -Wmissing-variable-declarations), so
those includes are being kept manually.
- All includes for port(ability) headers are being kept for now, to
play it safe.
- No changes of catalog/pg_foo.h to catalog/pg_foo_d.h, to keep the
patch from exploding in size.
Note that this patch touches just *.c files, so nothing declared in
header files changes in hidden ways.
As a small example, in src/backend/access/transam/rmgr.c, some IWYU
pragma annotations are added to handle a special case there.
Discussion: https://www.postgresql.org/message-id/flat/af837490-6b2f-46df-ba05-37ea6a6653fc%40eisentraut.org
| |
The pid was originally used in the error context of messages propagated
from parallel workers, but commit 292794f82b removed that. If the need
arises in the future, you can also get the pid with
"shm_mq_get_sender(pcxt->worker[i].error_mqh)->pid".
| |
This test serves as a way to demonstrate how to use the features
introduced in 37b369dc67bc, while providing coverage for the bug fixed by
7863ee4def65, which caused the startup process to throw "PANIC: could not
locate a valid checkpoint record" when starting recovery. The test checks
that a node is able to properly restart following a crash when a restart
point was finishing across a promotion, with an injection point added in
the middle of CreateRestartPoint() to stop the restartpoint in flight.
Note that this test fails when 7863ee4def65 is reverted.
Kyotaro Horiguchi is the original author of this test, which was
originally posted on the thread where 7863ee4def65 was discussed. I
have just upgraded and polished it to rely on injection points, making
it much cheaper to reproduce the failure.
This test requires injection points to be enabled in the builds, hence
meson and ./configure need an update to pass this knowledge down to the
test. The name of the new injection point follows the same naming
convention as 6a1ea02c491d. The Makefile's EXTRA_INSTALL of recovery
TAP tests is updated to include modules/injection_points.
Author: Kyotaro Horiguchi, Michael Paquier
Reviewed-by: Andrey Borodin, Bertrand Drouvot
Discussion: https://postgr.es/m/ZdLuxBk5hGpol91B@paquier.xyz
| |
Now that BackendId was just another index into the proc array, it was
redundant with the 0-based proc numbers used in other places. Replace
all usage of backend IDs with proc numbers.
The only place where the term "backend id" remains is in a few pgstat
functions that expose backend IDs at the SQL level. Those IDs are now
in fact 0-based ProcNumbers too, but the documentation still calls
them "backend ids". That term still seems appropriate to describe what
the numbers are, so I let it be.
One user-visible effect is that pg_temp_0 is now a valid temp schema
name, for the backend with ProcNumber 0.
Reviewed-by: Andres Freund
Discussion: https://www.postgresql.org/message-id/8171f1aa-496f-46a6-afc3-c46fe7a9b407@iki.fi
| |
Previously, backend ID was an index into the ProcState array, in the
shared cache invalidation manager (sinvaladt.c). The entry in the
ProcState array was reserved at backend startup by scanning the array
for a free entry, and that was also when the backend got its backend
ID. Things become slightly simpler if we redefine backend ID to be the
index into the PGPROC array, and directly use it also as an index to
the ProcState array. This uses a little more memory, as we reserve a
few extra slots in the ProcState array for aux processes that don't
need them, but the simplicity is worth it.
Aux processes now also have a backend ID. This simplifies the
reservation of BackendStatusArray and ProcSignal slots.
You can now convert a backend ID into an index into the PGPROC array
simply by subtracting 1. We still use 0-based "pgprocnos" in various
places, for indexes into the PGPROC array, but the only difference now
is that backend IDs start at 1 while pgprocnos start at 0. (The next
commit will get rid of the term "backend ID" altogether and make
everything 0-based.)
There is still a 'backendId' field in PGPROC, now part of 'vxid' which
encapsulates the backend ID and local transaction ID together. It's
needed for prepared xacts. For regular backends, the backendId is
always equal to pgprocno + 1, but for prepared xact PGPROC entries,
it's the ID of the original backend that processed the transaction.
Reviewed-by: Andres Freund, Reid Thompson
Discussion: https://www.postgresql.org/message-id/8171f1aa-496f-46a6-afc3-c46fe7a9b407@iki.fi