| Commit message (Collapse) | Author | Age |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Introduce TimestampDifferenceMilliseconds() to simplify callers
that would rather have the difference in milliseconds, instead of
the select()-oriented seconds-and-microseconds format. This gets
rid of at least one integer division per call, and it eliminates
some apparently-easy-to-mess-up arithmetic.
Two of these call sites were in fact wrong:
* pg_prewarm's autoprewarm_main() forgot to multiply the seconds
by 1000, thus ending up with a delay 1000X shorter than intended.
That doesn't quite make it a busy-wait, but close.
* postgres_fdw's pgfdw_get_cleanup_result() thought it needed to compute
microseconds not milliseconds, thus ending up with a delay 1000X longer
than intended. Somebody along the way had noticed this problem but
misdiagnosed the cause, and imposed an ad-hoc 60-second limit rather
than fixing the units. This was relatively harmless in context, because
we don't care that much about exactly how long this delay is; still,
it's wrong.
There are a few more callers of TimestampDifference() that don't
have a direct need for seconds-and-microseconds, but can't use
TimestampDifferenceMilliseconds() either because they do need
microsecond precision or because they might possibly deal with
intervals long enough to overflow 32-bit milliseconds. It might be
worth inventing another API to improve that, but that seems outside
the scope of this patch; so those callers are untouched here.
Given the fact that we are fixing some bugs, and the likelihood
that future patches might want to back-patch code that uses this
new API, back-patch to all supported branches.
Alexey Kondratov and Tom Lane
Discussion: https://postgr.es/m/3b1c053a21c07c1ed5e00be3b2b855ef@postgrespro.ru
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Specifically, this blocks DECLARE ... WITH HOLD and firing of deferred
triggers within index expressions and materialized view queries. An
attacker having permission to create non-temp objects in at least one
schema could execute arbitrary SQL functions under the identity of the
bootstrap superuser. One can work around the vulnerability by disabling
autovacuum and not manually running ANALYZE, CLUSTER, REINDEX, CREATE
INDEX, VACUUM FULL, or REFRESH MATERIALIZED VIEW. (Don't restore from
pg_dump, since it runs some of those commands.) Plain VACUUM (without
FULL) is safe, and all commands are fine when a trusted user owns the
target object. Performance may degrade quickly under this workaround,
however. Back-patch to 9.5 (all supported versions).
Reviewed by Robert Haas. Reported by Etienne Stalmans.
Security: CVE-2020-25695
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Avoid pointlessly highlighting that an index vacuum was executed by a
parallel worker; user doesn't care.
* Don't give the impression that a non-concurrent reindex of an invalid
index on a TOAST table would work, because it wouldn't.
* Add a "translator:" comment for a mysterious message.
Discussion: https://postgr.es/m/20201107034943.GA16596@alvherre.pgsql
Reviewed-by: Michael Paquier <michael@paquier.xyz>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
brin_form_tuple failed to consider the values may be toasted, inserting
the toast pointer into the index. This may easily result in index
corruption, as the toast data may be deleted and cleaned up by vacuum.
The cleanup however does not care about indexes, leaving invalid toast
pointers behind, which triggers errors like this:
ERROR: missing chunk number 0 for toast value 16433 in pg_toast_16426
A less severe consequence are inconsistent failures due to the index row
being too large, depending on whether brin_form_tuple operated on plain
or toasted version of the row. For example
CREATE TABLE t (val TEXT);
INSERT INTO t VALUES ('... long value ...')
CREATE INDEX idx ON t USING brin (val);
would likely succeed, as the row would likely include toast pointer.
Switching the order of INSERT and CREATE INDEX would likely fail:
ERROR: index row size 8712 exceeds maximum 8152 for index "idx"
because this happens before the row values are toasted.
The bug exists since PostgreSQL 9.5 where BRIN indexes were introduced.
So backpatch all the way back.
Author: Tomas Vondra
Reviewed-by: Alvaro Herrera
Backpatch-through: 9.5
Discussion: https://postgr.es/m/20201001184133.oq5uq75sb45pu3aw@development
Discussion: https://postgr.es/m/20201104010544.zexj52mlldagzowv%40development
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Logic for counting heap TIDs from posting list tuples (added by commit
0d861bbb) was faulty. It didn't count any TIDs/index tuples in the
event of no callback being set. This meant that we incorrectly counted
no index tuples in clean-up only VACUUMs, which could lead to
pg_class.reltuples being spuriously set to 0 in affected indexes.
To fix, go back to counting items from the page in cases where there is
no callback. This approach isn't very accurate, but it works well
enough in practice while avoiding the expense of accessing every index
tuple during cleanup-only VACUUMs.
Author: Peter Geoghegan <pg@bowt.ie>
Reported-By: Jehan-Guillaume de Rorthais <jgdr@dalibo.com>
https://postgr.es/m/20201023174451.69e358f1@firost
Backpatch: 13-, where nbtree deduplication was introduced
|
|
|
|
|
|
|
|
|
|
| |
Certain background workers initiate parallel queries while
debug_query_string==NULL, at which point they attempted strlen(NULL) and
died to SIGSEGV. Older debug_query_string observers allow NULL, so do
likewise in these newer ones. Back-patch to v11, where commit
7de4a1bcc56f494acbd0d6e70781df877dc8ecb5 introduced the first of these.
Discussion: https://postgr.es/m/20201014022636.GA1962668@rfd.leadboat.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
gistRelocateBuildBuffersOnSplit did not get the memo about which
attribute count to use. This could lead to a crash if there were
included columns and buffering build was chosen. (Because there
are random page-split decisions elsewhere in GiST index build,
the crashes are not entirely deterministic.)
Back-patch to v12 where GiST gained support for included columns.
Pavel Borisov
Discussion: https://postgr.es/m/CALT9ZEECCV5m7wvxg46PC-7x-EybUmnpupBGhSFMoAAay+r6HQ@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Harmonize behavior by moving reponsibility for fsyncing directories down
into slru.c. In 10 and later, only the multixact directories were
missed (see commit 1b02be21), and in older branches all SLRUs were
missed.
Back-patch to all supported releases.
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CA%2BhUKGLtsTUOScnNoSMZ-2ZLv%2BwGh01J6kAo_DM8mTRq1sKdSQ%40mail.gmail.com
|
|
|
|
|
|
|
|
| |
99% of this is docs, but also a couple of comments. No code changes.
Justin Pryzby
Discussion: https://postgr.es/m/20200919175804.GE30557@telsasoft.com
|
|
|
|
|
|
|
|
|
|
|
|
| |
After commits 85f6b49c2c and 3ba59ccc89, we can allow parallel inserts
which was earlier not possible as parallel group members won't conflict
for relation extension and page lock. In those commits, we forgot to
update comments at few places.
Author: Amit Kapila
Reviewed-by: Robert Haas and Dilip Kumar
Backpatch-through: 13
Discussion: https://postgr.es/m/CAFiTN-tMrQh5FFMPx5aWJ+1gi1H6JxktEhq5mDwCHgnEO5oBkA@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For parallel btree scan to work for array of scan keys, it should reach
BTPARALLEL_DONE state once for every distinct combination of array keys.
This is required to ensure that the parallel workers don't try to seize
blocks at the same time for different scan keys. We missed to update this
state when we discovered that the scan keys can't be satisfied.
Author: James Hunter
Reviewed-by: Amit Kapila
Tested-by: Justin Pryzby
Backpatch-through: 10, where it was introduced
Discussion: https://postgr.es/m/4248CABC-25E3-4809-B4D0-128E1BAABC3C@amazon.com
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
collectMatchBitmap() needs to re-find the index tuple it was previously
looking at, after transiently dropping lock on the index page it's on.
The tuple should still exist and be at its prior position or somewhere
to the right of that, since ginvacuum never removes tuples but
concurrent insertions could add one. However, there was a thinko in
that logic, to the effect of expecting any inserted tuples to have the
same index "attnum" as what we'd been scanning. Since there's no
physical separation of tuples with different attnums, it's not terribly
hard to devise scenarios where this fails, leading to transient "lost
saved point in index" errors. (While I've duplicated this with manual
testing, it seems impossible to make a reproducible test case with our
available testing technology.)
Fix by just continuing the scan when the attnum doesn't match.
While here, improve the error message used if we do fail, so that it
matches the wording used in btree for a similar case.
collectMatchBitmap()'s posting-tree code path was previously not
exercised at all by our regression tests. While I can't make
a regression test that exhibits the bug, I can at least improve
the code coverage here, so do that. The test case I made for this
is an extension of one added by 4b754d6c1, so it only works in
HEAD and v13; didn't seem worth trying hard to back-patch it.
Per bug #16595 from Jesse Kinkead. This has been broken since
multicolumn capability was added to GIN (commit 27cb66fdf),
so back-patch to all supported branches.
Discussion: https://postgr.es/m/16595-633118be8eef9ce2@postgresql.org
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We were displaying the wrong phase information for 'info' message in the
index clean up phase because we were switching to the previous phase a bit
early. We were also not displaying context information for heap phase
unless the block number is valid which is fine for error cases but for
messages at 'info' or lower error level it appears to be inconsistent with
index phase information.
Reported-by: Sawada Masahiko
Author: Sawada Masahiko
Reviewed-by: Amit Kapila
Backpatch-through: 13, where it was introduced
Discussion: https://postgr.es/m/CA+fd4k4HcbhPnCs7paRTw1K-AHin8y4xKomB9Ru0ATw0UeTy2w@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The SimpleLruTruncate() header comment states the new coding rule. To
achieve this, add locktype "frozenid" and two LWLocks. This closes a
rare opportunity for data loss, which manifested as "apparent
wraparound" or "could not access status of transaction" errors. Data
loss is more likely in pg_multixact, due to released branches' thin
margin between multiStopLimit and multiWrapLimit. If a user's physical
replication primary logged ": apparent wraparound" messages, the user
should rebuild standbys of that primary regardless of symptoms. At less
risk is a cluster having emitted "not accepting commands" errors or
"must be vacuumed" warnings at some point. One can test a cluster for
this data loss by running VACUUM FREEZE in every database. Back-patch
to 9.5 (all supported versions).
Discussion: https://postgr.es/m/20190218073103.GA1434723@rfd.leadboat.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a table is scanned by heapam_index_build_range_scan (née
IndexBuildHeapScan) and the table lock being held allows concurrent data
changes, it is possible for new HOT chains to sprout in a page that were
unknown when the scan of a page happened. This leads to an error such
as
ERROR: failed to find parent tuple for heap-only tuple at (X,Y) in table "tbl"
because the root tuple was not present when we first obtained the list
of the page's root tuples. This can be fixed by re-obtaining the list
of root tuples, if we see that a heap-only tuple appears to point to a
non-existing root.
This was reported by Anastasia as occurring for BRIN summarization
(which exists since 9.5), but I think it could theoretically also happen
with CREATE INDEX CONCURRENTLY (much older) or REINDEX CONCURRENTLY
(very recent). It seems a happy coincidence that BRIN forces us to
backpatch this all the way to 9.5.
Reported-by: Anastasia Lubennikova <a.lubennikova@postgrespro.ru>
Diagnosed-by: Anastasia Lubennikova <a.lubennikova@postgrespro.ru>
Co-authored-by: Anastasia Lubennikova <a.lubennikova@postgrespro.ru>
Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/602d8487-f0b2-5486-0088-0f372b2549fa@postgrespro.ru
Backpatch: 9.5 - master
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a page range is desummarized at just the right time concurrently with
an index walk, BRIN would raise an error indicating index corruption.
This is scary and unhelpful; silently returning that the page range is
not summarized is sufficient reaction.
This bug was introduced by commit 975ad4e602ff as additional protection
against a bug whose actual fix was elsewhere. Backpatch equally.
Reported-By: Anastasia Lubennikova <a.lubennikova@postgrespro.ru>
Diagnosed-By: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/2588667e-d07d-7e10-74e2-7e1e46194491@postgrespro.ru
Backpatch: 9.5 - master
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
max_slot_wal_keep_size that was added in v13 and wal_keep_segments are
the GUC parameters to specify how much WAL files to retain for
the standby servers. While max_slot_wal_keep_size accepts the number of
bytes of WAL files, wal_keep_segments accepts the number of WAL files.
This difference of setting units between those similar parameters could
be confusing to users.
To alleviate this situation, this commit renames wal_keep_segments to
wal_keep_size, and make users specify the WAL size in it instead of
the number of WAL files.
There was also the idea to rename max_slot_wal_keep_size to
max_slot_wal_keep_segments, in the discussion. But we have been moving
away from measuring in segments, for example, checkpoint_segments was
replaced by max_wal_size. So we concluded to rename wal_keep_segments
to wal_keep_size.
Back-patch to v13 where max_slot_wal_keep_size was added.
Author: Fujii Masao
Reviewed-by: Álvaro Herrera, Kyotaro Horiguchi, David Steele
Discussion: https://postgr.es/m/574b4ea3-e0f9-b175-ead2-ebea7faea855@oss.nttdata.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There is no advantage to attempting deduplication for a unique index
during CREATE INDEX, since there cannot possibly be any duplicates.
Doing so wastes cycles due to unnecessary copying. Make sure that we
avoid it consistently.
We already avoided unique index deduplication in the case where there
were some spool2 tuples to merge. That didn't account for the fact that
spool2 is removed early/unset in the common case where it has no tuples
that need to be merged (i.e. it failed to account for the "spool2 turns
out to be unnecessary" optimization in _bt_spools_heapscan()).
Oversight in commit 0d861bbb, which added nbtree deduplication
Backpatch: 13-, where nbtree deduplication was introduced.
|
|
|
|
|
|
|
|
|
|
|
| |
Incorrect function names were referenced. As this fixes some portions
of tableam.h, that is mentioned in the docs as something to look at when
implementing a table AM, backpatch down to 12 where this has been
introduced.
Author: Hironobu Suzuki
Discussion: https://postgr.es/m/8fe6d672-28dd-3f1d-7aed-ac2f6d599d3f@interdb.jp
Backpatch-through: 12
|
|
|
|
|
|
|
|
|
|
|
|
| |
Remove previous hack in KeepLogSeg that added a case to deal with a
(badly represented) invalid segment number. This was added for the sake
of GetWALAvailability. But it's not needed if in that function we
initialize the segment number to be retreated to the currently being
written segment, so do that instead.
Per valgrind-running buildfarm member skink, and some sparc64 animals.
Discussion: https://postgr.es/m/1724648.1594230917@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
|
| |
This message was being emitted on the grounds that only crashed
summarization could cause it, but in reality even an aborted vacuum
could do it ... which makes it way too noisy, particularly since it
shows up in regression tests and makes them die.
Reported by Tom Lane.
Discussion: https://postgr.es/m/489091.1593534251@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
|
|
| |
Since slot_keep_segs indicates the number of WAL segments not LSN,
its datatype should not be XLogRecPtr.
Back-patch to v13 where this issue was added.
Reported-by: Atsushi Torikoshi
Author: Atsushi Torikoshi, tweaked by Fujii Masao
Discussion: https://postgr.es/m/ebd0d674f3e050222238a960cac5251a@oss.nttdata.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The previous definition of the column was almost universally disliked,
so provide this updated definition which is more useful for monitoring
purposes: a large positive value is good, while zero or a negative value
means danger. This should be operationally more convenient.
Backpatch to 13, where the new column to pg_replication_slots (and the
feature it represents) were added.
Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reported-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Discussion: https://postgr.es/m/9ddfbf8c-2f67-904d-44ed-cf8bc5916228@oss.nttdata.com
|
|
|
|
|
|
|
| |
Author: Vignesh C
Reviewed-by: Sawada Masahiko
Backpatch-through: 13, where it was introduced
Discussion: https://postgr.es/m/CALDaNm3Ppt71NafGY5mk3V2i3Q+mm93pVibDq-0NpW7WU67Jcg@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
| |
Use separate functions to save and restore error context information as
that made code easier to understand. Also, make it clear that the index
information required for error context is sane.
Author: Andres Freund, Justin Pryzby, Amit Kapila
Backpatch-through: 13, where it was introduced
Discussion: https://postgr.es/m/CAA4eK1LWo+v1OWu=Sky27GTGSCuOmr7iaURNbc5xz6jO+SaPeA@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 0d861bbb, which added deduplication to nbtree, had
_bt_check_unique() pass a TID to table_index_fetch_tuple_check() that
isn't safe to mutate. table_index_fetch_tuple_check()'s tid argument is
modified when the TID in question is not the latest visible tuple in a
hot chain, though this wasn't documented.
To fix, go back to using a local copy of the TID in _bt_check_unique(),
and update comments above table_index_fetch_tuple_check().
Backpatch: 13-, where B-Tree deduplication was introduced.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In pg_replication_slot, change output from normal/reserved/lost to
reserved/extended/unreserved/ lost, which better expresses the possible
states particularly near the time where segments are no longer safe but
checkpoint has not run yet.
Under the new definition, reserved means the slot is consuming WAL
that's still under the normal WAL size constraints; extended means it's
consuming WAL that's being protected by wal_keep_segments or the slot
itself, whose size is below max_slot_wal_keep_size; unreserved means the
WAL is no longer safe, but checkpoint has not yet removed those files.
Such as slot is in imminent danger, but can still continue for a little
while and may catch up to the reserved WAL space.
Also, there were some bugs in the calculations used to report the
status; fixed those.
Backpatch to 13.
Reported-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/20200616.120236.1809496990963386593.horikyota.ntt@gmail.com
|
|
|
|
|
|
|
| |
The current definition is dangerous. No bugs exist in our code at
present, but backpatch to 11 nonetheless where it was introduced.
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
|
|
|
|
|
|
|
|
|
|
| |
spg_mask() didn't take into account that pd_lower equal to SizeOfPageHeaderData
is still valid value. This commit fixes that. Backpatch to 11, where
spg_mask() pg_lower check was introduced.
Reported-by: Michael Paquier
Discussion: https://postgr.es/m/20200615131405.GM52676%40paquier.xyz
Backpatch-through: 11
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It was possible for deduplication's single value strategy to mistakenly
believe that a very small duplicate tuple counts as one of the six large
tuples that it aims to leave behind after the page finally splits. This
could cause slightly suboptimal space utilization with very low
cardinality indexes, though only under fairly narrow conditions.
To fix, be particular about what kind of tuple counts as a
maxpostingsize-capped tuple. This avoids confusion in the event of a
small tuple that gets "wedged" between two large tuples, where all
tuples on the page are duplicates of the same value.
Discussion: https://postgr.es/m/CAH2-Wz=Y+sgSFc-O3LpiZX-POx2bC+okec2KafERHuzdVa7-rQ@mail.gmail.com
Backpatch: 13-, where deduplication was introduced (by commit 0d861bbb)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Convert buffile.c error handling to use ereport. This fixes cases where
I/O errors were indistinguishable from EOF or not reported. Also remove
"%m" from error messages where errno would be bogus. While we're
modifying those strings, add block numbers and short read byte counts
where appropriate.
Back-patch to all supported releases.
Reported-by: Amit Khandekar <amitdkhan.pg@gmail.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Ibrar Ahmed <ibrar.ahmad@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CA%2BhUKGJE04G%3D8TLK0DLypT_27D9dR8F1RQgNp0jK6qR0tZGWOw%40mail.gmail.com
|
|
|
|
|
| |
Reported-By: Tom Lane
Discussion: https://postgr.es/m/841649.1592065060@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
|
|
|
| |
Rewrite the documentation of these functions, in light of recent bug fix
commit 5940ffb2.
Back-patch to 13 where the check-for-conflict-out code was split up into
AM-specific and generic parts, and new documentation was added that now
looked wrong.
Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/db7b729d-0226-d162-a126-8a8ab2dc4443%40jepsen.io
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SSI's HeapCheckForSerializableConflictOut() test failed to correctly
handle conditions involving a concurrently inserted tuple which is later
concurrently updated by a separate transaction . A SELECT statement
that called HeapCheckForSerializableConflictOut() could end up using the
same XID (updater's XID) for both the original tuple, and the successor
tuple, missing the XID of the xact that created the original tuple
entirely. This only happened when neither tuple from the chain was
visible to the transaction's MVCC snapshot.
The observable symptoms of this bug were subtle. A pair of transactions
could commit, with the later transaction failing to observe the effects
of the earlier transaction (because of the confusion created by the
update to the non-visible row). This bug dates all the way back to
commit dafaa3ef, which added SSI.
To fix, make sure that we check the xmin of concurrently inserted tuples
that happen to also have been updated concurrently.
Author: Peter Geoghegan
Reported-By: Kyle Kingsbury
Reviewed-By: Thomas Munro
Discussion: https://postgr.es/m/db7b729d-0226-d162-a126-8a8ab2dc4443@jepsen.io
Backpatch: All supported versions
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The redo routines for XLOG_CHECKPOINT_{ONLINE,SHUTDOWN} must acquire
ControlFileLock before modifying ControlFile->checkPointCopy, or the
checkpointer could write out a control file with a bad checksum.
Likewise, XLogReportParameters() must acquire ControlFileLock before
modifying ControlFile and calling UpdateControlFile().
Back-patch to all supported releases.
Author: Nathan Bossart <bossartn@amazon.com>
Author: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/70BF24D6-DC51-443F-B55A-95735803842A%40amazon.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since database connections can be used with WAL senders in 9.4, it is
possible to use physical replication. This commit fixes a crash when
starting physical replication with a WAL sender using a database
connection, caused by the refactoring done in 850196b.
There have been discussions about forbidding the use of physical
replication in a database connection, but this is left for later,
taking care only of the crash new to 13.
While on it, add a test to check for a failure when attempting logical
replication if the WAL sender does not have a database connection. This
part is extracted from a larger patch by Kyotaro Horiguchi.
Reported-by: Vladimir Sitnikov
Author: Michael Paquier, Kyotaro Horiguchi
Reviewed-by: Kyotaro Horiguchi, Álvaro Herrera
Discussion: https://postgr.es/m/CAB=Je-GOWMj1PTPkeUhjqQp-4W3=nW-pXe2Hjax6rJFffB5_Aw@mail.gmail.com
Backpatch-through: 13
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 624686abcf8 added an assertion that verified that _bt_search
successfully relocated the leaf page undergoing deletion. Page deletion
cannot deal with the case where the descent stack is to the right of the
page, so this seemed critical (deletion can only handle the case where
the descent stack is to the left of the leaf/target page). However, the
assertion went a bit too far.
Since only a buffer pin is held on the leaf page throughout the call to
_bt_search, nothing guarantees that it can't have split during this
small window. And if does actually split, _bt_search may end up
"relocating" a page to the right of the original target leaf page. This
scenario seems extremely unlikely, but it must still be considered.
Remove the assertion, and document how we cope in this scenario.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Synchronize the event names for parallel hash join waits with other
event names, by getting rid of the slashes and dropping "-ing"
suffixes. Rename ClogGroupUpdate to XactGroupUpdate, to match the
new SLRU name. Move the ProcSignalBarrier event to the IPC category;
it doesn't belong under IO.
Also a bit more wordsmithing in the wait event documentation tables.
Discussion: https://postgr.es/m/4505.1589640417@sss.pgh.pa.us
|
|
|
|
|
|
|
| |
This is just to provide a clean basis for comparison of the results
of the new version. I did fix a typo that crept into 242dfcbaf.
Discussion: https://postgr.es/m/20200114221814.GA19630@alvherre.pgsql
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Choose names that fit into the conventions for wait event names
(particularly, that multi-word names are in the style MultiWordName)
and hopefully convey more information to non-hacker users than the
previous names did.
Also rename SerializablePredicateLockListLock to
SerializablePredicateListLock; the old name was long enough to cause
table formatting problems, plus the double occurrence of "Lock" seems
confusing/error-prone.
Also change a couple of particularly opaque LWLock field names.
Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
_bt_killitems marks btree items dead when a scan leaves the page where
they live, but it does so with only share lock (to improve concurrency).
This was historicall okay, since killing a dead item has no
consequences. However, with the advent of data checksums and
wal_log_hints, this action incurs a WAL full-page-image record of the
page. Multiple concurrent processes would write the same page several
times, leading to WAL bloat. The probability of this happening can be
reduced by only killing items if they're not already dead, so change the
code to do that.
The problem could eliminated completely by having _bt_killitems upgrade
to exclusive lock upon seeing a killable item, but that would reduce
concurrency so it's considered a cure worse than the disease.
Backpatch all the way back to 9.5, since wal_log_hints was introduced in
9.4.
Author: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Discussion: https://postgr.es/m/CA+fd4k6PeRj2CkzapWNrERkja5G0-6D-YQiKfbukJV+qZGFZ_Q@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Originally, the names assigned to SLRUs had no purpose other than
being shmem lookup keys, so not a lot of thought went into them.
As of v13, though, we're exposing them in the pg_stat_slru view and
the pg_stat_reset_slru function, so it seems advisable to take a bit
more care. Rename them to names based on the associated on-disk
storage directories (which fortunately we *did* think about, to some
extent; since those are also visible to DBAs, consistency seems like
a good thing). Also rename the associated LWLocks, since those names
are likewise user-exposed now as wait event names.
For the most part I only touched symbols used in the respective modules'
SimpleLruInit() calls, not the names of other related objects. This
renaming could have been taken further, and maybe someday we will do so.
But for now it seems undesirable to change the names of any globally
visible functions or structs, so some inconsistency is unavoidable.
(But I *did* terminate "oldserxid" with prejudice, as I found that
name both unreadable and not descriptive of the SLRU's contents.)
Table 27.12 needs re-alphabetization now, but I'll leave that till
after the other LWLock renamings I have in mind.
Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
|
|
| |
Includes some manual cleanup of places that pgindent messed up,
most of which weren't per project style anyway.
Notably, it seems some people didn't absorb the style rules of
commit c9d297751, because there were a bunch of new occurrences
of function calls with a newline just after the left paren, all
with faulty expectations about how the rest of the call would get
indented.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There is little point in using the LWLockRegisterTranche mechanism for
built-in tranche names. It wastes cycles, it creates opportunities for
bugs (since failing to register a tranche name is a very hard-to-detect
problem), and the lack of any centralized list of names encourages
sloppy nonconformity in name choices. Moreover, since we have a
centralized list of the tranches anyway in enum BuiltinTrancheIds, we're
certainly not buying any flexibility in return for these disadvantages.
Hence, nuke all the backend-internal LWLockRegisterTranche calls,
and instead provide a const array of the builtin tranche names.
(I have in mind to change a bunch of these names shortly, but this
patch is just about getting them into one place.)
Discussion: https://postgr.es/m/9056.1589419765@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
| |
The additional pain from level 4 is excessive for the gain.
Also revert all the source annotation changes to their original
wordings, to avoid back-patching pain.
Discussion: https://postgr.es/m/31166.1589378554@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of re-identifying which statistics bucket to use for a given
SLRU on every counter increment, do it once during shmem initialization.
This saves a fair number of cycles, and there's no real cost because
we could not have a bucket assignment that varies over time or across
backends anyway.
Also, get rid of the ill-considered decision to let pgstat.c pry
directly into SLRU's shared state; it's cleaner just to have slru.c
pass the stats bucket number.
In consequence of these changes, there's no longer any need to store
an SLRU's LWLock tranche info in shared memory, so get rid of that,
making this a net reduction in shmem consumption. (That partly
reverts fe702a7b3.)
This is basically code review for 28cac71bd, so I also cleaned up
some comments, removed a dangling extern declaration, fixed some
things that should be static and/or const, etc.
Discussion: https://postgr.es/m/3618.1589313035@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Have both physical and logical walsender share a 'xlogreader' state
struct for tracking state. This replaces the existing globals sendSeg
and sendCxt.
* Change WALRead not to receive XLogReaderState->seg and ->segcxt as
separate arguments anymore; just use the ones from 'state'. This is
made possible by the above change.
* have the XLogReader segment_open contract require the callbacks to
install the file descriptor in the state struct themselves instead of
returning it. xlogreader was already ignoring any possible failed
return from the callbacks, relying solely on them never returning.
(This point is not altogether excellent, as it means the callbacks
have to know more of XLogReaderState; but to really improve on that
we would have to pass back error info from the callbacks to
xlogreader. And the complexity would not be saved but instead just
transferred to the callers of WALRead, which would have to learn how
to throw errors from the open_segment callback in addition of, as
currently, from pg_pread.)
* segment_open no longer receives the 'segcxt' as a separate argument,
since it's part of the XLogReaderState argument.
Per comments from Kyotaro Horiguchi.
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/20200511203336.GA9913@alvherre.pgsql
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use it at level 4, a bit more restrictive than the default level, and
tweak our commanding comments to FALLTHROUGH.
(However, leave zic.c alone, since it's external code; to avoid the
warnings that would appear there, change CFLAGS for that file in the
Makefile.)
Author: Julien Rouhaud <rjuju123@gmail.com>
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/20200412081825.qyo5vwwco3fv4gdo@nol
Discussion: https://postgr.es/m/flat/E1fDenm-0000C8-IJ@gemulon.postgresql.org
|
|
|
|
|
|
|
|
|
|
|
| |
The existing callers of XLogReadDetermineTimeline() performing recovery
need to check a replay LSN position when determining on which timeline
to read a WAL page. A portion of the comment describing this function
said exactly that, while referring to a routine for fetching a write
LSN, something not available in recovery.
Author: Kyotaro Horiguchi
Discussion: https://postgr.es/m/20200511.101619.2043820539323292957.horikyota.ntt@gmail.com
|