| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
| |
Reported-by: Michael Paquier
Discussion: https://postgr.es/m/ZZKTDPxBBMt3C0J9@paquier.xyz
Backpatch-through: 12
|
|
|
|
|
|
| |
Dagfinn Ilmari Mannsåker, reviewed by Shubham Khanna.
Discussion: http://postgr.es/m/87le9fmi01.fsf@wibble.ilmari.org
|
|
|
|
|
|
|
| |
Dagfinn Ilmari Mannsåker, reviewed by Shubham Khanna. Some subtractions
by me.
Discussion: http://postgr.es/m/87le9fmi01.fsf@wibble.ilmari.org
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The brinbuildCallbackParallel callback used by parallel BRIN builds did
not consider that the parallel table scans may be synchronized, starting
from an arbitrary block and then wrap around.
If this happened and the scan actually did wrap around, tuples from the
beginning of the table were added to the last range produced by the same
worker. The index would be missing range at the beginning of the table,
while the last range would be too wide. This would not produce incorrect
query results, but it'd be less efficient.
Fixed by checking for both past and future ranges in the callback. The
worker may produce multiple summaries for the same page range, but the
leader will merge them as if the summaries came from different workers.
Discussion: https://postgr.es/m/c2ee7d69-ce17-43f2-d1a0-9811edbda6e6%40enterprisedb.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit b437571714 added support for parallel builds for BRIN indexes,
using code similar to BTREE parallel builds, and also a new tuplesort
variant. This commit simplifies the new code in two ways:
* The "spool" grouping tuplesort and the heap/index is not necessary.
The heap/index are available as separate arguments, causing confusion.
So remove the spool, and use the tuplesort directly.
* The new tuplesort variant does not need the heap/index, as it sorts
simply by the range block number, without accessing the tuple data.
So simplify that too.
Initial report and patch by Ranier Vilela, further cleanup by me.
Author: Ranier Vilela
Discussion: https://postgr.es/m/CAEudQAqD7f2i4iyEaAz-5o-bf6zXVX-AkNUBm-YjUXEemaEh6A%40mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Recently-introduced code in reconstruct.c was using "unsigned"
to store the result of read(), pg_pread(), or write(). This is
completely bogus: it breaks subsequent tests for the result being
negative, as we're being reminded of by a chorus of buildfarm
warnings. Switch to "int" as was doubtless intended. (There are
several other uses of "unsigned" in this file that also look poorly
chosen to me, but for now I'm just trying to clean up the buildfarm.)
A larger problem is that "int" is not necessarily wide enough to hold
the result: per POSIX, all these functions return ssize_t. In places
where the requested read or write length clearly fits in int, that's
academic. It may be academic anyway as long as we constrain
individual data files to 1GB, since even a readv or writev-like
operation would then not be responsible for transferring more than
1GB. Nonetheless it seems like trouble waiting to happen, so I made
a pass over readv and writev calls and fixed the result variables
where that seemed appropriate. We might want to think about changing
some of the fd.c functions to return ssize_t too, for future-proofing;
but I didn't tackle that here.
Discussion: https://postgr.es/m/1672202.1703441340@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
e0b1ee17dc introduced optimization for matching B-tree scan keys required for
the directional scan. However, it incorrectly assumed that all keys required
for opposite direction scan are satisfied by _bt_first(). It has been
illustrated that with multiple scan keys over the same column, a lesser one
(according to the scan direction) could win leaving the other one unsatisfied.
Instead of relying on _bt_first() this commit introduces code that memorizes
whether there was at least one match on the page. If that's true we know that
keys required for opposite-direction scan are satisfied as soon as
corresponding values are not NULLs.
Also, this commit simplifies the description for the optimization of keys
required for the current direction scan. Now the flag used for this is named
continuescanPrechecked and means exactly that *continuescan flag is known
to be true for the last item on the page.
Reported-by: Peter Geoghegan
Discussion: https://postgr.es/m/CAH2-Wzn0LeLcb1PdBnK0xisz8NpHkxRrMr3NWJ%2BKOK-WZ%2BQtTQ%40mail.gmail.com
Reviewed-by: Pavel Borisov
|
|
|
|
|
|
|
|
|
|
| |
It's not necessary to keep the firstPage flag as a field of BTScanOpaqueData.
This commit makes it an argument of the _bt_readpage() function. We can easily
distinguish first-time and repeated calls (within the scan) of this function.
Reported-by: Peter Geoghegan
Discussion: https://postgr.es/m/CAH2-Wzk4SOsw%2BtHuTFiz8U9Jqj-R77rYPkhWKODCBb1mdHACXA%40mail.gmail.com
Reviewed-by: Pavel Borisov
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is necessary when spgcanreturn() is invoked on a partitioned
index, and the failure might be reachable in other scenarios as
well. The rest of what spgGetCache() does is perfectly sensible
for a partitioned index, so we should allow it to go through.
I think the main takeaway from this is that we lack sufficient test
coverage for non-btree partitioned indexes. Therefore, I added
simple test cases for brin and gin as well as spgist (hash and
gist AMs were covered already in indexing.sql).
Per bug #18256 from Alexander Lakhin. Although the known test case
only fails since v16 (3c569049b), I've got no faith at all that there
aren't other ways to reach this problem; so back-patch to all
supported branches.
Discussion: https://postgr.es/m/18256-0b0e1b6e4a620f1b@postgresql.org
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 6af179395 added isCatalogRel field to some WAL record types,
but this field was not shown in the rmgr descriptions. This commit
changes the several rmgr descriptions to display the isCatalogRel
field.
Author: Bertrand Drouvot
Reviewed-by: Michael Paquier, Masahiko Sawada
Discussion: https://postgr.es/m/957dc8f9-2a02-4640-9c01-9dcbf97c4187%40gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To take an incremental backup, you use the new replication command
UPLOAD_MANIFEST to upload the manifest for the prior backup. This
prior backup could either be a full backup or another incremental
backup. You then use BASE_BACKUP with the INCREMENTAL option to take
the backup. pg_basebackup now has an --incremental=PATH_TO_MANIFEST
option to trigger this behavior.
An incremental backup is like a regular full backup except that
some relation files are replaced with files with names like
INCREMENTAL.${ORIGINAL_NAME}, and the backup_label file contains
additional lines identifying it as an incremental backup. The new
pg_combinebackup tool can be used to reconstruct a data directory
from a full backup and a series of incremental backups.
Patch by me. Reviewed by Matthias van de Meent, Dilip Kumar, Jakub
Wartak, Peter Eisentraut, and Álvaro Herrera. Thanks especially to
Jakub for incredibly helpful and extensive testing.
Discussion: http://postgr.es/m/CA+TgmoYOYZfMCyOXFyC-P+-mdrZqm5pP2N7S-r0z3_402h9rsA@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When active, this process writes WAL summary files to
$PGDATA/pg_wal/summaries. Each summary file contains information for a
certain range of LSNs on a certain TLI. For each relation, it stores a
"limit block" which is 0 if a relation is created or destroyed within
a certain range of WAL records, or otherwise the shortest length to
which the relation was truncated during that range of WAL records, or
otherwise InvalidBlockNumber. In addition, it stores a list of blocks
which have been modified during that range of WAL records, but
excluding blocks which were removed by truncation after they were
modified and never subsequently modified again.
In other words, it tells us which blocks need to copied in case of an
incremental backup covering that range of WAL records. But this
doesn't yet add the capability to actually perform an incremental
backup; the next patch will do that.
A new parameter summarize_wal enables or disables this new background
process. The background process also automatically deletes summary
files that are older than wal_summarize_keep_time, if that parameter
has a non-zero value and the summarizer is configured to run.
Patch by me, with some design help from Dilip Kumar and Andres Freund.
Reviewed by Matthias van de Meent, Dilip Kumar, Jakub Wartak, Peter
Eisentraut, and Álvaro Herrera.
Discussion: http://postgr.es/m/CA+TgmoYOYZfMCyOXFyC-P+-mdrZqm5pP2N7S-r0z3_402h9rsA@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
| |
First, mark the xlblocks member with InvalidXLogRecPtr, then issue a
write barrier, then initialize it. That ensures that the xlblocks
member doesn't appear valid while the contents are being initialized.
In preparation for reading WAL buffer contents without a lock.
Author: Bharath Rupireddy
Discussion: https://postgr.es/m/CALj2ACVfFMfqD5oLzZSQQZWfXiJqd-NdX0_317veP6FuB31QWA@mail.gmail.com
Reviewed-by: Andres Freund
|
|
|
|
|
|
|
|
|
|
| |
In preparation for reading the contents of WAL buffers without a
lock. Also, avoids the previously-needed comment in GetXLogBuffer()
explaining why it's safe from torn reads.
Author: Bharath Rupireddy
Discussion: https://postgr.es/m/CALj2ACVfFMfqD5oLzZSQQZWfXiJqd-NdX0_317veP6FuB31QWA@mail.gmail.com
Reviewed-by: Andres Freund
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Dead tuples are ignored and are not marked as dead during recovery, as
it can lead to MVCC issues on a standby because its xmin may not match
with the primary. This information is tracked by a field called
"xactStartedInRecovery" in the transaction state data, switched on when
starting a transaction in recovery.
Unfortunately, this information was not correctly tracked when starting
a subtransaction, because the transaction state used for the
subtransaction did not update "xactStartedInRecovery" based on the state
of its parent. This would cause index scans done in subtransactions to
return inconsistent data, depending on how the xmin of the primary
and/or the standby evolved.
This is broken since the introduction of hot standby in efc16ea52067, so
backpatch all the way down.
Author: Fei Changhong
Reviewed-by: Kyotaro Horiguchi
Discussion: https://postgr.es/m/tencent_C4D907A5093C071A029712E73B43C6512706@qq.com
Backpatch-through: 12
|
|
|
|
|
|
|
|
|
|
|
|
| |
This GUC was intended as a debugging help in the 9.0 area when hot
standby and streaming replication were being developped, able to offer
more information at LOG level rather than DEBUGn. There are more tools
available these days that are able to offer rather equivalent
information, like pg_waldump introduced in 9.3. It is not obvious how
this facility is useful these days, so let's remove it.
Author: Bharath Rupireddy
Discussion: https://postgr.es/m/ZXEXEAUVFrvpquSd@paquier.xyz
|
|
|
|
|
|
|
|
|
|
| |
Remove comments that supposed that holding a pin was a useful interlock
for _bt_walk_left(). There are times when _bt_walk_left() doesn't hold
either a lock or a pin on any page, so clearly this can't be true.
_bt_walk_left() is even prepared to deal with concurrent deletion of
both the original page and any pages to its left.
Oversight in commit 2ed5b87f96.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Teach _bt_binsrch (and related helper routines like _bt_search and
_bt_compare) about the initial positioning requirements of backward
scans. Routines like _bt_binsrch already know all about "nextkey"
searches, so it seems natural to teach them about "goback"/backward
searches, too. These concepts are closely related, and are much easier
to understand when discussed together.
Now that certain implementation details are hidden from _bt_first, it's
straightforward to add a new optimization: backward scans using the <
strategy now avoid extra leaf page accesses in certain "boundary cases".
Consider the following example, which uses the tenk1 table (and its
tenk1_hundred index) from the standard regression tests:
SELECT * FROM tenk1 WHERE hundred < 12 ORDER BY hundred DESC LIMIT 1;
Before this commit, nbtree would scan two leaf pages, even though it was
only really necessary to scan one leaf page. We'll now descend straight
to the leaf page containing a (12, -inf) high key instead. The scan
will locate matching non-pivot tuples with "hundred" values starting
from the value 11. The scan won't waste a page access on the right
sibling leaf page, which cannot possibly contain any matching tuples.
You can think of the optimization added by this commit as disabling an
optimization (the _bt_compare "!pivotsearch" behavior that was added to
Postgres 12 in commit dd299df8) for a small subset of cases where it was
always counterproductive.
Equivalently, you can think of the new optimization as extending the
"pivotsearch" behavior that page deletion by VACUUM has long required
(since the aforementioned Postgres 12 commit went in) to other, similar
cases. Obviously, this isn't strictly necessary for these new cases
(unlike VACUUM, _bt_first is prepared to move the scan to the left once
on the leaf level), but the underlying principle is the same.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wz=XPzM8HzaLPq278Vms420mVSHfgs9wi5tjFKHcapZCEw@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Allow using multiple worker processes to build BRIN index, which until
now was supported only for BTREE indexes. For large tables this often
results in significant speedup when the build is CPU-bound.
The work is split in a simple way - each worker builds BRIN summaries on
a subset of the table, determined by the regular parallel scan used to
read the data, and feeds them into a shared tuplesort which sorts them
by blkno (start of the range). The leader then reads this sorted stream
of ranges, merges duplicates (which may happen if the parallel scan does
not align with BRIN pages_per_range), and adds the resulting ranges into
the index.
The number of duplicate results produced by workers (requiring merging
in the leader process) should be fairly small, thanks to how parallel
scans assign chunks to workers. The likelihood of duplicate results may
increase for higher pages_per_range values, but then there are fewer
page ranges in total. In any case, we expect the merging to be much
cheaper than summarization, so this should be a win.
Most of the parallelism infrastructure is a simplified copy of the code
used by BTREE indexes, omitting the parts irrelevant for BRIN indexes
(e.g. uniqueness checks).
This also introduces a new index AM flag amcanbuildparallel, determining
whether to attempt to start parallel workers for the index build.
Original patch by me, with reviews and substantial reworks by Matthias
van de Meent, certainly enough to make him a co-author.
Author: Tomas Vondra, Matthias van de Meent
Reviewed-by: Matthias van de Meent
Discussion: https://postgr.es/m/c2ee7d69-ce17-43f2-d1a0-9811edbda6e6%40enterprisedb.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When building BRIN indexes, the brinbuildCallback only advances to the
next page range when seeing a tuple that doesn't belong to the current
one. This means that the index may end up missing ranges at the end of
the table, if those pages do not contain any indexable tuples.
We tend not to have completely empty pages at the end of a relation, but
this also applies to partial indexes, where the tuples may simply not
match the index predicate. This results in inefficient scans using the
affected BRIN index - without the summaries, the page ranges have to be
read and processed, which consumes I/O and possibly also CPU time.
The existing code already added empty ranges for earlier parts of the
table, this commit makes sure we add them for the ranges at the end of
the table too.
Patch by Matthias van de Meent, with review/improvements by me.
Author: Matthias van de Meent
Reviewed-by: Tomas Vondra
Discussion: https://postgr.es/m/CAEze2WiMsPZg%3DxkvSF_jt4%3D69k6K7gz5B8V2wY3gCGZ%2B1BzCbQ%40mail.gmail.com
|
|
|
|
|
|
|
|
| |
The old name was misleading: It's not a cache, the values kept in the
struct are the authoritative source.
Reviewed-by: Tristan Partin, Richard Guo
Discussion: https://www.postgresql.org/message-id/6537d63d-4bb5-46f8-9b5d-73a8ba4720ab@iki.fi
|
|
|
|
|
|
|
| |
For sake of consistency.
Reviewed-by: Tristan Partin, Richard Guo
Discussion: https://www.postgresql.org/message-id/6537d63d-4bb5-46f8-9b5d-73a8ba4720ab@iki.fi
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
While checking if a record could fit in the circular WAL decoding
buffer, the coding from commit 3f1ce973 used arithmetic that could
overflow. 64 bit systems were unaffected for various technical reasons,
which probably explains the lack of problem reports. Likewise for 32
bit systems running known 32 bit kernels. The systems at risk of
problems appear to be 32 bit processes running on 64 bit kernels, with
unlucky placement in memory.
Per complaint from GCC -fsanitize=undefined -m32, while testing
variations of 039_end_of_wal.pl.
Back-patch to 15.
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGKH0oRPOX7DhiQ_b51sM8HqcPp2J3WA-Oen%3DdXog%2BAGGQ%40mail.gmail.com
|
|
|
|
|
|
|
|
|
|
| |
This has been broken since b060dbe0001a that has reworked the callback
mechanism of XLogReader, most likely unnoticed because any form of
development involving WAL happens on platforms where this compiles fine.
Author: Bharath Rupireddy
Discussion: https://postgr.es/m/CALj2ACVF14WKQMFwcJ=3okVDhiXpuK5f7YdT+BdYXbbypMHqWA@mail.gmail.com
Backpatch-through: 13
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 861f86beea changed hash_xlog_squeeze_page() to start reading
the write buffer conditionally but forgot to initialize it leading to an
uninitialized access.
Reported-by: Alexander Lakhin
Author: Hayato Kuroda
Reviewed-by: Alexander Lakhin, Amit Kapila
Discussion: http://postgr.es/m/62ed1a9f-746a-8e86-904b-51b9b806a1d9@gmail.com
|
|
|
|
|
| |
Reported-by: Alexander Lakhin
Discussion: https://postgr.es/m/55d8800f-4a80-5256-1e84-246fbe79acd0@gmail.com
|
|
|
|
|
|
| |
Reported-by: Alexander Lakhin, Tom Lane
Author: Pavel Borisov
Discussion: https://postgr.es/m/55d8800f-4a80-5256-1e84-246fbe79acd0@gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Quotes are applied to GUCs in a very inconsistent way across the code
base, with a mix of double quotes or no quotes used. This commit
removes double quotes around all the GUC names that are obviously
referred to as parameters with non-English words (use of underscore,
mixed case, etc).
This is the result of a discussion with Álvaro Herrera, Nathan Bossart,
Laurenz Albe, Peter Eisentraut, Tom Lane and Daniel Gustafsson.
Author: Peter Smith
Discussion: https://postgr.es/m/CAHut+Pv-kSN8SkxSdoHano_wPubqcg5789ejhCDZAcLFceBR-w@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Switch from using TransactionId to FullTransactionId in naming of 2PC files.
Transaction state file in the pg_twophase directory now have extra 8 bytes in
the name to address an epoch of a given xid.
Author: Maxim Orlov, Aleksander Alekseev, Alexander Korotkov, Teodor Sigaev
Author: Nikita Glukhov, Pavel Borisov, Yura Sokolov
Reviewed-by: Jacob Champion, Heikki Linnakangas, Alexander Korotkov
Reviewed-by: Japin Li, Pavel Borisov, Tom Lane, Peter Eisentraut, Andres Freund
Reviewed-by: Andrey Borodin, Dilip Kumar, Aleksander Alekseev
Discussion: https://postgr.es/m/CACG%3DezZe1NQSCnfHOr78AtAZxJZeCvxrts0ygrxYwe%3DpyyjVWA%40mail.gmail.com
Discussion: https://postgr.es/m/CAJ7c6TPDOYBYrnCAeyndkBktO0WG2xSdYduTF0nxq%2BvfkmTF5Q%40mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We've had repeated bugs in the area of handling SLRU wraparound in the past,
some of which have caused data loss. Switching to an indexing system for SLRUs
that does not wrap around should allow us to get rid of a whole bunch
of problems and improve the overall reliability of the system.
This particular patch however only changes the indexing and doesn't address
the wraparound per se. This is going to be done in the following patches.
Author: Maxim Orlov, Aleksander Alekseev, Alexander Korotkov, Teodor Sigaev
Author: Nikita Glukhov, Pavel Borisov, Yura Sokolov
Reviewed-by: Jacob Champion, Heikki Linnakangas, Alexander Korotkov
Reviewed-by: Japin Li, Pavel Borisov, Tom Lane, Peter Eisentraut, Andres Freund
Reviewed-by: Andrey Borodin, Dilip Kumar, Aleksander Alekseev
Discussion: https://postgr.es/m/CACG%3DezZe1NQSCnfHOr78AtAZxJZeCvxrts0ygrxYwe%3DpyyjVWA%40mail.gmail.com
Discussion: https://postgr.es/m/CAJ7c6TPDOYBYrnCAeyndkBktO0WG2xSdYduTF0nxq%2BvfkmTF5Q%40mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the tuple being updated is not visible to the crosscheck snapshot,
we return TM_Updated but the assertions would not hold in that case.
Move them to before the cross-check.
Fixes bug #17893. Backpatch to all supported versions.
Author: Alexander Lakhin
Backpatch-through: 12
Discussion: https://www.postgresql.org/message-id/17893-35847009eec517b5%40postgresql.org
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fix a bug introduced by c1ec02be1d79. It may happen that the executor
opens indexes on the result relation, but no rows end up being inserted.
Then the index_insert_cleanup still gets executed, but passes down NULL
to the AM callback. The AM callback may not expect this, as is the case
of brininsertcleanup, leading to a crash.
Fixed by only calling the cleanup callback if (ii_AmCache != NULL). This
way the AM can simply assume to only see a valid cache.
Reported-by: Richard Guo
Discussion: https://postgr.es/m/CAMbWs4-w9qC-o9hQox9UHvdVZAYTp8OrPQOKtwbvzWaRejTT=Q@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
XLogSetAsyncXactLSN(), called at asynchronous commit, would wake up
walwriter every time the LSN advances, but walwriter doesn't actually
do anything unless it has at least 'wal_writer_flush_after' full
blocks of WAL to write. Repeatedly waking up walwriter to do nothing
is a waste of CPU cycles in both walwriter and the backends doing the
wakeups. To fix, apply the same logic in XLogSetAsyncXactLSN() to
decide whether to wake up walwriter, as walwriter uses to determine if
it has any work to do.
In the passing, rename misleadingly named 'flushbytes' local variable
to 'flushblocks'.
Author: Andres Freund, Heikki Linnakangas
Discussion: https://www.postgresql.org/message-id/20231024230929.vsc342baqs7kmbte@awork3.anarazel.de
|
|
|
|
| |
Per buildfarm member koel.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The brininsert code used to initialize (and destroy) BrinDesc and
BrinRevmap for each tuple, which is not free. This patch initializes
these structures only once, and reuses them for all inserts in the same
command. The data is passed through indexInfo->ii_AmCache.
This also introduces an optional AM callback "aminsertcleanup" that
allows performing custom cleanup in case simply pfree-ing ii_AmCache is
not sufficient (which is the case when the cache contains TupleDesc,
Buffers, and so on).
Author: Soumyadeep Chakraborty
Reviewed-by: Alvaro Herrera, Matthias van de Meent, Tomas Vondra
Discussion: https://postgr.es/m/CAE-ML%2B9r2%3DaO1wwji1sBN9gvPz2xRAtFUGfnffpd0ZqyuzjamA%40mail.gmail.com
|
|
|
|
|
|
|
|
|
|
| |
Reported-by: Michael Paquier
Discussion: https://postgr.es/m/CAB7nPqSDdF0heotQU3gsepgqx+9c+6KjLd3R6aNYH7KKfDd2ig@mail.gmail.com
Author: Michael Paquier
Backpatch-through: master
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously these functions returned the previous segment number if the
LSN was on a segment boundary. We now always return the current segment
number for an LSN.
Docs updated to reflect this change. Regression tests added, author
Andres Freund.
Also mentioned in thread https://postgr.es/m/flat/20220204225057.GA1535307%40nathanxps13#d964275c9540d8395e138efc0a75f7e8
BACKWARD INCOMPATIBILITY
Reported-by: Kyotaro Horiguchi
Discussion: https://postgr.es/m/20190726.172120.101752680.horikyota.ntt@gmail.com
Co-authored-by: Kyotaro Horiguchi
Backpatch-through: master
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When there are no indexes on a table, we vacuum each heap block after
pruning it and then update the freespace map. Periodically, we also
vacuum the freespace map. This was done while unnecessarily holding a
lock on the heap page. Release the lock before calling
FreeSpaceMapVacuumRange() and, while we're at it, ensure the range
includes the heap block we just vacuumed.
There are no known deadlocks or other similar issues, therefore don't
backpatch. It's certainly not good to do all this work under a lock, but it's
not frequently reached, making it not worth the risk of backpatching.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAAKRu_YiL%3D44GvGnt1dpYouDSSoV7wzxVoXs8m3p311rp-TVQQ%40mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As of commit eaa5808e8e, MemoryContextResetAndDeleteChildren() is
just a backwards compatibility macro for MemoryContextReset(). Now
that some time has passed, this macro seems more likely to create
confusion.
This commit removes the macro and replaces all remaining uses with
calls to MemoryContextReset(). Any third-party code that use this
macro will need to be adjusted to call MemoryContextReset()
instead. Since the two have behaved the same way since v9.5, such
adjustments won't produce any behavior changes for all
currently-supported versions of PostgreSQL.
Reviewed-by: Amul Sul, Tom Lane, Alvaro Herrera, Dagfinn Ilmari Mannsåker
Discussion: https://postgr.es/m/20231113185950.GA1668018%40nathanxps13
|
|
|
|
|
|
|
|
|
|
|
| |
Alexander reported a crash with repeated create + drop database, after
the ResourceOwner rewrite (commit b8bff07daa). That was fixed by the
previous commit, but it nevertheless seems like a good idea clear
CurrentResourceOwner earlier, because you're not supposed to use it
for anything after we start releasing it.
Reviewed-by: Alexander Lakhin
Discussion: https://www.postgresql.org/message-id/11b70743-c5f3-3910-8e5b-dd6c115ff829%40gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It's clearly stated in the comments that ginFindParents() must keep
the pin on the index's root page that's associated with the topmost
GinBtreeStack item. However, the code path for the case that the
desired downlink has been pushed down to the next index level
ignored this proviso, and would release the pin anyway if we were
still examining the root level. That led to an assertion failure
or "buffer NNNN is not owned by resource owner" error later, when
we try to release the pin again at the end of the insertion.
This is quite hard to reproduce, since it can only happen if an
index root page split occurs concurrently with our own insertion.
Thanks to Jeff Janes for finding a test case that triggers it
often enough to allow investigation.
This has been there since the beginning of GIN, so back-patch
to all supported branches.
Discussion: https://postgr.es/m/CAMkU=1yCAKtv86dMrD__Ja-7KzjE=uMeKX8y__cx5W-OEWy2ow@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
| |
Commit 00d7fb5e2e started to use REGBUF_NO_CHANGE at a few places in the
code where we register the buffer before marking it dirty but missed
updating one of the code flows in the hash index where we free the overflow
page without any live tuples on it.
Author: Amit Kapila and Hayato Kuroda
Discussion: http://postgr.es/m/f045c8f7-ee24-ead6-3679-c04a43d21351@gmail.com
|
|
|
|
|
|
|
|
|
|
|
| |
We don't want existing slots in the old cluster to get invalidated during
the upgrade. During an upgrade, we set this variable to -1 via the command
line in an attempt to prevent such invalidations, but users have ways to
override it. This patch ensures the value is not overridden by the user.
Author: Kyotaro Horiguchi
Reviewed-by: Peter Smith, Alvaro Herrera
Discussion: http://postgr.es/m/20231027.115759.2206827438943188717.horikyota.ntt@gmail.com
|
|
|
|
|
|
|
|
|
| |
We seem to have accidentally used "insure" in a few places. Correct
that.
Author: Peter Smith
Discussion: https://postgr.es/m/CAHut+Pv0biqrhA3pMhu40aDsj343mTsD75khKnHsLqR8P04f=Q@mail.gmail.com
Backpatch-through: 12, oldest supported version
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of having a separate array/hash for each resource kind, use a
single array and hash to hold all kinds of resources. This makes it
possible to introduce new resource "kinds" without having to modify
the ResourceOwnerData struct. In particular, this makes it possible
for extensions to register custom resource kinds.
The old approach was to have a small array of resources of each kind,
and if it fills up, switch to a hash table. The new approach also uses
an array and a hash, but now the array and the hash are used at the
same time. The array is used to hold the recently added resources, and
when it fills up, they are moved to the hash. This keeps the access to
recent entries fast, even when there are a lot of long-held resources.
All the resource-specific ResourceOwnerEnlarge*(),
ResourceOwnerRemember*(), and ResourceOwnerForget*() functions have
been replaced with three generic functions that take resource kind as
argument. For convenience, we still define resource-specific wrapper
macros around the generic functions with the old names, but they are
now defined in the source files that use those resource kinds.
The release callback no longer needs to call ResourceOwnerForget on
the resource being released. ResourceOwnerRelease unregisters the
resource from the owner before calling the callback. That needed some
changes in bufmgr.c and some other files, where releasing the
resources previously always called ResourceOwnerForget.
Each resource kind specifies a release priority, and
ResourceOwnerReleaseAll releases the resources in priority order. To
make that possible, we have to restrict what you can do between
phases. After calling ResourceOwnerRelease(), you are no longer
allowed to remember any more resources in it or to forget any
previously remembered resources by calling ResourceOwnerForget. There
was one case where that was done previously. At subtransaction commit,
AtEOSubXact_Inval() would handle the invalidation messages and call
RelationFlushRelation(), which temporarily increased the reference
count on the relation being flushed. We now switch to the parent
subtransaction's resource owner before calling AtEOSubXact_Inval(), so
that there is a valid ResourceOwner to temporarily hold that relcache
reference.
Other end-of-xact routines make similar calls to AtEOXact_Inval()
between release phases, but I didn't see any regression test failures
from those, so I'm not sure if they could reach a codepath that needs
remembering extra resources.
There were two exceptions to how the resource leak WARNINGs on commit
were printed previously: llvmjit silently released the context without
printing the warning, and a leaked buffer io triggered a PANIC. Now
everything prints a WARNING, including those cases.
Add tests in src/test/modules/test_resowner.
Reviewed-by: Aleksander Alekseev, Michael Paquier, Julien Rouhaud
Reviewed-by: Kyotaro Horiguchi, Hayato Kuroda, Álvaro Herrera, Zhihong Yu
Reviewed-by: Peter Eisentraut, Andres Freund
Discussion: https://www.postgresql.org/message-id/cbfabeb0-cd3c-e951-a572-19b365ed314d%40iki.fi
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
false_positive_rate is a parameter that can be set with the bloom
opclass in BRIN, and setting it to a value of exactly 0.25 would trigger
an assertion in the first INSERT done on the index with value set.
The assertion changed here relied on BLOOM_{MIN|MAX}_FALSE_POSITIVE_RATE
that are somewhat arbitrary values, and specifying an out-of-range value
would also trigger a failure when defining such an index. So, as-is,
the assertion was just doubling on the min-max check of the reloption.
This is now enlarged to check that it is a correct percentage value,
instead, based on a suggestion by Tom Lane.
Author: Alexander Lakhin
Reviewed-by: Tom Lane, Shihao Zhong
Discussion: https://postgr.es/m/17969-a6c54de48026d694@postgresql.org
Backpatch-through: 14
|
|
|
|
|
|
|
|
|
|
|
|
| |
This also updates some C comments.
Reported-by: suchithjn22@gmail.com
Discussion: https://postgr.es/m/167336599095.2667301.15497893107226841625@wrigleys.postgresql.org
Author: Laurenz Albe (doc patch)
Backpatch-through: 11
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
pgstatindex failed with ERRCODE_DATA_CORRUPTED, of the "can't-happen"
class XX. The other functions succeeded on an empty index; they might
have malfunctioned if the failed index build left torn I/O or other
complex state. Report an ERROR in statistics functions pgstatindex,
pgstatginindex, pgstathashindex, and pgstattuple. Report DEBUG1 and
skip all index I/O in maintenance functions brin_desummarize_range,
brin_summarize_new_values, brin_summarize_range, and
gin_clean_pending_list. Back-patch to v11 (all supported versions).
Discussion: https://postgr.es/m/20231001195309.a3@google.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When beginning recovery, a LOG is displayed by the startup process to
show which recovery mode will be used depending on the .signal file(s)
set in the data folder, like "standby mode", recovery up to a given
target type and value, or archive recovery.
A different patch is under discussion to simplify the startup code by
requiring the presence of recovery.signal and/or standby.signal when a
backup_label file is read. Delaying a bit this LOG ensures that the
correct recovery mode would be reported, and putting it at this position
does not make it lose its value.
While on it, this commit adds a few comments documenting a bit more the
initial recovery steps and their dependencies, and fixes an incorrect
comment format. This introduces no behavior changes.
Extracted from a larger patch by me.
Reviewed-by: David Steele, Bowen Shi
Discussion: https://postgr.es/m/ZArVOMifjzE7f8W7@paquier.xyz
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When beginning recovery from a base backup by reading a backup_label
file, it may be possible that no checkpoint record is available
depending on the method used when the case backup was taken, which would
prevent recovery from beginning. In this case, the FATAL messages
issued, initially added by c900c15269f0f, mentioned recovery.signal as
an option to do recovery but not standby.signal. Let's add it as an
available option, for clarity.
Per suggestion from Bowen Shi, extracted from a larger patch by me.
Author: Michael Paquier
Discussion: https://postgr.es/m/CAM_vCudkSjr7NsNKSdjwtfAm9dbzepY6beZ5DP177POKy8=2aw@mail.gmail.com
|