| Commit message (Collapse) | Author | Age |
... | |
|
|
|
|
|
|
| |
Introduced in aa64f23.
Author: Nathan Bossart
Discussion: https://postgr.es/m/20220209175338.GB1627503@nathanxps13
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, it was really easy to write code that accessed MaxBackends
before we'd actually initialized it, especially when coding up an
extension. To make this less error-prune, introduce a new function
GetMaxBackends() which should be used to obtain the correct value.
This will ERROR if called too early. Demote the global variable to
a file-level static, so that nobody can peak at it directly.
Nathan Bossart. Idea by Andres Freund. Review by Greg Sabino Mullane,
by Michael Paquier (who had doubts about the approach), and by me.
Discussion: http://postgr.es/m/20210802224204.bckcikl45uezv5e4@alap3.anarazel.de
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The GiST sorted build currently chooses split points according to the only page
space utilization. That may lead to higher non-leaf keys overlap and, in turn,
slower search query answers.
This commit makes the sorted build use the opclass's picksplit method. Once
four pages at the level are accumulated, the picksplit method is applied until
each split partition fits the page. Some of our split algorithms could show
significant performance degradation while processing 4-times more data at once.
But those opclasses haven't received the sorted build support and shouldn't
receive it before their split algorithms are improved.
Discussion: https://postgr.es/m/CAHqSB9jqtS94e9%3D0vxqQX5dxQA89N95UKyz-%3DA7Y%2B_YJt%2BVW5A%40mail.gmail.com
Author: Aliaksandr Kalenik, Sergei Shoulbakov, Andrey Borodin
Reviewed-by: Björn Harrtell, Darafei Praliaskouski, Andres Freund
Reviewed-by: Alexander Korotkov
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Running a shell command for each file to be archived has a lot of
overhead and may not offer as much error checking as you want, or the
exact semantics that you want. So, offer the option to call a loadable
module for each file to be archived, rather than running a shell command.
Also, add a 'basic_archive' contrib module as an example implementation
that archives to a local directory.
Nathan Bossart, with a little bit of kibitzing by me.
Discussion: http://postgr.es/m/20220202224433.GA1036711@nathanxps13
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The SQL standard has been ambiguous about whether null values in
unique constraints should be considered equal or not. Different
implementations have different behaviors. In the SQL:202x draft, this
has been formalized by making this implementation-defined and adding
an option on unique constraint definitions UNIQUE [ NULLS [NOT]
DISTINCT ] to choose a behavior explicitly.
This patch adds this option to PostgreSQL. The default behavior
remains UNIQUE NULLS DISTINCT. Making this happen in the btree code
is pretty easy; most of the patch is just to carry the flag around to
all the places that need it.
The CREATE UNIQUE INDEX syntax extension is not from the standard,
it's my own invention.
I named all the internal flags, catalog columns, etc. in the negative
("nulls not distinct") so that the default PostgreSQL behavior is the
default if the flag is false.
Reviewed-by: Maxim Orlov <orlovmg@gmail.com>
Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/84e5ee1b-387e-9a54-c326-9082674bde78@enterprisedb.com
|
|
|
|
|
|
|
|
|
| |
xlog.h is directly and indirectly #included in a lot of places. With
this change, xloginsert.h is no longer unnecessarily included in the
large number of them that don't need it.
Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Discussion: https://postgr.es/m/CALj2ACVe-W+WM5P44N7eG9C2_FmaeM8Dq5aCnD3fHt0Ba=WR6w@mail.gmail.com
|
|
|
|
|
| |
Rename pages_removed to removed_pages, for consistency with nearby
vacrel fields.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
WAL replay would cause a hard crash if the timeline expected by a
XLOG_END_OF_RECOVERY, a XLOG_CHECKPOINT_ONLINE, or a
XLOG_CHECKPOINT_SHUTDOWN record is not the same as the timeline being
replayed, using the same error message for all three of them. This
commit changes those error messages to use different wordings, adapted
to each record type, which is useful when it comes to the debugging of
an issue in this area.
Author: Amul Sul
Reviewed-by: Nathan Bossart, Robert Haas
Discussion: https://postgr.es/m/CAAJ_b97i1ZerYC_xW6o_AiDSW5n+sGi8k91Yc8KS8bKWKxjqwQ@mail.gmail.com
|
|
|
|
|
|
|
|
|
| |
This fixes a set of issues that have accumulated over the past months
(or years) in various code areas. Most fixes are related to some recent
additions, as of the development of v15.
Author: Justin Pryzby
Discussion: https://postgr.es/m/20220124030001.GQ23027@telsasoft.com
|
|
|
|
|
|
|
|
|
|
|
| |
While individual logical rewrite files were synced to disk, the directory was
not. On some filesystems that could lead to loosing directory entries after a
crash.
Reported-By: Tom Lane <tgl@sss.pgh.pa.us>
Author: Nathan Bossart <bossartn@amazon.com>
Discussion: https://postgr.es/m/867F2E29-2782-4869-970E-B984C6D35A8F@amazon.com
Backpatch: 10-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The logic in charge of writing commit timestamps (enabled with
track_commit_timestamp) for subtransactions had a one-bug bug,
where it would be possible that commit timestamps go missing for the
last subtransaction committed.
While on it, simplify a bit the iteration logic in the loop writing the
commit timestamps, as per suggestions from Kyotaro Horiguchi and Tom
Lane, so as some variable initializations are not part of the loop
itself.
Issue introduced in 73c986a.
Analyzed-by: Alex Kingsborough
Author: Alex Kingsborough, Kyotaro Horiguchi
Discussion: https://postgr.es/m/73A66172-4050-4F2A-B7F1-13508EDA2144@amazon.com
Backpatch-through: 10
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, callers of pg_newlocale_from_collation() did not call it
if the collation was DEFAULT_COLLATION_OID and instead proceeded with
a pg_locale_t of 0. Instead, now we call it anyway and have it return
0 if the default collation was passed. It already did this, so we
just have to adjust the callers. This simplifies all the call sites
and also makes future enhancements easier.
After discussion and testing, the previous comment in pg_locale.c
about avoiding this for performance reasons may have been mistaken
since it was testing a very different patch version way back when.
Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Discussion: https://www.postgresql.org/message-id/ed3baa81-7fac-7788-cc12-41e3f7917e34@enterprisedb.com
|
|
|
|
|
|
|
|
|
|
|
| |
Add a new rmgr method, rm_decode, and use that rather than a switch
statement.
In preparation for rmgr extensibility.
Reviewed-by: Julien Rouhaud
Discussion: https://postgr.es/m/ed1fb2e22d15d3563ae0eb610f7b61bb15999c0a.camel%40j-davis.com
Discussion: https://postgr.es/m/20220118095332.6xtlcjoyxobv6cbk@jrouhaud
|
|
|
|
|
|
|
|
| |
BufferGetBlockNumber() is not that cheap and obviously cannot change during
one heap_prune_page(), so only call it once. We might be able to do better and
pass the block number from the caller, but that'd be a larger change...
Discussion: https://postgr.es/m/20211211045710.ljtuu4gfloh754rs@alap3.anarazel.de
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The log_autovacuum_min_duration instrumentation used its own dedicated
code for logging, which was not reused by VACUUM VERBOSE. This was
highly duplicative, and sometimes led to each code path using slightly
different accounting for essentially the same information.
Clean things up by making VACUUM VERBOSE reuse the same instrumentation
code. This code restructuring changes the structure of the VACUUM
VERBOSE output itself, but that seems like an overall improvement. The
most noticeable change in VACUUM VERBOSE output is that it no longer
outputs a distinct message per index per round of index vacuuming. Most
of the same information (about each index) is now shown in its new
per-operation summary message. This is far more legible.
A few details are no longer displayed by VACUUM VERBOSE, but that's no
real loss in practice, especially in the common case where we don't need
multiple index scans/rounds of vacuuming. This super fine-grained
information is still available via DEBUG2 messages, which might still be
useful in debugging scenarios.
VACUUM VERBOSE now shows new instrumentation, which is typically very
useful: all of the log_autovacuum_min_duration instrumentation that it
missed out on before now. This includes information about WAL overhead,
buffers hit/missed/dirtied information, and I/O timing information.
VACUUM VERBOSE still retains a few INFO messages of its own. This is
limited to output concerning the progress of heap rel truncation, as
well as some basic information about parallel workers. These details
are still potentially quite useful. They aren't a good fit for the log
output, which must summarize the whole operation.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-By: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAH2-WzmW4Me7_qR4X4ka7pxP-jGmn7=Npma_-Z-9Y1eD0MQRLw@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Corruption of redirect item pointers often only becomes visible well after
being corrupted, as e.g. bug #17255 shows: In the original reproducer,
gigabyte of WAL were between the source of the corruption and the corruption
becoming visible.
To make it easier to find / prevent such bugs, verify whether redirect
pointers are sensible at the end of heap_page_prune_execute(). 5cd7eb1f1c32
introduced related assertions while modifying the page, but they can't easily
detect marking the target of an existing redirect as unused. Sometimes the
corruption will be detected later, but that's harder to diagnose.
Author: Andres Freund <andres@andres@anarazel.de>
Reviewed-By: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/20211122175914.ayk6gg6nvdwuhrzb@alap3.anarazel.de
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since dc7420c2c92 the horizon used for pruning is determined "lazily". A more
accurate horizon is built on-demand, rather than in GetSnapshotData(). If a
horizon computation is triggered between two HeapTupleSatisfiesVacuum() calls
for the same tuple, the result can change from RECENTLY_DEAD to DEAD.
heap_page_prune() can process the same tid multiple times (once following an
update chain, once "directly"). When the result of HeapTupleSatisfiesVacuum()
of a tuple changes from RECENTLY_DEAD during the first access, to DEAD in the
second, the "tuple is DEAD and doesn't chain to anything else" path in
heap_prune_chain() can end up marking the target of a LP_REDIRECT ItemId
unused.
Initially not easily visible,
Once the target of a LP_REDIRECT ItemId is marked unused, a new tuple version
can reuse it. At that point the corruption may become visible, as index
entries pointing to the "original" redirect item, now point to a unrelated
tuple.
To fix, compute HTSV for all tuples on a page only once. This fixes the entire
class of problems of HTSV changing inside heap_page_prune(). However,
visibility changes can obviously still occur between HTSV checks inside
heap_page_prune() and outside (e.g. in lazy_scan_prune()).
The computation of HTSV is now done in bulk, in heap_page_prune(), rather than
on-demand in heap_prune_chain(). Besides being a bit simpler, it also is
faster: Memory accesses can happen sequentially, rather than in the order of
HOT chains.
There are other causes of HeapTupleSatisfiesVacuum() results changing between
two visibility checks for the same tuple, even before dc7420c2c92. E.g.
HEAPTUPLE_INSERT_IN_PROGRESS can change to HEAPTUPLE_DEAD when a transaction
aborts between the two checks. None of the these other visibility status
changes are known to cause corruption, but heap_page_prune()'s approach makes
it hard to be confident.
A patch implementing a more fundamental redesign of heap_page_prune(), which
fixes this bug and simplifies pruning substantially, has been proposed by
Peter Geoghegan in
https://postgr.es/m/CAH2-WzmNk6V6tqzuuabxoxM8HJRaWU6h12toaS-bqYcLiht16A@mail.gmail.com
However, that redesign is larger change than desirable for backpatching. As
the new design still benefits from the batched visibility determination
introduced in this commit, it makes sense to commit this narrower fix to 14
and master, and then commit Peter's improvement in master.
The precise sequence required to trigger the bug is complicated and hard to do
exercise in an isolation test (until we have wait points). Due to that the
isolation test initially posted at
https://postgr.es/m/20211119003623.d3jusiytzjqwb62p%40alap3.anarazel.de
and updated in
https://postgr.es/m/20211122175914.ayk6gg6nvdwuhrzb%40alap3.anarazel.de
isn't committable.
A followup commit will introduce additional assertions, to detect problems
like this more easily.
Bug: #17255
Reported-By: Alexander Lakhin <exclusion@gmail.com>
Debugged-By: Andres Freund <andres@anarazel.de>
Debugged-By: Peter Geoghegan <pg@bowt.ie>
Author: Andres Freund <andres@andres@anarazel.de>
Reviewed-By: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/20211122175914.ayk6gg6nvdwuhrzb@alap3.anarazel.de
Backpatch: 14-, the oldest branch containing dc7420c2c92
|
|
|
|
| |
Another minor oversight in commit 4f8d9d12.
|
|
|
|
|
| |
Author: Bharath Rupireddy
Discussion: https://postgr.es/m/CALj2ACW7SvfFW8r2uKH6oQm1kNpt8aQMG61kSBPK0S2PHhFbMw@mail.gmail.com
|
|
|
|
|
|
|
|
|
| |
Rename range_serialize/range_deserialize to
brin_range_serialize/brin_range_deserialize, since there are already
public range_serialize/range_deserialize in rangetypes.h.
Author: Paul A. Jungwirth <pj@illuminatedcomputing.com>
Discussion: https://www.postgresql.org/message-id/CA+renyX0ipvY6A_jUOHeB1q9mL4bEYfAZ5FBB7G7jUo5bykjrA@mail.gmail.com
|
|
|
|
| |
Backpatch-through: 10
|
|
|
|
|
|
|
|
|
| |
brin_new_memtuple already did this, so there's no need
for initialize_brin_buildstate to do it again.
Richard Guo, reviewed by Bharath Rupireddy
Discussion: https://postgr.es/m/CAMbWs4-kYYpKNOdiWtsCZ3jbkFFj4nhOVH22JH7dsrMYX=aGjg@mail.gmail.com
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Under concurrency, it is possible for two sessions to be merrily locking
and releasing a tuple and marking it again as HEAP_XMAX_INVALID all the
while a third session attempts to lock it, miserably fails at it, and
then contemplates life, the universe and everything only to eventually
fail an assertion that said bit is not set. Before SKIP LOCKED that was
indeed a reasonable expectation, but alas! commit df630b0dd5ea falsified
it.
This bug is as old as time itself, and even older, if you think time
begins with the oldest supported branch. Therefore, backpatch to all
supported branches.
Author: Simon Riggs <simon.riggs@enterprisedb.com>
Discussion: https://postgr.es/m/CANbhV-FeEwMnN8yuMyss7if1ZKjOKfjcgqB26n8pqu1e=q0ebg@mail.gmail.com
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit moves parallel vacuum related code to a new file
commands/vacuumparallel.c so that any table AM supporting indexes can
utilize parallel vacuum in order to call index AM callbacks (ambulkdelete
and amvacuumcleanup) with parallel workers.
Another reason for this refactoring is that the parallel vacuum isn't
specific to heap so it doesn't make sense to keep this code in
heap/vacuumlazy.c.
Author: Masahiko Sawada, based on suggestion from Andres Freund
Reviewed-by: Hou Zhijie, Amit Kapila, Haiying Tang
Discussion: https://www.postgresql.org/message-id/20211030212101.ae3qcouatwmy7tbr%40alap3.anarazel.de
|
|
|
|
|
|
|
|
|
|
| |
An upcoming patch moves parallel vacuum code out of vacuumlazy.c. This
code restructuring will allow both lazy vacuum and parallel vacuum to use
index vacuum functions.
Author: Masahiko Sawada
Reviewed-by: Hou Zhijie, Amit Kapila
Discussion: https://www.postgresql.org/message-id/20211030212101.ae3qcouatwmy7tbr%40alap3.anarazel.de
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of referring to target backends by pid, use pgprocno. This
means that we don't have to scan the ProcArray and we can drop some
special case code for dealing with the startup process.
Discussion: https://postgr.es/m/CA%2BhUKGLYRyDaneEwz5Uya_OgFLMx5BgJfkQSD%3Dq9HmwsfRRb-w%40mail.gmail.com
Reviewed-by: Soumyadeep Chakraborty <soumyadeep2007@gmail.com>
Reviewed-by: Ashwin Agrawal <ashwinstar@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, in parallel vacuum, we allocated shmem area of
IndexBulkDeleteResult only for indexes where parallel index vacuuming is
safe and had null-bitmap in shmem area to access them. This logic was too
complicated with a small benefit of saving only a few bits per indexes.
In this commit, we allocate a dedicated shmem area for the array of
LVParallelIndStats that includes a parallel-safety flag, the index vacuum
status, and IndexBulkdeleteResult. There is one array element for every
index, even those indexes where parallel index vacuuming is unsafe or not
worthwhile. This commit makes the code clear by removing all
bitmap-related code.
Also, add the check each index vacuum status after parallel index vacuum
to make sure that all indexes have been processed.
Finally, rename parallel vacuum functions to parallel_vacuum_* for
consistency.
Author: Masahiko Sawada, based on suggestions by Andres Freund
Reviewed-by: Hou Zhijie, Amit Kapila
Discussion: https://www.postgresql.org/message-id/20211030212101.ae3qcouatwmy7tbr%40alap3.anarazel.de
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When using replication origins, pg_replication_origin_xact_setup() is an
optional choice to be able to set a LSN and a timestamp to mark the
origin, which would be additionally added to WAL for transaction commits
or aborts (including 2PC transactions). An assertion in the code path
of PREPARE TRANSACTION assumed that this data should always be set, so
it would trigger when using replication origins without setting up an
origin LSN. Some tests are added to cover more this kind of scenario.
Oversight in commit 1eb6d65.
Per discussion with Amit Kapila and Masahiko Sawada.
Discussion: https://postgr.es/m/YbbBfNSvMm5nIINV@paquier.xyz
Backpatch-through: 11
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It's not great that RecoveryInProgress() calls InitXLOGAccess(),
because a status inquiry function typically shouldn't have the side
effect of performing initializations. We could fix that by calling
InitXLOGAccess() from some other place, but instead, let's remove it
altogether.
One thing InitXLogAccess() did is initialize wal_segment_size, but it
doesn't need to do that. In the postmaster, PostmasterMain() calls
LocalProcessControlFile(), and all child processes will inherit that
value -- except in EXEC_BACKEND bulds, but then each backend runs
SubPostmasterMain() which also calls LocalProcessControlFile().
The other thing InitXLOGAccess() did is update RedoRecPtr and
doPageWrites, but that's not critical, because all code that uses
them will just retry if it turns out that they've changed. The
only difference is that most code will now see an initial value that
is definitely invalid instead of one that might have just been way
out of date, but that will only happen once per backend lifetime,
so it shouldn't be a big deal.
Patch by me, reviewed by Nathan Bossart, Michael Paquier, Andres
Freund, Heikki Linnakangas, and Álvaro Herrera.
Discussion: http://postgr.es/m/CA+TgmoY7b65qRjzHN_tWUk8B4sJqk1vj1d31uepVzmgPnZKeLg@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The idea here is that when a performance problem is known to have
occurred at a certain point in time, it's a good thing if there is
some information available from the logs to help figure out what
might have happened around that time.
This change attracted an above-average amount of dissent, because
it means that a server with default settings will produce some amount
of log output even if nothing has gone wrong. However, by my count,
the mailing list discussion had about twice as many people in favor
of the change as opposed. The reasons for believing that the extra
log output is not an issue in practice are: (1) the rate at which
messages can be generated by this setting is bounded to one every
few minutes on a properly-configured system and (2) production
systems tend to have a lot more junk in the log from that due to
failed connection attempts, ERROR messages generated by application
activity, and the like.
Bharath Rupireddy, reviewed by Fujii Masao and by me. Many other
people commented on the thread, but as far as I can see that was
discussion of the merits of the change rather than review of the
patch.
Discussion: https://postgr.es/m/CALj2ACX-rW_OeDcp4gqrFUAkf1f50Fnh138dmkd0JkvCNQRKGA@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit improves the description of some WAL records for the
Transaction RMGR:
- Track remote_apply for a transaction commit. This GUC is
user-settable, so this information can be useful for debugging.
- Add replication origin information for PREPARE TRANSACTION, with the
origin ID, LSN and timestamp
- Same as above, for ROLLBACK PREPARED.
This impacts the format of pg_waldump or anything using these
description routines, so no backpatch is done.
Author: Masahiko Sawada, Michael Paquier
Discussion: https://postgr.es/m/CAD21AoD2dJfgsdxk4_KciAZMZQoUiCvmV9sDpp8ZuKLtKCNXaA@mail.gmail.com
|
|
|
|
|
|
|
|
| |
One of the changes impacts the documentation, so backpatch.
Author: Peter Smith
Discussion: https://postgr.es/m/CAHut+Pu6+c+r3mY24VT7u+H+E_s6vMr5OdRiZ8NT3EOa-E5Lmw@mail.gmail.com
Backpatch-through: 14
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The term "super-exclusive lock" is a synonym for "buffer cleanup lock"
that first appeared in nbtree many years ago. Standardize things by
consistently using the term cleanup lock. This finishes work started by
commit 276db875.
There is no good reason to have two terms. But there is a good reason
to only have one: to avoid confusion around why VACUUM acquires a full
cleanup lock (not just an ordinary exclusive lock) in index AMs, during
ambulkdelete calls. This has nothing to do with protecting the physical
index data structure itself. It is needed to implement a locking
protocol that ensures that TIDs pointing to the heap/table structure
cannot get marked for recycling by VACUUM before it is safe (which is
somewhat similar to how VACUUM uses cleanup locks during its first heap
pass). Note that it isn't strictly necessary for index AMs to implement
this locking protocol -- several index AMs use an MVCC snapshot as their
sole interlock to prevent unsafe TID recycling.
In passing, update the nbtree README. Cleanly separate discussion of
the aforementioned index vacuuming locking protocol from discussion of
the "drop leaf page pin" optimization added by commit 2ed5b87f. We now
structure discussion of the latter by describing how individual index
scans may safely opt out of applying the standard locking protocol (and
so can avoid blocking progress by VACUUM). Also document why the
optimization is not safe to apply during nbtree index-only scans.
Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-WzngHgQa92tz6NQihf4nxJwRzCV36yMJO_i8dS+2mgEVKw@mail.gmail.com
Discussion: https://postgr.es/m/CAH2-WzkHPgsBBvGWjz=8PjNhDefy7XRkDKiT5NxMs-n5ZCf2dA@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
REINDEX CONCURRENTLY run on a toast index or a toast relation could
corrupt the target indexes rebuilt, as a backend running in parallel
that manipulates toast values would directly release the lock on the
toast relation when its local operation is done, rather than releasing
the lock once the transaction that manipulated the toast values
committed.
The fix done here is simple: we now hold a ROW EXCLUSIVE lock on the
toast relation when saving or deleting a toast value until the
transaction working on them is committed, so as a concurrent reindex
happening in parallel would be able to wait for any activity and see any
new rows inserted (or deleted).
An isolation test is added to check after the case fixed here, which is
a bit fancy by design as it relies on allow_system_table_mods to rename
the toast table and its index to fixed names. This way, it is possible
to reindex them directly without any dependency on the OID of the
underlying relation. Note that this could not use a DO block either, as
REINDEX CONCURRENTLY cannot be run in a transaction block. The test is
backpatched down to 13, where it is possible, thanks to c4a7a39, to use
allow_system_table_mods in a test suite.
Reported-by: Alexey Ermakov
Analyzed-by: Andres Freund, Noah Misch
Author: Michael Paquier
Reviewed-by: Nathan Bossart
Discussion: https://postgr.es/m/17268-d2fb426e0895abd4@postgresql.org
Backpatch-through: 12
|
|
|
|
|
|
|
|
| |
Commit 4a92a1c3d removed the TimeLineID update from RecoveryInProgress,
update comments accordingly.
Author: Amul Sul <sulamul@gmail.com>
Discussion: https://postgr.es/m/CAAJ_b96wyzs8N45jc-kYd-bTE02hRWQieLZRpsUtNbhap7_PuQ@mail.gmail.com
|
|
|
|
|
|
|
| |
Oversight in commit 4f8d9d12.
Reported-By: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAD21AoDm38Em0bvRqeQKr4HPvOj65Y8cUgCP4idMk39iaLrxyw@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When determining whether an index update may be skipped by using HOT, we
can ignore attributes indexed only by BRIN indexes. There are no index
pointers to individual tuples in BRIN, and the page range summary will
be updated anyway as it relies on visibility info.
This also removes rd_indexattr list, and replaces it with rd_attrsvalid
flag. The list was not used anywhere, and a simple flag is sufficient.
Patch by Josef Simanek, various fixes and improvements by me.
Author: Josef Simanek
Reviewed-by: Tomas Vondra, Alvaro Herrera
Discussion: https://postgr.es/m/CAFp7QwpMRGcDAQumN7onN9HjrJ3u4X3ZRXdGFT0K5G2JWvnbWg%40mail.gmail.com
|
|
|
|
|
|
|
|
|
|
| |
Like 5364b357fb11 did for pg_commit, change the formula used to
determine number of pg_commit_ts buffers, which helps performance with
larger servers.
Discussion: https://postgr.es/m/20210115220744.GA24457@alvherre.pgsql
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 8523492d simplified what it meant for an item to be considered
"dead" to VACUUM: TIDs collected in memory (in preparation for index
vacuuming) must always come from LP_DEAD stub line pointers in heap
pages, found following pruning. This formalized the idea that index
vacuuming (and heap vacuuming) are optional processes. Unlike pruning,
they can be delayed indefinitely, without any risk of that violating
fundamental invariants. For example, leaving LP_DEAD items behind
clearly won't add to the risk of transaction ID wraparound. You can't
have transaction ID wraparound without transaction IDs. Renaming
anything that references DEAD tuples (tuples with storage) reinforces
all this.
Code outside vacuumlazy.c continues to fudge the distinction between
dead/deleted tuples, and LP_DEAD items. This is necessary because
autovacuum scheduling is still mostly driven by "dead items/tuples"
statistics. In the future we may find it useful to replace this model
with something more sophisticated, as a step towards teaching autovacuum
to perform more frequent vacuuming that targeting individual indexes
that happen to be more prone to becoming bloated through version churn.
In passing, simplify some function signatures that deal with VACUUM's
dead_items array.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit moves the timestamp computation of the control file within
the routine of src/common/ in charge of updating the backend's control
file, which is shared by multiple frontend tools (pg_rewind,
pg_checksums and pg_resetwal) and the backend itself.
This change has as direct effect to update the control file's timestamp
when writing the control file in pg_rewind and pg_checksums, something
that is helpful to keep track of control file updates for those
operations, something also tracked by the backend at startup within its
logs. This part is arguably a bug, as ControlFileData->time should be
updated each time a new version of the control file is written, but this
is a behavior change so no backpatch is done.
Author: Amul Sul
Reviewed-by: Nathan Bossart, Michael Paquier, Bharath Rupireddy
Discussion: https://postgr.es/m/CAAJ_b97nd_ghRpyFV9Djf9RLXkoTbOUqnocq11WGq9TisX09Fw@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Standardize on xoroshiro128** as our basic PRNG algorithm, eliminating
a bunch of platform dependencies as well as fundamentally-obsolete PRNG
code. In addition, this API replacement will ease replacing the
algorithm again in future, should that become necessary.
xoroshiro128** is a few percent slower than the drand48 family,
but it can produce full-width 64-bit random values not only 48-bit,
and it should be much more trustworthy. It's likely to be noticeably
faster than the platform's random(), depending on which platform you
are thinking about; and we can have non-global state vectors easily,
unlike with random(). It is not cryptographically strong, but neither
are the functions it replaces.
Fabien Coelho, reviewed by Dean Rasheed, Aleksander Alekseev, and myself
Discussion: https://postgr.es/m/alpine.DEB.2.22.394.2105241211230.165418@pseudo
|
|
|
|
|
|
|
|
| |
The term "super-exclusive lock" is an acceptable synonym of "cleanup
lock". Even still, switching from one term to the other in the same
file is confusing. Standardize on "cleanup lock" within vacuumlazy.c.
Per a complaint from Andres Freund.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Update vacuumlazy.c file header comments (as well as comments above the
lazy_scan_heap function) that were largely written before the
introduction of the HOT optimization, when lazy_scan_heap did far less,
and didn't actually prune during its initial heap pass.
Since lazy_scan_heap now outsources far more work to lower level
functions, it makes sense to introduce the function by talking about the
high level invariant that dictates the order in which each phase takes
place. Also deemphasize the case where we run out of memory for TIDs,
since delaying that discussion makes it easier to talk about issues of
central importance.
Finally, remove discussion of parallel VACUUM from header comments.
These don't add much, and are in the wrong place.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 2fd8685e7f simplified the checking of modified attributes that
takes place within heap_update(). This included a micro-optimization
affecting pages marked PD_PAGE_FULL: don't even try to use HOT to save a
few cycles on determining HOT safety. The assumption was that it won't
work out this time around, since it can't have worked out last time
around.
Remove the micro-optimization. It could only ever save cycles that are
consumed by the vast majority of heap_update() calls, which hardly seems
worth the added complexity. It also seems quite possible that there are
workloads that will do worse over time by repeated application of the
micro-optimization, despite saving some cycles on average, in the short
term.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/CAH2-WznU1L3+DMPr1F7o2eJBT7=3bAJoY6ZkWABAxNt+-afyTA@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In commit ff9f111bce24 I mixed up inconsistent definitions of the LSN of
the first record in a page, when the previous record ends exactly at the
page boundary. The correct LSN is adjusted to skip the WAL page header;
I failed to use that when setting XLogReaderState->overwrittenRecPtr,
so at WAL replay time VerifyOverwriteContrecord would refuse to let
replay continue past that record.
Backpatch to 10. 9.6 also contains this bug, but it's no longer being
maintained.
Discussion: https://postgr.es/m/45597.1637694259@sss.pgh.pa.us
|
|
|
|
|
|
|
|
| |
Various places wanted to point out that tuple descriptors don't
contain the variable-length fields of pg_attribute. This started when
attacl was added, but more fields have been added since, and these
comments haven't been kept up to date consistently. Reword so that
the purpose is clearer and we don't have to keep updating them.
|
|
|
|
|
|
|
| |
d2ddfa681db removed ReadRecPtr/EndRecPtr, but two uses within an #ifdef
WAL_DEBUG escaped.
Discussion: https://postgr.es/m/20211124231206.gbadj5bblcljb6d5@alap3.anarazel.de
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In most places, the variables necessarily store the same value as the
eponymous members of the XLogReaderState that we use during WAL
replay, because ReadRecord() assigns the values from the structure
members to the global variables just after XLogReadRecord() returns.
However, XLogBeginRead() adjusts the structure members but not the
global variables, so after XLogBeginRead() and before the completion
of XLogReadRecord() the values can differ. Otherwise, they must be
identical. According to my analysis, the only place where either
variable is referenced at a point where it might not have the same
value as the structure member is the refrence to EndRecPtr within
XLogPageRead.
Therefore, at every other place where we are using the global
variable, we can just switch to using the structure member instead,
and remove the global variable. However, we can, and in fact should,
do this in XLogPageRead() as well, because at that point in the code,
the global variable will actually store the start of the record we
want to read - either because it's where the last WAL record ended, or
because the read position has been changed using XLogBeginRead since
the last record was read. The structure member, on the other hand,
will already have been updated to point to the end of the record we
just read. Elsewhere, the latter is what we use as an argument to
emode_for_corrupt_record(), so we should do the same here.
This part of the patch is perhaps a bug fix, but I don't think it has
any important consequences, so no back-patch. The point here is just
to continue to whittle down the entirely excessive use of global
variables in xlog.c.
Discussion: http://postgr.es/m/CA+Tgmoao96EuNeSPd+hspRKcsCddu=b1h-QNRuKfY8VmfNQdfg@mail.gmail.com
|