diff options
author | Noah Misch <noah@leadboat.com> | 2020-08-15 10:15:53 -0700 |
---|---|---|
committer | Noah Misch <noah@leadboat.com> | 2020-08-15 10:15:57 -0700 |
commit | d4031d78460cbbb4ed2fb7be635f84bea0e9a0c1 (patch) | |
tree | d7ee345e4522f52d62b51f33db15d96ba9a8535a /src/backend/access | |
parent | 9d472b51e98777102d72f8ccdfb8cef10e087f74 (diff) | |
download | postgresql-d4031d78460cbbb4ed2fb7be635f84bea0e9a0c1.tar.gz postgresql-d4031d78460cbbb4ed2fb7be635f84bea0e9a0c1.zip |
Prevent concurrent SimpleLruTruncate() for any given SLRU.
The SimpleLruTruncate() header comment states the new coding rule. To
achieve this, add locktype "frozenid" and two LWLocks. This closes a
rare opportunity for data loss, which manifested as "apparent
wraparound" or "could not access status of transaction" errors. Data
loss is more likely in pg_multixact, due to released branches' thin
margin between multiStopLimit and multiWrapLimit. If a user's physical
replication primary logged ": apparent wraparound" messages, the user
should rebuild standbys of that primary regardless of symptoms. At less
risk is a cluster having emitted "not accepting commands" errors or
"must be vacuumed" warnings at some point. One can test a cluster for
this data loss by running VACUUM FREEZE in every database. Back-patch
to 9.5 (all supported versions).
Discussion: https://postgr.es/m/20190218073103.GA1434723@rfd.leadboat.com
Diffstat (limited to 'src/backend/access')
-rw-r--r-- | src/backend/access/transam/slru.c | 8 | ||||
-rw-r--r-- | src/backend/access/transam/subtrans.c | 4 |
2 files changed, 10 insertions, 2 deletions
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c index fad5d363e32..67387979bdb 100644 --- a/src/backend/access/transam/slru.c +++ b/src/backend/access/transam/slru.c @@ -1163,6 +1163,14 @@ SimpleLruFlush(SlruCtl ctl, bool allow_redirtied) /* * Remove all segments before the one holding the passed page number + * + * All SLRUs prevent concurrent calls to this function, either with an LWLock + * or by calling it only as part of a checkpoint. Mutual exclusion must begin + * before computing cutoffPage. Mutual exclusion must end after any limit + * update that would permit other backends to write fresh data into the + * segment immediately preceding the one containing cutoffPage. Otherwise, + * when the SLRU is quite full, SimpleLruTruncate() might delete that segment + * after it has accrued freshly-written data. */ void SimpleLruTruncate(SlruCtl ctl, int cutoffPage) diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c index 4faa21f5aef..ef63b6d98a4 100644 --- a/src/backend/access/transam/subtrans.c +++ b/src/backend/access/transam/subtrans.c @@ -347,8 +347,8 @@ ExtendSUBTRANS(TransactionId newestXact) /* * Remove all SUBTRANS segments before the one holding the passed transaction ID * - * This is normally called during checkpoint, with oldestXact being the - * oldest TransactionXmin of any running transaction. + * oldestXact is the oldest TransactionXmin of any running transaction. This + * is called only during checkpoint. */ void TruncateSUBTRANS(TransactionId oldestXact) |