diff options
author | Andres Freund <andres@anarazel.de> | 2015-09-26 19:04:25 +0200 |
---|---|---|
committer | Andres Freund <andres@anarazel.de> | 2015-09-26 19:04:25 +0200 |
commit | 4f627f897367f15702d59973f75f6391d5d3e06f (patch) | |
tree | da66600641ab3f84bc8e324e5f83eb1a1cb1af19 /src/backend/access/transam/xlog.c | |
parent | 2abfd9d5e9cb7fe5345c415475622a4a95ea61e2 (diff) | |
download | postgresql-4f627f897367f15702d59973f75f6391d5d3e06f.tar.gz postgresql-4f627f897367f15702d59973f75f6391d5d3e06f.zip |
Rework the way multixact truncations work.
The fact that multixact truncations are not WAL logged has caused a fair
share of problems. Amongst others it requires to do computations during
recovery while the database is not in a consistent state, delaying
truncations till checkpoints, and handling members being truncated, but
offset not.
We tried to put bandaids on lots of these issues over the last years,
but it seems time to change course. Thus this patch introduces WAL
logging for multixact truncations.
This allows:
1) to perform the truncation directly during VACUUM, instead of delaying it
to the checkpoint.
2) to avoid looking at the offsets SLRU for truncation during recovery,
we can just use the master's values.
3) simplify a fair amount of logic to keep in memory limits straight,
this has gotten much easier
During the course of fixing this a bunch of additional bugs had to be
fixed:
1) Data was not purged from memory the member's SLRU before deleting
segments. This happened to be hard or impossible to hit due to the
interlock between checkpoints and truncation.
2) find_multixact_start() relied on SimpleLruDoesPhysicalPageExist - but
that doesn't work for offsets that haven't yet been flushed to
disk. Add code to flush the SLRUs to fix. Not pretty, but it feels
slightly safer to only make decisions based on actual on-disk state.
3) find_multixact_start() could be called concurrently with a truncation
and thus fail. Via SetOffsetVacuumLimit() that could lead to a round
of emergency vacuuming. The problem remains in
pg_get_multixact_members(), but that's quite harmless.
For now this is going to only get applied to 9.5+, leaving the issues in
the older branches in place. It is quite possible that we need to
backpatch at a later point though.
For the case this gets backpatched we need to handle that an updated
standby may be replaying WAL from a not-yet upgraded primary. We have to
recognize that situation and use "old style" truncation (i.e. looking at
the SLRUs) during WAL replay. In contrast to before, this now happens in
the startup process, when replaying a checkpoint record, instead of the
checkpointer. Doing truncation in the restartpoint is incorrect, they
can happen much later than the original checkpoint, thereby leading to
wraparound. To avoid "multixact_redo: unknown op code 48" errors
standbys would have to be upgraded before primaries.
A later patch will bump the WAL page magic, and remove the legacy
truncation codepaths. Legacy truncation support is just included to make
a possible future backpatch easier.
Discussion: 20150621192409.GA4797@alap3.anarazel.de
Reviewed-By: Robert Haas, Alvaro Herrera, Thomas Munro
Backpatch: 9.5 for now
Diffstat (limited to 'src/backend/access/transam/xlog.c')
-rw-r--r-- | src/backend/access/transam/xlog.c | 53 |
1 files changed, 16 insertions, 37 deletions
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index a87f09ee47f..1ac1c0550dd 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -6330,7 +6330,6 @@ StartupXLOG(void) SetMultiXactIdLimit(checkPoint.oldestMulti, checkPoint.oldestMultiDB); SetCommitTsLimit(checkPoint.oldestCommitTs, checkPoint.newestCommitTs); - MultiXactSetSafeTruncate(checkPoint.oldestMulti); XLogCtl->ckptXidEpoch = checkPoint.nextXidEpoch; XLogCtl->ckptXid = checkPoint.nextXid; @@ -6347,10 +6346,8 @@ StartupXLOG(void) StartupReorderBuffer(); /* - * Startup MultiXact. We need to do this early for two reasons: one is - * that we might try to access multixacts when we do tuple freezing, and - * the other is we need its state initialized because we attempt - * truncation during restartpoints. + * Startup MultiXact. We need to do this early to be able to replay + * truncations. */ StartupMultiXact(); @@ -8508,12 +8505,6 @@ CreateCheckPoint(int flags) END_CRIT_SECTION(); /* - * Now that the checkpoint is safely on disk, we can update the point to - * which multixact can be truncated. - */ - MultiXactSetSafeTruncate(checkPoint.oldestMulti); - - /* * Let smgr do post-checkpoint cleanup (eg, deleting old files). */ smgrpostckpt(); @@ -8552,11 +8543,6 @@ CreateCheckPoint(int flags) if (!RecoveryInProgress()) TruncateSUBTRANS(GetOldestXmin(NULL, false)); - /* - * Truncate pg_multixact too. - */ - TruncateMultiXact(); - /* Real work is done, but log and update stats before releasing lock. */ LogCheckpointEnd(false); @@ -8887,21 +8873,6 @@ CreateRestartPoint(int flags) } /* - * Due to a historical accident multixact truncations are not WAL-logged, - * but just performed everytime the mxact horizon is increased. So, unless - * we explicitly execute truncations on a standby it will never clean out - * /pg_multixact which obviously is bad, both because it uses space and - * because we can wrap around into pre-existing data... - * - * We can only do the truncation here, after the UpdateControlFile() - * above, because we've now safely established a restart point. That - * guarantees we will not need to access those multis. - * - * It's probably worth improving this. - */ - TruncateMultiXact(); - - /* * Truncate pg_subtrans if possible. We can throw away all data before * the oldest XMIN of any running transaction. No future transaction will * attempt to reference any pg_subtrans entry older than that (see Asserts @@ -9261,9 +9232,14 @@ xlog_redo(XLogReaderState *record) LWLockRelease(OidGenLock); MultiXactSetNextMXact(checkPoint.nextMulti, checkPoint.nextMultiOffset); + + /* + * NB: This may perform multixact truncation when replaying WAL + * generated by an older primary. + */ + MultiXactAdvanceOldest(checkPoint.oldestMulti, + checkPoint.oldestMultiDB); SetTransactionIdLimit(checkPoint.oldestXid, checkPoint.oldestXidDB); - SetMultiXactIdLimit(checkPoint.oldestMulti, checkPoint.oldestMultiDB); - MultiXactSetSafeTruncate(checkPoint.oldestMulti); /* * If we see a shutdown checkpoint while waiting for an end-of-backup @@ -9353,14 +9329,17 @@ xlog_redo(XLogReaderState *record) LWLockRelease(OidGenLock); MultiXactAdvanceNextMXact(checkPoint.nextMulti, checkPoint.nextMultiOffset); + + /* + * NB: This may perform multixact truncation when replaying WAL + * generated by an older primary. + */ + MultiXactAdvanceOldest(checkPoint.oldestMulti, + checkPoint.oldestMultiDB); if (TransactionIdPrecedes(ShmemVariableCache->oldestXid, checkPoint.oldestXid)) SetTransactionIdLimit(checkPoint.oldestXid, checkPoint.oldestXidDB); - MultiXactAdvanceOldest(checkPoint.oldestMulti, - checkPoint.oldestMultiDB); - MultiXactSetSafeTruncate(checkPoint.oldestMulti); - /* ControlFile->checkPointCopy always tracks the latest ckpt XID */ ControlFile->checkPointCopy.nextXidEpoch = checkPoint.nextXidEpoch; ControlFile->checkPointCopy.nextXid = checkPoint.nextXid; |