Truncate pg_multixact/'s contents during crash recovery

Commit 9dc842f08 of the 8.2 era prevented MultiXact truncation during crash recovery, because there was no guarantee that enough state had been set up, and because it wasn't deemed a good idea to remove data during crash recovery anyway. Since then, due to Hot Standby, streaming replication and PITR, the amount of time a cluster can spend doing crash recovery has increased significantly, to the point that a cluster may never come out of it at all. This has made it indefensible not to truncate the contents of pg_multixact/.

To fix, take care to set up enough state for multixact truncation before crash recovery starts (easy, since checkpoints contain the required information), and move the current end-of-recovery actions to a new TrimMultiXact() function, analogous to TrimCLOG().

At some later point, this should probably be done the way clog.c does it, which is to just WAL-log truncations, but we can't do that in the back branches.

Back-patch to 9.0. 8.4 also has the problem, but since there's no hot standby there, it's much less pressing. In 9.2 and earlier, this patch is simpler than in newer branches, because multixact access during recovery isn't required. Add appropriate checks to make sure that's not happening.

Andres Freund
author: Alvaro Herrera <alvherre@alvh.no-ip.org> 2013-11-29 11:26:41 -0300
committer: Alvaro Herrera <alvherre@alvh.no-ip.org> 2013-11-29 22:02:15 -0300
commit: 6d0b8cd2f35257a3f16d70df59aa6a12c89f17ff (patch)
tree: 5685d175ad3a30fae759a36c5ff30012b786f8b0
parent: 4ab4e5c6bb04dc1fc747baed3d42d5aa2ea44dfa (diff)
1 files changed, 40 insertions, 14 deletions
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 1cb3dfab375..d2a3a7fbf58 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -827,6 +827,10 @@ GetNewMultiXactId(int nxids, MultiXactOffset *offset)
 	/* MultiXactIdSetOldestMember() must have been called already */
 	Assert(MultiXactIdIsValid(OldestMemberMXactId[MyBackendId]));
 
+	/* safety check, we should never get this far in a HS slave */
+	if (RecoveryInProgress())
+		elog(ERROR, "cannot assign MultiXactIds during recovery");
+
 	LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
 
 	/* Handle wraparound of the nextMXact counter */
@@ -914,6 +918,10 @@ GetMultiXactIdMembers(MultiXactId multi, TransactionId **xids)
 
 	Assert(MultiXactIdIsValid(multi));
 
+	/* safety check, we should never get this far in a HS slave */
+	if (RecoveryInProgress())
+		elog(ERROR, "cannot GetMultiXactIdMembers() during recovery");
+
 	/* See if the MultiXactId is in the local cache */
 	length = mXactCacheGetById(multi, xids);
 	if (length >= 0)
@@ -1513,11 +1521,8 @@ ZeroMultiXactMemberPage(int pageno, bool writeXlog)
  * This must be called ONCE during postmaster or standalone-backend startup.
  *
  * StartupXLOG has already established nextMXact/nextOffset by calling
- * MultiXactSetNextMXact and/or MultiXactAdvanceNextMXact.	Note that we
- * may already have replayed WAL data into the SLRU files.
- *
- * We don't need any locks here, really; the SLRU locks are taken
- * only because slru.c expects to be called with locks held.
+ * MultiXactSetNextMXact and/or MultiXactAdvanceNextMXact, but we haven't yet
+ * replayed WAL.
  */
 void
 StartupMultiXact(void)
@@ -1525,13 +1530,39 @@ StartupMultiXact(void)
 	MultiXactId multi = MultiXactState->nextMXact;
 	MultiXactOffset offset = MultiXactState->nextOffset;
 	int			pageno;
+
+	/*
+	 * Initialize our idea of the latest page number.
+	 */
+	pageno = MultiXactIdToOffsetPage(multi);
+	MultiXactOffsetCtl->shared->latest_page_number = pageno;
+
+	/*
+	 * Initialize our idea of the latest page number.
+	 */
+	pageno = MXOffsetToMemberPage(offset);
+	MultiXactMemberCtl->shared->latest_page_number = pageno;
+}
+
+/*
+ * This must be called ONCE at the end of startup/recovery.
+ *
+ * We don't need any locks here, really; the SLRU locks are taken only because
+ * slru.c expects to be called with locks held.
+ */
+void
+TrimMultiXact(void)
+{
+	MultiXactId multi = MultiXactState->nextMXact;
+	MultiXactOffset offset = MultiXactState->nextOffset;
+	int			pageno;
 	int			entryno;
 
 	/* Clean up offsets state */
 	LWLockAcquire(MultiXactOffsetControlLock, LW_EXCLUSIVE);
 
 	/*
-	 * Initialize our idea of the latest page number.
+	 * (Re-)Initialize our idea of the latest page number.
 	 */
 	pageno = MultiXactIdToOffsetPage(multi);
 	MultiXactOffsetCtl->shared->latest_page_number = pageno;
@@ -1561,7 +1592,7 @@ StartupMultiXact(void)
 	LWLockAcquire(MultiXactMemberControlLock, LW_EXCLUSIVE);
 
 	/*
-	 * Initialize our idea of the latest page number.
+	 * (Re-)Initialize our idea of the latest page number.
 	 */
 	pageno = MXOffsetToMemberPage(offset);
 	MultiXactMemberCtl->shared->latest_page_number = pageno;
@@ -1640,14 +1671,9 @@ CheckPointMultiXact(void)
 
 	/*
 	 * Truncate the SLRU files.  This could be done at any time, but
-	 * checkpoint seems a reasonable place for it.	There is one exception: if
-	 * we are called during xlog recovery, then shared->latest_page_number
-	 * isn't valid (because StartupMultiXact hasn't been called yet) and so
-	 * SimpleLruTruncate would get confused.  It seems best not to risk
-	 * removing any data during recovery anyway, so don't truncate.
+	 * checkpoint seems a reasonable place for it.
 	 */
-	if (!RecoveryInProgress())
-		TruncateMultiXact();
+	TruncateMultiXact();
 
 	TRACE_POSTGRESQL_MULTIXACT_CHECKPOINT_DONE(true);
 }
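
The ordering described in the commit message (initialize the page-number state at startup, allow truncation at checkpoints during recovery, and defer the remaining cleanup to the end of recovery) can be illustrated with a small standalone model. This is only a sketch and not PostgreSQL code: ToyMultiXactState, toy_init(), toy_startup(), toy_checkpoint_truncate(), toy_trim() and toy_assign() are hypothetical names, the entries-per-page constant is an assumed value, and ID wraparound is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value for illustration only; not taken from the patch. */
#define TOY_ENTRIES_PER_PAGE 2048u

/* Hypothetical stand-in for the shared multixact/SLRU state. */
typedef struct ToyMultiXactState
{
	uint32_t	next_mxact;			/* next ID to assign (from the checkpoint) */
	uint32_t	oldest_mxact;		/* oldest ID that is still needed */
	int			latest_page_number;	/* -1 until toy_startup() has run */
	int			oldest_kept_page;	/* first page not yet truncated away */
	bool		in_recovery;
} ToyMultiXactState;

static int
toy_id_to_page(uint32_t id)
{
	return (int) (id / TOY_ENTRIES_PER_PAGE);
}

/* State before anything has been read from a checkpoint. */
void
toy_init(ToyMultiXactState *s)
{
	assert(s != NULL);
	s->next_mxact = 0;
	s->oldest_mxact = 0;
	s->latest_page_number = -1;
	s->oldest_kept_page = 0;
	s->in_recovery = true;
}

/* Before WAL replay: record just enough state for truncation to be safe. */
void
toy_startup(ToyMultiXactState *s, uint32_t next_mxact, uint32_t oldest_mxact)
{
	assert(s != NULL);
	s->next_mxact = next_mxact;
	s->oldest_mxact = oldest_mxact;
	s->latest_page_number = toy_id_to_page(next_mxact);
}

/*
 * Checkpoint-time truncation.  Refuses to act if the latest page number
 * was never initialized; otherwise it may run during recovery as well.
 */
bool
toy_checkpoint_truncate(ToyMultiXactState *s)
{
	int			cutoff;

	assert(s != NULL);
	if (s->latest_page_number < 0)
		return false;

	cutoff = toy_id_to_page(s->oldest_mxact);
	if (cutoff > s->oldest_kept_page)
		s->oldest_kept_page = cutoff;
	return true;
}

/* End of recovery: re-derive the latest page and allow new assignments. */
void
toy_trim(ToyMultiXactState *s)
{
	assert(s != NULL);
	s->latest_page_number = toy_id_to_page(s->next_mxact);
	s->in_recovery = false;
}

/* Mirrors the added safety check: no new IDs while still in recovery. */
bool
toy_assign(ToyMultiXactState *s, uint32_t *id)
{
	assert(s != NULL && id != NULL);
	if (s->in_recovery)
		return false;
	*id = s->next_mxact++;
	return true;
}
```

In this model, calling toy_checkpoint_truncate() before toy_startup() is refused, which loosely corresponds to the situation the old RecoveryInProgress() guard in CheckPointMultiXact() worked around. Once toy_startup() has run, truncation is allowed even while in_recovery is still true.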