aboutsummaryrefslogtreecommitdiff
path: root/src/backend/access/transam
diff options
context:
space:
mode:
Diffstat (limited to 'src/backend/access/transam')
-rw-r--r--src/backend/access/transam/README45
-rw-r--r--src/backend/access/transam/xact.c15
-rw-r--r--src/backend/access/transam/xlogutils.c18
3 files changed, 14 insertions, 64 deletions
diff --git a/src/backend/access/transam/README b/src/backend/access/transam/README
index 28045f30876..ad4083eb6b5 100644
--- a/src/backend/access/transam/README
+++ b/src/backend/access/transam/README
@@ -717,38 +717,6 @@ then restart recovery. This is part of the reason for not writing a WAL
entry until we've successfully done the original action.
-Skipping WAL for New RelFileNode
---------------------------------
-
-Under wal_level=minimal, if a change modifies a relfilenode that ROLLBACK
-would unlink, in-tree access methods write no WAL for that change. Code that
-writes WAL without calling RelationNeedsWAL() must check for this case. This
-skipping is mandatory. If a WAL-writing change preceded a WAL-skipping change
-for the same block, REDO could overwrite the WAL-skipping change. If a
-WAL-writing change followed a WAL-skipping change for the same block, a
-related problem would arise. When a WAL record contains no full-page image,
-REDO expects the page to match its contents from just before record insertion.
-A WAL-skipping change may not reach disk at all, violating REDO's expectation
-under full_page_writes=off. For any access method, CommitTransaction() writes
-and fsyncs affected blocks before recording the commit.
-
-Prefer to do the same in future access methods. However, two other approaches
-can work. First, an access method can irreversibly transition a given fork
-from WAL-skipping to WAL-writing by calling FlushRelationBuffers() and
-smgrimmedsync(). Second, an access method can opt to write WAL
-unconditionally for permanent relations. Under these approaches, the access
-method callbacks must not call functions that react to RelationNeedsWAL().
-
-This applies only to WAL records whose replay would modify bytes stored in the
-new relfilenode. It does not apply to other records about the relfilenode,
-such as XLOG_SMGR_CREATE. Because it operates at the level of individual
-relfilenodes, RelationNeedsWAL() can differ for tightly-coupled relations.
-Consider "CREATE TABLE t (); BEGIN; ALTER TABLE t ADD c text; ..." in which
-ALTER TABLE adds a TOAST relation. The TOAST relation will skip WAL, while
-the table owning it will not. ALTER TABLE SET TABLESPACE will cause a table
-to skip WAL, but that won't affect its indexes.
-
-
Asynchronous Commit
-------------------
@@ -852,12 +820,13 @@ Changes to a temp table are not WAL-logged, hence could reach disk in
advance of T1's commit, but we don't care since temp table contents don't
survive crashes anyway.
-Database writes that skip WAL for new relfilenodes are also safe. In these
-cases it's entirely possible for the data to reach disk before T1's commit,
-because T1 will fsync it down to disk without any sort of interlock. However,
-all these paths are designed to write data that no other transaction can see
-until after T1 commits. The situation is thus not different from ordinary
-WAL-logged updates.
+Database writes made via any of the paths we have introduced to avoid WAL
+overhead for bulk updates are also safe. In these cases it's entirely
+possible for the data to reach disk before T1's commit, because T1 will
+fsync it down to disk without any sort of interlock, as soon as it finishes
+the bulk update. However, all these paths are designed to write data that
+no other transaction can see until after T1 commits. The situation is thus
+not different from ordinary WAL-logged updates.
Transaction Emulation during Recovery
-------------------------------------
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 337fee2928d..bd396f0f08f 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2107,13 +2107,6 @@ CommitTransaction(void)
*/
PreCommit_on_commit_actions();
- /*
- * Synchronize files that are created and not WAL-logged during this
- * transaction. This must happen before AtEOXact_RelationMap(), so that we
- * don't see committed-but-broken files after a crash.
- */
- smgrDoPendingSyncs(true);
-
/* close large objects before lower-level cleanup */
AtEOXact_LargeObject(true);
@@ -2347,13 +2340,6 @@ PrepareTransaction(void)
*/
PreCommit_on_commit_actions();
- /*
- * Synchronize files that are created and not WAL-logged during this
- * transaction. This must happen before EndPrepare(), so that we don't see
- * committed-but-broken files after a crash and COMMIT PREPARED.
- */
- smgrDoPendingSyncs(true);
-
/* close large objects before lower-level cleanup */
AtEOXact_LargeObject(true);
@@ -2672,7 +2658,6 @@ AbortTransaction(void)
*/
AfterTriggerEndXact(false); /* 'false' means it's abort */
AtAbort_Portals();
- smgrDoPendingSyncs(false);
AtEOXact_LargeObject(false);
AtAbort_Notify();
AtEOXact_RelationMap(false, is_parallel_worker);
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index 8fa78d87303..10a663bae62 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -544,8 +544,6 @@ typedef FakeRelCacheEntryData *FakeRelCacheEntry;
* fields related to physical storage, like rd_rel, are initialized, so the
* fake entry is only usable in low-level operations like ReadBuffer().
*
- * This is also used for syncing WAL-skipped files.
- *
* Caller must free the returned entry with FreeFakeRelcacheEntry().
*/
Relation
@@ -554,20 +552,18 @@ CreateFakeRelcacheEntry(RelFileNode rnode)
FakeRelCacheEntry fakeentry;
Relation rel;
+ Assert(InRecovery);
+
/* Allocate the Relation struct and all related space in one block. */
fakeentry = palloc0(sizeof(FakeRelCacheEntryData));
rel = (Relation) fakeentry;
rel->rd_rel = &fakeentry->pgc;
rel->rd_node = rnode;
-
- /*
- * We will never be working with temp rels during recovery or while
- * syncing WAL-skipped files.
- */
+ /* We will never be working with temp rels during recovery */
rel->rd_backend = InvalidBackendId;
- /* It must be a permanent table here */
+ /* It must be a permanent table if we're in recovery. */
rel->rd_rel->relpersistence = RELPERSISTENCE_PERMANENT;
/* We don't know the name of the relation; use relfilenode instead */
@@ -576,9 +572,9 @@ CreateFakeRelcacheEntry(RelFileNode rnode)
/*
* We set up the lockRelId in case anything tries to lock the dummy
* relation. Note that this is fairly bogus since relNode may be
- * different from the relation's OID. It shouldn't really matter though.
- * In recovery, we are running by ourselves and can't have any lock
- * conflicts. While syncing, we already hold AccessExclusiveLock.
+ * different from the relation's OID. It shouldn't really matter though,
+ * since we are presumably running by ourselves and can't have any lock
+ * conflicts ...
*/
rel->rd_lockInfo.lockRelId.dbId = rnode.dbNode;
rel->rd_lockInfo.lockRelId.relId = rnode.relNode;