author | Andres Freund <andres@anarazel.de> | 2020-08-14 12:15:38 -0700 |
---|---|---|
committer | Andres Freund <andres@anarazel.de> | 2020-08-14 15:33:35 -0700 |
commit | 941697c3c1ae5d6ee153065adb96e1e63ee11224 (patch) | |
tree | 7a81a69bcec132293286fe3867da82a2a754292e /src/backend/access/transam | |
parent | 2ba5b2db7943742e100834d99548c5d2661a105b (diff) | |
snapshot scalability: Introduce dense array of in-progress xids.
The new array contains the xids for all connected backends / in-use
PGPROC entries in a dense manner (in contrast to the PGPROC/PGXACT
arrays which can have unused entries interspersed).
This improves performance because GetSnapshotData() always needs to
scan the xids of all live procarray entries and now there's no need to
go through the procArray->pgprocnos indirection anymore.
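The difference between the two scan patterns can be sketched as follows.
This is a minimal standalone model, not the PostgreSQL implementation:
SketchProc, collect_xids_indirect() and collect_xids_dense() are
hypothetical names, and the real GetSnapshotData() does considerably more
work than filtering out invalid xids.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

/* Hypothetical per-backend slot; only some slots of the array are in use. */
typedef struct SketchProc
{
	TransactionId xid;
} SketchProc;

/*
 * Old pattern: walk an index array and dereference a sparse slot array,
 * so every live backend costs an extra indirection.
 */
static size_t
collect_xids_indirect(const SketchProc *all_procs, const int *pgprocnos,
					  size_t num_live, TransactionId *out)
{
	size_t		count = 0;

	for (size_t i = 0; i < num_live; i++)
	{
		TransactionId xid = all_procs[pgprocnos[i]].xid;

		if (xid != InvalidTransactionId)
			out[count++] = xid;
	}
	return count;
}

/* New pattern: one contiguous array with one xid per live backend. */
static size_t
collect_xids_dense(const TransactionId *xids, size_t num_live,
				   TransactionId *out)
{
	size_t		count = 0;

	for (size_t i = 0; i < num_live; i++)
	{
		TransactionId xid = xids[i];	/* fetch each entry exactly once */

		if (xid != InvalidTransactionId)
			out[count++] = xid;
	}
	return count;
}
```

Both functions return the same set of xids; the dense variant reads
memory sequentially, which is where the cache benefit would come from.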
As the set of running top-level xids changes rarely, compared to the
number of snapshots taken, this substantially increases the likelihood
of most data required for a snapshot being in L2 cache. In
read-mostly workloads scanning the xids[] array will be sufficient to
build a snapshot, as most backends will not have an xid assigned.
To keep the xid array dense, ProcArrayRemove() needs to move the entries
behind the to-be-removed proc's entry further up in the array. Obviously
moving array entries cannot happen while a backend sets its
xid, i.e. locking needs to prevent array entries from being moved while
a backend modifies its xid.
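The compaction step can be sketched as follows. This is a simplified
model under stated assumptions: SketchProcArray, sketch_add() and
sketch_remove() are hypothetical names, new entries are simply appended,
and no locking is shown. Only the pgxactoff field name is taken from the
diff in this commit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t TransactionId;

#define SKETCH_MAX_PROCS 8

typedef struct SketchProc
{
	int			pgxactoff;		/* this proc's index into the dense array */
	TransactionId xid;
} SketchProc;

typedef struct SketchProcArray
{
	int			num_procs;
	SketchProc *procs[SKETCH_MAX_PROCS];	/* owner of each dense slot */
	TransactionId xids[SKETCH_MAX_PROCS];	/* dense xid array */
} SketchProcArray;

/* Append a proc at the end of the dense arrays (a simplification). */
static void
sketch_add(SketchProcArray *arr, SketchProc *proc)
{
	assert(arr->num_procs < SKETCH_MAX_PROCS);
	proc->pgxactoff = arr->num_procs;
	arr->procs[arr->num_procs] = proc;
	arr->xids[arr->num_procs] = proc->xid;
	arr->num_procs++;
}

/* Remove a proc and close the gap so the arrays stay dense. */
static void
sketch_remove(SketchProcArray *arr, SketchProc *proc)
{
	int			idx = proc->pgxactoff;
	size_t		ntail;

	assert(idx >= 0 && idx < arr->num_procs && arr->procs[idx] == proc);
	ntail = (size_t) (arr->num_procs - idx - 1);

	/* Shift every entry behind the removed one up by one position. */
	memmove(&arr->procs[idx], &arr->procs[idx + 1],
			ntail * sizeof(arr->procs[0]));
	memmove(&arr->xids[idx], &arr->xids[idx + 1],
			ntail * sizeof(arr->xids[0]));
	arr->num_procs--;

	/* The shifted procs must learn their new offsets. */
	for (int i = idx; i < arr->num_procs; i++)
		arr->procs[i]->pgxactoff = i;
	proc->pgxactoff = -1;
}
```

The offset fix-up loop shows why a backend must not be writing through
its own offset while a removal is in progress: the offset it read may
already refer to a different backend's slot.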
To avoid locking ProcArrayLock in GetNewTransactionId() - a fairly hot
spot already - ProcArrayAdd() / ProcArrayRemove() now need to hold
XidGenLock in addition to ProcArrayLock. Adding / Removing a procarray
entry is not a very frequent operation, even taking 2PC into account.
Due to the above, the dense array entries can only be read or modified
while holding ProcArrayLock and/or XidGenLock. This prevents a
concurrent ProcArrayRemove() from shifting the dense array while it is
accessed concurrently.
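The resulting rule can be written down as two predicates. This is a
single-threaded model that only encodes the rule; it performs no real
locking and ignores shared versus exclusive lock modes. SketchLocks and
all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Stand-ins for the two locks; they only record what this backend holds. */
typedef struct SketchLocks
{
	bool		proc_array_lock_held;
	bool		xid_gen_lock_held;
} SketchLocks;

/* Adding or removing an entry shifts the array: both locks are required. */
static bool
may_shift_entries(const SketchLocks *locks)
{
	return locks->proc_array_lock_held && locks->xid_gen_lock_held;
}

/*
 * Reading or modifying an entry is fine with either lock, because a
 * shifter needs both and is therefore excluded by whichever one we hold.
 */
static bool
may_access_entry(const SketchLocks *locks)
{
	return locks->proc_array_lock_held || locks->xid_gen_lock_held;
}

/* Xid assignment path: only the XidGenLock stand-in is taken. */
static bool
sketch_assign_xid(SketchLocks *locks, TransactionId *slot, TransactionId xid)
{
	bool		allowed;

	locks->xid_gen_lock_held = true;
	allowed = may_access_entry(locks);
	if (allowed)
		*slot = xid;
	locks->xid_gen_lock_held = false;
	return allowed;
}
```

In this model the assignment path never touches the ProcArrayLock
stand-in, which matches the stated goal of keeping that lock out of
GetNewTransactionId().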
While the new dense array is very good when needing to look at all
xids it is less suitable when accessing a single backend's xid. In
particular it would be problematic to have to acquire a lock to access
a backend's own xid. Therefore a backend's xid is not just stored in
the dense array, but also in PGPROC. This also allows a backend to
only access the shared xid value when the backend has acquired an
xid.
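The mirroring can be sketched as follows. This is a minimal model under
stated assumptions: the sketch_* names and the global sketch_dense_xids
array are hypothetical, locking is omitted, and the early return in
sketch_clear_xid() is one illustrative use of the private copy rather
than a description of the actual end-of-transaction code.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

#define SKETCH_MAX_PROCS 4

typedef struct SketchProc
{
	int			pgxactoff;		/* index into the dense array */
	TransactionId xid;			/* backend-private mirror of the dense entry */
} SketchProc;

/* Stand-in for the shared dense array. */
static TransactionId sketch_dense_xids[SKETCH_MAX_PROCS];

/* Store the xid in both places; the caller is assumed to hold a lock. */
static void
sketch_set_xid(SketchProc *me, TransactionId xid)
{
	me->xid = xid;
	sketch_dense_xids[me->pgxactoff] = xid;
}

/* A backend reads its own xid from its mirror, with no lock needed. */
static TransactionId
sketch_own_xid(const SketchProc *me)
{
	return me->xid;
}

/* Touch the shared array only if an xid was actually assigned. */
static void
sketch_clear_xid(SketchProc *me)
{
	if (me->xid == InvalidTransactionId)
		return;					/* nothing to clear in shared memory */
	sketch_dense_xids[me->pgxactoff] = InvalidTransactionId;
	me->xid = InvalidTransactionId;
}
```

Keeping both copies costs one extra store per xid assignment, in
exchange for lock-free reads of a backend's own xid.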
The infrastructure added in this commit will be used for the remaining
PGXACT fields in subsequent commits. They are kept separate to make
review easier.
Author: Andres Freund <andres@anarazel.de>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
Reviewed-By: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de
Diffstat (limited to 'src/backend/access/transam')
-rw-r--r-- | src/backend/access/transam/README | 29 |
-rw-r--r-- | src/backend/access/transam/clog.c | 8 |
-rw-r--r-- | src/backend/access/transam/twophase.c | 31 |
-rw-r--r-- | src/backend/access/transam/varsup.c | 20 |
4 files changed, 45 insertions, 43 deletions
diff --git a/src/backend/access/transam/README b/src/backend/access/transam/README
index eab8edd20ec..c5f09667ba1 100644
--- a/src/backend/access/transam/README
+++ b/src/backend/access/transam/README
@@ -251,10 +251,10 @@ enforce, and it assists with some other issues as explained below.) The
 implementation of this is that GetSnapshotData takes the ProcArrayLock in
 shared mode (so that multiple backends can take snapshots in parallel),
 but ProcArrayEndTransaction must take the ProcArrayLock in exclusive mode
-while clearing MyPgXact->xid at transaction end (either commit or abort).
-(To reduce context switching, when multiple transactions commit nearly
-simultaneously, we have one backend take ProcArrayLock and clear the XIDs
-of multiple processes at once.)
+while clearing the ProcGlobal->xids[] entry at transaction end (either
+commit or abort). (To reduce context switching, when multiple transactions
+commit nearly simultaneously, we have one backend take ProcArrayLock and
+clear the XIDs of multiple processes at once.)

 ProcArrayEndTransaction also holds the lock while advancing the shared
 latestCompletedXid variable. This allows GetSnapshotData to use
@@ -278,12 +278,12 @@ present in the ProcArray, or not running anymore. (This guarantee doesn't
 apply to subtransaction XIDs, because of the possibility that there's not
 room for them in the subxid array; instead we guarantee that they are
 present or the overflow flag is set.) If a backend released XidGenLock
-before storing its XID into MyPgXact, then it would be possible for another
-backend to allocate and commit a later XID, causing latestCompletedXid to
-pass the first backend's XID, before that value became visible in the
+before storing its XID into ProcGlobal->xids[], then it would be possible for
+another backend to allocate and commit a later XID, causing latestCompletedXid
+to pass the first backend's XID, before that value became visible in the
 ProcArray. That would break ComputeXidHorizons, as discussed below.

-We allow GetNewTransactionId to store the XID into MyPgXact->xid (or the
+We allow GetNewTransactionId to store the XID into ProcGlobal->xids[] (or the
 subxid array) without taking ProcArrayLock. This was once necessary to
 avoid deadlock; while that is no longer the case, it's still beneficial for
 performance. We are thereby relying on fetch/store of an XID to be atomic,
@@ -382,12 +382,13 @@ Top-level transactions do not have a parent, so they leave their pg_subtrans
 entries set to the default value of zero (InvalidTransactionId).

 pg_subtrans is used to check whether the transaction in question is still
-running --- the main Xid of a transaction is recorded in the PGXACT struct,
-but since we allow arbitrary nesting of subtransactions, we can't fit all Xids
-in shared memory, so we have to store them on disk. Note, however, that for
-each transaction we keep a "cache" of Xids that are known to be part of the
-transaction tree, so we can skip looking at pg_subtrans unless we know the
-cache has been overflowed. See storage/ipc/procarray.c for the gory details.
+running --- the main Xid of a transaction is recorded in ProcGlobal->xids[],
+with a copy in PGPROC->xid, but since we allow arbitrary nesting of
+subtransactions, we can't fit all Xids in shared memory, so we have to store
+them on disk. Note, however, that for each transaction we keep a "cache" of
+Xids that are known to be part of the transaction tree, so we can skip looking
+at pg_subtrans unless we know the cache has been overflowed. See
+storage/ipc/procarray.c for the gory details.

 slru.c is the supporting mechanism for both pg_xact and pg_subtrans. It
 implements the LRU policy for in-memory buffer pages. The high-level routines
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index dd2f4d5bc7e..a4599e96610 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -285,15 +285,15 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 	 * updates for multiple backends so that the number of times XactSLRULock
 	 * needs to be acquired is reduced.
 	 *
-	 * For this optimization to be safe, the XID in MyPgXact and the subxids
-	 * in MyProc must be the same as the ones for which we're setting the
-	 * status. Check that this is the case.
+	 * For this optimization to be safe, the XID and subxids in MyProc must be
+	 * the same as the ones for which we're setting the status. Check that
+	 * this is the case.
 	 *
 	 * For this optimization to be efficient, we shouldn't have too many
 	 * sub-XIDs and all of the XIDs for which we're adjusting clog should be
 	 * on the same page. Check those conditions, too.
 	 */
-	if (all_xact_same_page && xid == MyPgXact->xid &&
+	if (all_xact_same_page && xid == MyProc->xid &&
 		nsubxids <= THRESHOLD_SUBTRANS_CLOG_OPT &&
 		nsubxids == MyPgXact->nxids &&
 		memcmp(subxids, MyProc->subxids.xids,
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index eb5f4680a3d..a0398bf3a3e 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -351,7 +351,7 @@ AtAbort_Twophase(void)

 /*
  * This is called after we have finished transferring state to the prepared
- * PGXACT entry.
+ * PGPROC entry.
  */
 void
 PostPrepare_Twophase(void)
@@ -463,7 +463,7 @@ MarkAsPreparingGuts(GlobalTransaction gxact, TransactionId xid, const char *gid,
 	proc->waitStatus = PROC_WAIT_STATUS_OK;
 	/* We set up the gxact's VXID as InvalidBackendId/XID */
 	proc->lxid = (LocalTransactionId) xid;
-	pgxact->xid = xid;
+	proc->xid = xid;
 	Assert(proc->xmin == InvalidTransactionId);
 	proc->delayChkpt = false;
 	pgxact->vacuumFlags = 0;
@@ -768,7 +768,6 @@ pg_prepared_xact(PG_FUNCTION_ARGS)
 	{
 		GlobalTransaction gxact = &status->array[status->currIdx++];
 		PGPROC *proc = &ProcGlobal->allProcs[gxact->pgprocno];
-		PGXACT *pgxact = &ProcGlobal->allPgXact[gxact->pgprocno];
 		Datum values[5];
 		bool nulls[5];
 		HeapTuple tuple;
@@ -783,7 +782,7 @@ pg_prepared_xact(PG_FUNCTION_ARGS)
 		MemSet(values, 0, sizeof(values));
 		MemSet(nulls, 0, sizeof(nulls));

-		values[0] = TransactionIdGetDatum(pgxact->xid);
+		values[0] = TransactionIdGetDatum(proc->xid);
 		values[1] = CStringGetTextDatum(gxact->gid);
 		values[2] = TimestampTzGetDatum(gxact->prepared_at);
 		values[3] = ObjectIdGetDatum(gxact->owner);
@@ -829,9 +828,8 @@ TwoPhaseGetGXact(TransactionId xid, bool lock_held)
 	for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
 	{
 		GlobalTransaction gxact = TwoPhaseState->prepXacts[i];
-		PGXACT *pgxact = &ProcGlobal->allPgXact[gxact->pgprocno];

-		if (pgxact->xid == xid)
+		if (gxact->xid == xid)
 		{
 			result = gxact;
 			break;
@@ -987,8 +985,7 @@ void
 StartPrepare(GlobalTransaction gxact)
 {
 	PGPROC *proc = &ProcGlobal->allProcs[gxact->pgprocno];
-	PGXACT *pgxact = &ProcGlobal->allPgXact[gxact->pgprocno];
-	TransactionId xid = pgxact->xid;
+	TransactionId xid = gxact->xid;
 	TwoPhaseFileHeader hdr;
 	TransactionId *children;
 	RelFileNode *commitrels;
@@ -1140,15 +1137,15 @@ EndPrepare(GlobalTransaction gxact)

 	/*
 	 * Mark the prepared transaction as valid. As soon as xact.c marks
-	 * MyPgXact as not running our XID (which it will do immediately after
+	 * MyProc as not running our XID (which it will do immediately after
 	 * this function returns), others can commit/rollback the xact.
 	 *
 	 * NB: a side effect of this is to make a dummy ProcArray entry for the
-	 * prepared XID. This must happen before we clear the XID from MyPgXact,
-	 * else there is a window where the XID is not running according to
-	 * TransactionIdIsInProgress, and onlookers would be entitled to assume
-	 * the xact crashed. Instead we have a window where the same XID appears
-	 * twice in ProcArray, which is OK.
+	 * prepared XID. This must happen before we clear the XID from MyProc /
+	 * ProcGlobal->xids[], else there is a window where the XID is not running
+	 * according to TransactionIdIsInProgress, and onlookers would be entitled
+	 * to assume the xact crashed. Instead we have a window where the same
+	 * XID appears twice in ProcArray, which is OK.
 	 */
 	MarkAsPrepared(gxact, false);

@@ -1404,7 +1401,6 @@ FinishPreparedTransaction(const char *gid, bool isCommit)
 {
 	GlobalTransaction gxact;
 	PGPROC *proc;
-	PGXACT *pgxact;
 	TransactionId xid;
 	char *buf;
 	char *bufptr;
@@ -1423,8 +1419,7 @@ FinishPreparedTransaction(const char *gid, bool isCommit)
 	 */
 	gxact = LockGXact(gid, GetUserId());
 	proc = &ProcGlobal->allProcs[gxact->pgprocno];
-	pgxact = &ProcGlobal->allPgXact[gxact->pgprocno];
-	xid = pgxact->xid;
+	xid = gxact->xid;

 	/*
 	 * Read and validate 2PC state data. State data will typically be stored
@@ -1726,7 +1721,7 @@ CheckPointTwoPhase(XLogRecPtr redo_horizon)
 	for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
 	{
 		/*
-		 * Note that we are using gxact not pgxact so this works in recovery
+		 * Note that we are using gxact not PGPROC so this works in recovery
 		 * also
 		 */
 		GlobalTransaction gxact = TwoPhaseState->prepXacts[i];
diff --git a/src/backend/access/transam/varsup.c b/src/backend/access/transam/varsup.c
index 2ef0f4991ca..4c91b343ecd 100644
--- a/src/backend/access/transam/varsup.c
+++ b/src/backend/access/transam/varsup.c
@@ -38,7 +38,8 @@ VariableCache ShmemVariableCache = NULL;
  * Allocate the next FullTransactionId for a new transaction or
  * subtransaction.
  *
- * The new XID is also stored into MyPgXact before returning.
+ * The new XID is also stored into MyProc->xid/ProcGlobal->xids[] before
+ * returning.
  *
  * Note: when this is called, we are actually already inside a valid
  * transaction, since XIDs are now not allocated until the transaction
@@ -65,7 +66,8 @@ GetNewTransactionId(bool isSubXact)
 	if (IsBootstrapProcessingMode())
 	{
 		Assert(!isSubXact);
-		MyPgXact->xid = BootstrapTransactionId;
+		MyProc->xid = BootstrapTransactionId;
+		ProcGlobal->xids[MyProc->pgxactoff] = BootstrapTransactionId;
 		return FullTransactionIdFromEpochAndXid(0, BootstrapTransactionId);
 	}

@@ -190,10 +192,10 @@ GetNewTransactionId(bool isSubXact)
 	 * latestCompletedXid is present in the ProcArray, which is essential for
 	 * correct OldestXmin tracking; see src/backend/access/transam/README.
 	 *
-	 * Note that readers of PGXACT xid fields should be careful to fetch the
-	 * value only once, rather than assume they can read a value multiple
-	 * times and get the same answer each time. Note we are assuming that
-	 * TransactionId and int fetch/store are atomic.
+	 * Note that readers of ProcGlobal->xids/PGPROC->xid should be careful
+	 * to fetch the value for each proc only once, rather than assume they can
+	 * read a value multiple times and get the same answer each time. Note we
+	 * are assuming that TransactionId and int fetch/store are atomic.
 	 *
 	 * The same comments apply to the subxact xid count and overflow fields.
 	 *
@@ -219,7 +221,11 @@ GetNewTransactionId(bool isSubXact)
 	 * answer later on when someone does have a reason to inquire.)
 	 */
 	if (!isSubXact)
-		MyPgXact->xid = xid; /* LWLockRelease acts as barrier */
+	{
+		/* LWLockRelease acts as barrier */
+		MyProc->xid = xid;
+		ProcGlobal->xids[MyProc->pgxactoff] = xid;
+	}
 	else
 	{
 		int nxids = MyPgXact->nxids;