author    | Alvaro Herrera <alvherre@alvh.no-ip.org> | 2024-02-28 17:05:31 +0100
committer | Alvaro Herrera <alvherre@alvh.no-ip.org> | 2024-02-28 17:05:31 +0100
commit    | 53c2a97a92665be6bd7d70bd62ae6158fe4db96e (patch)
tree      | 88d853f098fe925024b82e72f2beea523e24cbe6 /src/backend/commands/async.c
parent    | 1c1eec0f2d88b7e823af959103b2100da493caa9 (diff)
download  | postgresql-53c2a97a92665be6bd7d70bd62ae6158fe4db96e.tar.gz
          | postgresql-53c2a97a92665be6bd7d70bd62ae6158fe4db96e.zip
Improve performance of subsystems on top of SLRU
More precisely, what we do here is make the SLRU cache sizes
configurable with new GUCs, so that sites with high concurrency and big
ranges of transactions in flight (resp. multixacts/subtransactions) can
benefit from bigger caches. In order for this to work with good
performance, two additional changes are made:
1. the cache is divided into "banks" (to borrow terminology from CPU
caches), and algorithms such as the buffer-eviction search only affect
one specific bank. This forestalls the problem that a linear search
for a specific buffer across the whole cache takes too long: we only
have to search the specific bank, whose size is small (see the sketch
after this list). This work is authored by Andrey Borodin.
2. Change the locking regime for the SLRU banks, so that each bank uses
a separate LWLock. This allows for increased scalability. This work
is authored by Dilip Kumar. (A part of this was previously committed as
d172b717c6f4.)
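To make point 1 concrete, here is a minimal illustrative sketch of a
bank-partitioned buffer lookup. It is not the actual slru.c code: the struct,
the field names, and the simple modulo mapping from page number to bank are
assumptions made for the example; the point is only that both the lookup and
the victim selection scan a single bank.

    #include <stdint.h>

    /* Hypothetical miniature of an SLRU buffer pool divided into banks. */
    typedef struct SlruBankedCacheSketch
    {
        int      nslots;        /* total number of buffer slots */
        int      bank_size;     /* slots per bank; nslots is a multiple of it */
        int64_t *page_number;   /* page currently held by each slot */
    } SlruBankedCacheSketch;

    /*
     * Look up pageno, scanning only the bank it maps to.  On a miss the
     * caller would also pick its eviction victim from this same bank, so
     * the linear search cost is bounded by bank_size, not nslots.
     */
    static int
    slru_lookup_sketch(const SlruBankedCacheSketch *cache, int64_t pageno)
    {
        int nbanks = cache->nslots / cache->bank_size;
        int bankno = (int) (pageno % nbanks);
        int start = bankno * cache->bank_size;

        for (int slot = start; slot < start + cache->bank_size; slot++)
        {
            if (cache->page_number[slot] == pageno)
                return slot;    /* cache hit */
        }
        return -1;              /* miss: read or zero the page into this bank */
    }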
Special care is taken so that algorithms that can potentially traverse more
than one bank release one bank's lock before acquiring the next. This should
happen rarely, but clog.c's group-commit feature in particular needed code
adjustments to cope with it. I (Álvaro) also added lots of comments to make
sure the design is sound.
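The same hand-off rule appears concretely in the asyncQueueAddEntries() hunk
of the diff below; distilled into a self-contained sketch (LWLockStub and the
helper names are stand-ins invented here, not PostgreSQL's LWLock API), it
amounts to:

    /*
     * Stand-in stubs so the sketch compiles on its own; in PostgreSQL these
     * roles are played by LWLock, LWLockAcquire(..., LW_EXCLUSIVE) and
     * LWLockRelease().
     */
    typedef struct LWLockStub { int lockno; } LWLockStub;
    static void lwlock_acquire_excl(LWLockStub *lock) { (void) lock; }
    static void lwlock_release(LWLockStub *lock) { (void) lock; }

    /*
     * Move from the bank lock currently held to the lock of the next bank
     * to be visited, never holding two bank locks at once.  Returns the
     * lock that is now held.
     */
    static LWLockStub *
    switch_bank_lock(LWLockStub *held, LWLockStub *next)
    {
        if (next != held)
        {
            lwlock_release(held);       /* drop the old bank's lock first */
            lwlock_acquire_excl(next);  /* only then take the new one */
        }
        return next;
    }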
The new GUCs match the names introduced by bcdfa5f2e2f2 in the
pg_stat_slru view.
The default values for these parameters are similar to the previous
sizes of each SLRU. commit_ts, clog and subtrans accept a value of 0, which
means the size is derived by dividing shared_buffers by 512 (so 2MB for every 1GB
of shared_buffers), with a cap of 8MB. (A new slru.c function
SimpleLruAutotuneBuffers() was added to support this.) The cap was
previously 1MB for clog, so for sites with more than 512MB of shared
memory the total memory used increases, which is likely a good tradeoff.
However, other SLRUs (notably multixact ones) retain smaller sizes and
don't support a configured value of 0. These values based on
shared_buffers may need to be revisited, but that's an easy change.
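As a worked example of the autotuning rule just described, the arithmetic
(and only the arithmetic) can be sketched as follows; this is not the actual
SimpleLruAutotuneBuffers() code, and the constant names are invented for the
sketch. Both shared_buffers and the result are counted in 8kB buffers.

    /* Invented constants for the sketch; PostgreSQL's block size is BLCKSZ. */
    #define SKETCH_BLCKSZ      8192
    #define SKETCH_SLRU_CAP_MB 8

    /*
     * A configured value of 0 means "autotune": one SLRU buffer per 512
     * shared buffers (2MB of SLRU cache per 1GB of shared_buffers), with
     * the result capped at 8MB.
     */
    static int
    autotune_slru_buffers_sketch(int shared_buffers)
    {
        int nbuffers = shared_buffers / 512;
        int cap = (SKETCH_SLRU_CAP_MB * 1024 * 1024) / SKETCH_BLCKSZ;

        return (nbuffers < cap) ? nbuffers : cap;
    }

For shared_buffers = 1GB (131072 buffers of 8kB) this yields 256 SLRU buffers,
i.e. 2MB; from 4GB of shared_buffers upward the 8MB cap applies.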
There was some resistance to adding these new GUCs: it would be better
to adjust to memory pressure automatically somehow, for example by
stealing memory from shared_buffers (where the caches can grow and
shrink naturally). However, doing that seems to be a much larger
project and one which has made virtually no progress in several years,
and because this is such a pain point for so many users, here we take
the pragmatic approach.
Author: Andrey Borodin <x4mmm@yandex-team.ru>
Author: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Amul Sul, Gilles Darold, Anastasia Lubennikova,
Ivan Lazarev, Robert Haas, Thomas Munro, Tomas Vondra,
Yura Sokolov, Васильев Дмитрий (Dmitry Vasiliev).
Discussion: https://postgr.es/m/2BEC2B3F-9B61-4C1D-9FB5-5FAB0F05EF86@yandex-team.ru
Discussion: https://postgr.es/m/CAFiTN-vzDvNz=ExGXz6gdyjtzGixKSqs0mKHMmaQ8sOSEFZ33A@mail.gmail.com
Diffstat (limited to 'src/backend/commands/async.c')
-rw-r--r-- | src/backend/commands/async.c | 61
1 file changed, 43 insertions, 18 deletions
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 490c84dc199..23444f2a800 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -116,7 +116,7 @@
  * frontend during startup.) The above design guarantees that notifies from
  * other backends will never be missed by ignoring self-notifies.
  *
- * The amount of shared memory used for notify management (NUM_NOTIFY_BUFFERS)
+ * The amount of shared memory used for notify management (notify_buffers)
  * can be varied without affecting anything but performance. The maximum
  * amount of notification data that can be queued at one time is determined
  * by max_notify_queue_pages GUC.
@@ -148,6 +148,7 @@
 #include "storage/sinval.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
+#include "utils/guc_hooks.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
 #include "utils/snapmgr.h"
@@ -234,7 +235,7 @@ typedef struct QueuePosition
  *
  * Resist the temptation to make this really large. While that would save
  * work in some places, it would add cost in others. In particular, this
- * should likely be less than NUM_NOTIFY_BUFFERS, to ensure that backends
+ * should likely be less than notify_buffers, to ensure that backends
  * catch up before the pages they'll need to read fall out of SLRU cache.
  */
 #define QUEUE_CLEANUP_DELAY 4
@@ -266,9 +267,10 @@ typedef struct QueueBackendStatus
  * both NotifyQueueLock and NotifyQueueTailLock in EXCLUSIVE mode, backends
  * can change the tail pointers.
  *
- * NotifySLRULock is used as the control lock for the pg_notify SLRU buffers.
+ * SLRU buffer pool is divided in banks and bank wise SLRU lock is used as
+ * the control lock for the pg_notify SLRU buffers.
  * In order to avoid deadlocks, whenever we need multiple locks, we first get
- * NotifyQueueTailLock, then NotifyQueueLock, and lastly NotifySLRULock.
+ * NotifyQueueTailLock, then NotifyQueueLock, and lastly SLRU bank lock.
  *
  * Each backend uses the backend[] array entry with index equal to its
  * BackendId (which can range from 1 to MaxBackends). We rely on this to make
@@ -492,7 +494,7 @@ AsyncShmemSize(void)
     size = mul_size(MaxBackends + 1, sizeof(QueueBackendStatus));
     size = add_size(size, offsetof(AsyncQueueControl, backend));
 
-    size = add_size(size, SimpleLruShmemSize(NUM_NOTIFY_BUFFERS, 0));
+    size = add_size(size, SimpleLruShmemSize(notify_buffers, 0));
     return size;
 }
@@ -541,8 +543,8 @@ AsyncShmemInit(void)
      * names are used in order to avoid wraparound.
      */
     NotifyCtl->PagePrecedes = asyncQueuePagePrecedes;
-    SimpleLruInit(NotifyCtl, "notify", NUM_NOTIFY_BUFFERS, 0,
-                  NotifySLRULock, "pg_notify", LWTRANCHE_NOTIFY_BUFFER,
+    SimpleLruInit(NotifyCtl, "notify", notify_buffers, 0,
+                  "pg_notify", LWTRANCHE_NOTIFY_BUFFER, LWTRANCHE_NOTIFY_SLRU,
                   SYNC_HANDLER_NONE, true);
 
     if (!found)
@@ -1356,7 +1358,7 @@ asyncQueueNotificationToEntry(Notification *n, AsyncQueueEntry *qe)
  * Eventually we will return NULL indicating all is done.
  *
  * We are holding NotifyQueueLock already from the caller and grab
- * NotifySLRULock locally in this function.
+ * page specific SLRU bank lock locally in this function.
  */
 static ListCell *
 asyncQueueAddEntries(ListCell *nextNotify)
@@ -1366,9 +1368,7 @@ asyncQueueAddEntries(ListCell *nextNotify)
     int64       pageno;
     int         offset;
     int         slotno;
-
-    /* We hold both NotifyQueueLock and NotifySLRULock during this operation */
-    LWLockAcquire(NotifySLRULock, LW_EXCLUSIVE);
+    LWLock     *prevlock;
 
     /*
      * We work with a local copy of QUEUE_HEAD, which we write back to shared
@@ -1389,6 +1389,11 @@ asyncQueueAddEntries(ListCell *nextNotify)
      * page should be initialized already, so just fetch it.
      */
     pageno = QUEUE_POS_PAGE(queue_head);
+    prevlock = SimpleLruGetBankLock(NotifyCtl, pageno);
+
+    /* We hold both NotifyQueueLock and SLRU bank lock during this operation */
+    LWLockAcquire(prevlock, LW_EXCLUSIVE);
+
     if (QUEUE_POS_IS_ZERO(queue_head))
         slotno = SimpleLruZeroPage(NotifyCtl, pageno);
     else
@@ -1434,6 +1439,17 @@ asyncQueueAddEntries(ListCell *nextNotify)
         /* Advance queue_head appropriately, and detect if page is full */
         if (asyncQueueAdvance(&(queue_head), qe.length))
         {
+            LWLock     *lock;
+
+            pageno = QUEUE_POS_PAGE(queue_head);
+            lock = SimpleLruGetBankLock(NotifyCtl, pageno);
+            if (lock != prevlock)
+            {
+                LWLockRelease(prevlock);
+                LWLockAcquire(lock, LW_EXCLUSIVE);
+                prevlock = lock;
+            }
+
             /*
              * Page is full, so we're done here, but first fill the next page
              * with zeroes. The reason to do this is to ensure that slru.c's
@@ -1460,7 +1476,7 @@ asyncQueueAddEntries(ListCell *nextNotify)
     /* Success, so update the global QUEUE_HEAD */
     QUEUE_HEAD = queue_head;
 
-    LWLockRelease(NotifySLRULock);
+    LWLockRelease(prevlock);
 
     return nextNotify;
 }
@@ -1931,9 +1947,9 @@ asyncQueueReadAllNotifications(void)
         /*
          * We copy the data from SLRU into a local buffer, so as to avoid
-         * holding the NotifySLRULock while we are examining the entries
-         * and possibly transmitting them to our frontend. Copy only the
-         * part of the page we will actually inspect.
+         * holding the SLRU lock while we are examining the entries and
+         * possibly transmitting them to our frontend. Copy only the part
+         * of the page we will actually inspect.
          */
         slotno = SimpleLruReadPage_ReadOnly(NotifyCtl, curpage,
                                             InvalidTransactionId);
@@ -1953,7 +1969,7 @@ asyncQueueReadAllNotifications(void)
                NotifyCtl->shared->page_buffer[slotno] + curoffset,
                copysize);
         /* Release lock that we got from SimpleLruReadPage_ReadOnly() */
-        LWLockRelease(NotifySLRULock);
+        LWLockRelease(SimpleLruGetBankLock(NotifyCtl, curpage));
 
         /*
          * Process messages up to the stop position, end of page, or an
@@ -1994,7 +2010,7 @@
  *
  * The current page must have been fetched into page_buffer from shared
  * memory. (We could access the page right in shared memory, but that
- * would imply holding the NotifySLRULock throughout this routine.)
+ * would imply holding the SLRU bank lock throughout this routine.)
  *
  * We stop if we reach the "stop" position, or reach a notification from an
 * uncommitted transaction, or reach the end of the page.
@@ -2147,7 +2163,7 @@ asyncQueueAdvanceTail(void)
     if (asyncQueuePagePrecedes(oldtailpage, boundary))
     {
         /*
-         * SimpleLruTruncate() will ask for NotifySLRULock but will also
+         * SimpleLruTruncate() will ask for SLRU bank locks but will also
          * release the lock again.
         */
         SimpleLruTruncate(NotifyCtl, newtailpage);
@@ -2378,3 +2394,12 @@ ClearPendingActionsAndNotifies(void)
     pendingActions = NULL;
     pendingNotifies = NULL;
 }
+
+/*
+ * GUC check_hook for notify_buffers
+ */
+bool
+check_notify_buffers(int *newval, void **extra, GucSource source)
+{
+    return check_slru_buffers("notify_buffers", newval);
+}
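The final hunk adds check_notify_buffers(), which delegates to a shared
check_slru_buffers() helper. As a hedged sketch of what such a helper
plausibly enforces — the specific constraint shown here, that the buffer count
divide evenly into banks of an assumed size, is an illustration rather than
the committed logic, which lives in slru.c:

    #include <stdbool.h>

    /* Assumed bank size, for illustration only. */
    #define SKETCH_SLRU_BANK_SIZE 16

    /*
     * Sketch of a GUC check helper: accept a buffer count only if it splits
     * evenly into banks.  In PostgreSQL proper, a rejection would also
     * report detail through GUC_check_errdetail().
     */
    static bool
    check_slru_buffers_sketch(const char *name, const int *newval)
    {
        (void) name;    /* would appear in the error detail message */
        return (*newval % SKETCH_SLRU_BANK_SIZE) == 0;
    }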