diff options
author | Alvaro Herrera <alvherre@alvh.no-ip.org> | 2020-08-08 12:31:55 -0400 |
---|---|---|
committer | Alvaro Herrera <alvherre@alvh.no-ip.org> | 2020-08-08 12:31:55 -0400 |
commit | 1fa6eec974f8f17f8c5fcd54eb33625f9b107050 (patch) | |
tree | 4f97da081a4d52c5a0908868903eb60f6e21699d /src | |
parent | 1db9c80f835e436ca3c27bccf43135ebf7e3c62b (diff) | |
download | postgresql-1fa6eec974f8f17f8c5fcd54eb33625f9b107050.tar.gz postgresql-1fa6eec974f8f17f8c5fcd54eb33625f9b107050.zip |
walsnd: Don't set waiting_for_ping_response spuriously
Ashutosh Bapat noticed that when logical walsender needs to wait for
WAL, and it realizes that it must send a keepalive message to
walreceiver to update the sent-LSN, which *does not* request a reply
from walreceiver, it wrongly sets the flag that it's going to wait for
that reply. That means that any future would-be sender of feedback
messages ends up not sending a feedback message, because they all
believe that a reply is expected.
With built-in logical replication there's not much harm in this, because
WalReceiverMain will send a ping-back every wal_receiver_timeout/2
anyway; but with other logical replication systems (e.g. pglogical) it
can cause significant pain.
This problem was introduced in commit 41d5f8ad734, where the
request-reply flag was changed from true to false to WalSndKeepalive,
without at the same time removing the line that sets
waiting_for_ping_response.
Just removing that line would be a sufficient fix, but it seems better
to shift the responsibility of setting the flag to WalSndKeepalive
itself instead of requiring caller to do it; this is clearly less
error-prone.
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reported-by: Ashutosh Bapat <ashutosh.bapat@2ndquadrant.com>
Backpatch: 9.5 and up
Discussion: https://postgr.es/m/20200806225558.GA22401@alvherre.pgsql
Diffstat (limited to 'src')
-rw-r--r-- | src/backend/replication/walsender.c | 30 |
1 files changed, 16 insertions, 14 deletions
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c index ebe986559a2..92bc6577848 100644 --- a/src/backend/replication/walsender.c +++ b/src/backend/replication/walsender.c @@ -155,7 +155,7 @@ static XLogRecPtr sendTimeLineValidUpto = InvalidXLogRecPtr; * How far have we sent WAL already? This is also advertised in * MyWalSnd->sentPtr. (Actually, this is the next WAL location to send.) */ -static XLogRecPtr sentPtr = 0; +static XLogRecPtr sentPtr = InvalidXLogRecPtr; /* Buffers for constructing outgoing messages and processing reply messages. */ static StringInfoData output_message; @@ -1352,16 +1352,15 @@ WalSndWaitForWal(XLogRecPtr loc) /* * We only send regular messages to the client for full decoded * transactions, but a synchronous replication and walsender shutdown - * possibly are waiting for a later location. So we send pings - * containing the flush location every now and then. + * possibly are waiting for a later location. So, before sleeping, we + * send a ping containing the flush location. If the receiver is + * otherwise idle, this keepalive will trigger a reply. Processing the + * reply will update these MyWalSnd locations. */ if (MyWalSnd->flush < sentPtr && MyWalSnd->write < sentPtr && !waiting_for_ping_response) - { WalSndKeepalive(false); - waiting_for_ping_response = true; - } /* check whether we're done */ if (loc <= RecentFlushPtr) @@ -2862,10 +2861,7 @@ WalSndDone(WalSndSendDataCallback send_data) proc_exit(0); } if (!waiting_for_ping_response) - { WalSndKeepalive(true); - waiting_for_ping_response = true; - } } /* @@ -3361,10 +3357,13 @@ pg_stat_get_wal_senders(PG_FUNCTION_ARGS) } /* - * This function is used to send a keepalive message to standby. - * If requestReply is set, sets a flag in the message requesting the standby - * to send a message back to us, for heartbeat purposes. - */ + * Send a keepalive message to standby. + * + * If requestReply is set, the message requests the other party to send + * a message back to us, for heartbeat purposes. We also set a flag to + * let nearby code that we're waiting for that response, to avoid + * repeated requests. + */ static void WalSndKeepalive(bool requestReply) { @@ -3379,6 +3378,10 @@ WalSndKeepalive(bool requestReply) /* ... and send it wrapped in CopyData */ pq_putmessage_noblock('d', output_message.data, output_message.len); + + /* Set local flag */ + if (requestReply) + waiting_for_ping_response = true; } /* @@ -3409,7 +3412,6 @@ WalSndKeepaliveIfNecessary(void) if (last_processing >= ping_time) { WalSndKeepalive(true); - waiting_for_ping_response = true; /* Try to flush pending output to the client */ if (pq_flush_if_writable() != 0) |