author: Masahiko Sawada <msawada@postgresql.org> 2024-11-15 17:06:02 -0800
committer: Masahiko Sawada <msawada@postgresql.org> 2024-11-15 17:06:02 -0800
commit: 91771b3fbbc33e066e9a28a7d85bde87f5a0c900
tree: f7a51aa40c84047db274150ee6c9862964fcf737
parent: 2496c3f6f1bf5a735184d27d81527dfea7ad9e9b
Fix a possibility of logical replication slot's restart_lsn going backwards.

Previously LogicalIncreaseRestartDecodingForSlot() accidentally accepted any LSN as the candidate_lsn and candidate_valid after the restart_lsn of the replication slot was updated, so it potentially caused the restart_lsn to move backwards.

A scenario where this could happen in logical replication is: after a logical replication restart, based on previous candidate_lsn and candidate_valid values in memory, the restart_lsn advances upon receiving a subscriber acknowledgment. Then, logical decoding restarts from an older point, setting candidate_lsn and candidate_valid based on an old RUNNING_XACTS record. Subsequent subscriber acknowledgments then update the restart_lsn to an LSN older than the current value.

In the reported case, after WAL files were removed by a checkpoint, the retreated restart_lsn prevented logical replication from restarting due to missing WAL segments.

This change essentially modifies the 'if' condition to 'else if' condition within the function. The previous code had an asymmetry in this regard compared to LogicalIncreaseXminForSlot(), which does almost the same thing for different fields.

The WAL removal issue was reported by Hubert Depesz Lubaczewski.

Backpatch to all supported versions, since the bug exists since 9.4 where logical decoding was introduced.

Reviewed-by: Tomas Vondra, Ashutosh Bapat, Amit Kapila
Discussion: https://postgr.es/m/Yz2hivgyjS1RfMKs%40depesz.com
Discussion: https://postgr.es/m/85fff40e-148b-4e86-b921-b4b846289132%40vondra.me
Backpatch-through: 13
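
The scenario in the commit message can be replayed with a small stand-alone model. This is only a sketch under stated assumptions, not PostgreSQL code: `ToySlot`, `toy_confirm`, `toy_increase_restart`, `replay_scenario` and all LSN values are hypothetical, locking is omitted, and `toy_confirm` is a simplified stand-in for the acknowledgment path (handled in PostgreSQL by LogicalConfirmReceivedLocation(), which this diff does not touch). The `fixed` flag switches between the old separate 'if' and the new 'else if'.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t Lsn;            /* stand-in for XLogRecPtr */
#define INVALID_LSN ((Lsn) 0)    /* stand-in for InvalidXLogRecPtr */

/* Hypothetical reduced slot: only the fields discussed in the commit message. */
typedef struct ToySlot
{
    Lsn restart_lsn;
    Lsn confirmed_flush;
    Lsn candidate_restart_valid;
    Lsn candidate_restart_lsn;
} ToySlot;

/* Simplified acknowledgment path: apply a pending candidate once it is covered. */
static void
toy_confirm(ToySlot *slot, Lsn flushed)
{
    slot->confirmed_flush = flushed;
    if (slot->candidate_restart_valid != INVALID_LSN &&
        slot->candidate_restart_valid <= flushed)
    {
        slot->restart_lsn = slot->candidate_restart_lsn;
        slot->candidate_restart_valid = INVALID_LSN;
        slot->candidate_restart_lsn = INVALID_LSN;
    }
}

/* Model of the candidate update; fixed = true mimics the 'else if' version. */
static void
toy_increase_restart(ToySlot *slot, Lsn current_lsn, Lsn restart_lsn, bool fixed)
{
    bool matched = true;   /* did one of the first two branches match? */
    bool updated = false;

    if (restart_lsn <= slot->restart_lsn)
    {
        /* don't overwrite if we already have a newer restart_lsn */
    }
    else if (current_lsn <= slot->confirmed_flush)
    {
        slot->candidate_restart_valid = current_lsn;
        slot->candidate_restart_lsn = restart_lsn;
        updated = true;    /* candidate can be used directly */
    }
    else
        matched = false;

    /* Old code ran this test unconditionally; the fix chains it with 'else'. */
    if ((!fixed || !matched) && slot->candidate_restart_valid == INVALID_LSN)
    {
        slot->candidate_restart_valid = current_lsn;
        slot->candidate_restart_lsn = restart_lsn;
    }

    if (updated)
        toy_confirm(slot, slot->confirmed_flush);
}

/* Replays the three steps of the scenario and returns the final restart_lsn. */
Lsn
replay_scenario(bool fixed)
{
    /* Stale in-memory candidates survive from before the restart. */
    ToySlot slot = {
        .restart_lsn = 100,
        .confirmed_flush = 150,
        .candidate_restart_valid = 200,
        .candidate_restart_lsn = 180
    };

    toy_confirm(&slot, 250);            /* 1. ack advances restart_lsn */
    assert(slot.restart_lsn == 180);

    /* 2. decoding restarted from an older point sees an old RUNNING_XACTS */
    toy_increase_restart(&slot, 120, 110, fixed);

    toy_confirm(&slot, 260);            /* 3. next ack applies any candidate */
    return slot.restart_lsn;
}
```

In this model, `replay_scenario(false)` ends with restart_lsn at 110, below the 180 reached in step 1, while `replay_scenario(true)` stays at 180 because the old record is rejected by the first branch and never reaches the candidate assignment.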
-rw-r--r-- src/backend/replication/logical/logical.c | 4
1 files changed, 3 insertions, 1 deletions
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 48ae02ca386..a53d42e88a6 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -1680,6 +1680,7 @@ LogicalIncreaseRestartDecodingForSlot(XLogRecPtr current_lsn, XLogRecPtr restart
/* don't overwrite if have a newer restart lsn */
if (restart_lsn <= slot->data.restart_lsn)
{
+ SpinLockRelease(&slot->mutex);
}
/*
@@ -1690,6 +1691,7 @@ LogicalIncreaseRestartDecodingForSlot(XLogRecPtr current_lsn, XLogRecPtr restart
{
slot->candidate_restart_valid = current_lsn;
slot->candidate_restart_lsn = restart_lsn;
+ SpinLockRelease(&slot->mutex);
/* our candidate can directly be used */
updated_lsn = true;
@@ -1700,7 +1702,7 @@ LogicalIncreaseRestartDecodingForSlot(XLogRecPtr current_lsn, XLogRecPtr restart
* might never end up updating if the receiver acks too slowly. A missed
* value here will just cause some extra effort after reconnecting.
*/
- if (slot->candidate_restart_valid == InvalidXLogRecPtr)
+ else if (slot->candidate_restart_valid == InvalidXLogRecPtr)
{
slot->candidate_restart_valid = current_lsn;
slot->candidate_restart_lsn = restart_lsn;