aboutsummaryrefslogtreecommitdiff
path: root/src/backend/replication
Commit message (Collapse)AuthorAge
...
* Convert some extern variables to staticPeter Eisentraut2024-07-02
| | | | | | | | These probably should have been static all along, it was only forgotten out of sloppiness. Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://www.postgresql.org/message-id/flat/e0a62134-83da-4ba4-8cdb-ceb0111c95ce@eisentraut.org
* Preserve CurrentMemoryContext across Start/CommitTransactionCommand.Tom Lane2024-07-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Up to now, committing a transaction has caused CurrentMemoryContext to get set to TopMemoryContext. Most callers did not pay any particular heed to this, which is problematic because TopMemoryContext is a long-lived context that never gets reset. If the caller assumes it can leak memory because it's running in a limited-lifespan context, that behavior translates into a session-lifespan memory leak. The first-reported instance of this involved ProcessIncomingNotify, which is called from the main processing loop that normally runs in MessageContext. That outer-loop code assumes that whatever it allocates will be cleaned up when we're done processing the current client message --- but if we service a notify interrupt, then whatever gets allocated before the next switch to MessageContext will be permanently leaked in TopMemoryContext. sinval catchup interrupts have a similar problem, and I strongly suspect that some places in logical replication do too. To fix this in a generic way, let's redefine the behavior as "CommitTransactionCommand restores the memory context that was current at entry to StartTransactionCommand". This clearly fixes the issue for the notify and sinval cases, and it seems to match the mental model that's in use in the logical replication code, to the extent that anybody thought about it there at all. For consistency, likewise make subtransaction exit restore the context that was current at subtransaction start (rather than always selecting the CurTransactionContext of the parent transaction level). This case has less risk of resulting in a permanent leak than the outer-level behavior has, but it would not meet the principle of least surprise for some CommitTransactionCommand calls to restore the previous context while others don't. While we're here, also change xact.c so that we reset TopTransactionContext at transaction exit and then re-use it in later transactions, rather than dropping and recreating it in each cycle. This probably doesn't save a lot given the context recycling mechanism in aset.c, but it should save a little bit. Per suggestion from David Rowley. (Parenthetically, the text in src/backend/utils/mmgr/README implies that this is how I'd planned to implement it as far back as commit 1aebc3618 --- but the code actually added in that commit just drops and recreates it each time.) This commit also cleans up a few places outside xact.c that were needlessly making TopMemoryContext current, and thus risking more leaks of the same kind. I don't think any of them represent significant leak risks today, but let's deal with them while the issue is top-of-mind. Per bug #18512 from wizardbrony. Commit to HEAD only, as this change seems to have some risk of breaking things for some callers. We'll apply a narrower fix for the known-broken cases in the back branches. Discussion: https://postgr.es/m/3478884.1718656625@sss.pgh.pa.us
* Rename standby_slot_names to synchronized_standby_slots.Amit Kapila2024-07-01
| | | | | | | | | | | | The standby_slot_names GUC allows the specification of physical standby slots that must be synchronized before the logical walsenders associated with logical failover slots. However, for this purpose, the GUC name is too generic. Author: Hou Zhijie Reviewed-by: Bertrand Drouvot, Masahiko Sawada Backpatch-through: 17 Discussion: https://postgr.es/m/ZnWeUgdHong93fQN@momjian.us
* Format better code for xact_decode()'s XLOG_XACT_INVALIDATIONSMichael Paquier2024-07-01
| | | | | | | | This makes the code more consistent with the surroundings. Author: ChangAo Chen Reviewed-by: Ashutosh Bapat Discussion: https://postgr.es/m/CAExHW5tNTevUh58SKddTtcX3yU_5_PDSC8Mdp-Q2hc9PpZHRJg@mail.gmail.com
* Drop the temporary tuple slots allocated by pgoutput.Amit Kapila2024-06-27
| | | | | | | | | | | | | In pgoutput, when converting the child table's tuple format to match the parent table's, we temporarily create a new slot to store the converted tuple. However, we missed to drop such temporary slots, leading to resource leakage. Reported-by: Bowen Shi Author: Hou Zhijie Reviewed-by: Amit Kapila Backpatch-through: 15 Discussion: https://postgr.es/m/CAM_vCudv8dc3sjWiPkXx5F2b27UV7_YRKRbtSCcE-pv=cVACGA@mail.gmail.com
* Harmonize function parameter names for Postgres 17.Peter Geoghegan2024-06-12
| | | | | | | | | | | | | Make sure that function declarations use names that exactly match the corresponding names from function definitions in a few places. These inconsistencies were all introduced during Postgres 17 development. pg_bsd_indent still has a couple of similar inconsistencies, which I (pgeoghegan) have left untouched for now. This commit was written with help from clang-tidy, by mechanically applying the same rules as similar clean-up commits (the earliest such commit was commit 035ce1fe).
* Fix an assert in CheckPointReplicationSlots().Amit Kapila2024-06-11
| | | | | | | | | | | | | | | | | | Commit e0b2eed047 assumed that the confirmed_flush LSN can't go backward. However, it is possible that confirmed_flush LSN can go backward temporarily when the client acknowledges a prior value of flush location. This can happen when the client (subscriber in this case) acknowledges an LSN it doesn't have to do anything for (say for DDLs) and thus didn't store persistently. After restart, the client sends the prior value of flush LSN which it had stored persistently and the server updates the confirmed_flush LSN with that value. The fix is to remove the assumption and not allow the prior value of confirmed_flush LSN to be flushed to the disk. Author: Vignesh C Reviewed-by: Amit Kapila, Shlok Kyal Discussion: https://postgr.es/m/CALDaNm3hgow2+oEov5jBk4iYP5eQrUCF1yZtW7+dV3J__p4KLQ@mail.gmail.com
* A few follow-up fixes for GUC name quotingPeter Eisentraut2024-05-17
| | | | | | Fixups for 17974ec259: Some messages were missed (and some were new since the patch was originally proposed), and there was a typo introduced.
* Revise GUC names quoting in messages againPeter Eisentraut2024-05-17
| | | | | | | | | | | | | | | After further review, we want to move in the direction of always quoting GUC names in error messages, rather than the previous (PG16) wildly mixed practice or the intermittent (mid-PG17) idea of doing this depending on how possibly confusing the GUC name is. This commit applies appropriate quotes to (almost?) all mentions of GUC names in error messages. It partially supersedes a243569bf65 and 8d9978a7176, which had moved things a bit in the opposite direction but which then were abandoned in a partial state. Author: Peter Smith <smithpb2250@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAHut%2BPv-kSN8SkxSdoHano_wPubqcg5789ejhCDZAcLFceBR-w%40mail.gmail.com
* Fix incorrect format placeholderPeter Eisentraut2024-05-08
|
* Fix an assortment of typosDavid Rowley2024-05-04
| | | | | Author: Alexander Lakhin Discussion: https://postgr.es/m/ae9f2fcb-4b24-5bb0-4240-efbbbd944ca1@gmail.com
* Fix duplicated consecutive words in commentsDavid Rowley2024-04-28
| | | | | | | Also, fix a comment incorrectly referencing the "streaming read API". This was renamed to "read stream" shortly before being committed. Discussion: https://postgr.es/m/CAApHDvq-2Zdqytm_Hf3RmVf0qg5PS9jTFAJ5QTc9xH9pwvwDTA@mail.gmail.com
* Post-commit review fixes for slot synchronization.Amit Kapila2024-04-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow pg_sync_replication_slots() to error out during promotion of standby. This makes the behavior of the SQL function consistent with the slot sync worker. We also ensured that pg_sync_replication_slots() cannot be executed if sync_replication_slots is enabled and the slotsync worker is already running to perform the synchronization of slots. Previously, it would have succeeded in cases when the worker is idle and failed when it is performing sync which could confuse users. This patch fixes another issue in the slot sync worker where SignalHandlerForShutdownRequest() needs to be registered *before* setting SlotSyncCtx->pid, otherwise, the slotsync worker could miss handling SIGINT sent by the startup process(ShutDownSlotSync) if it is sent before worker could register SignalHandlerForShutdownRequest(). To be consistent, all signal handlers' registration is moved to a prior location before we set the worker's pid. Ensure that we clean up synced temp slots at the end of pg_sync_replication_slots() to avoid such slots being left over after promotion. Ensure that ShutDownSlotSync() captures SlotSyncCtx->pid under spinlock to avoid accessing invalid value as it can be reset by concurrent slot sync exit due to an error. Author: Shveta Malik Reviewed-by: Hou Zhijie, Bertrand Drouvot, Amit Kapila, Masahiko Sawada Discussion: https://postgr.es/m/CAJpy0uBefXUS_TSz=oxmYKHdg-fhxUT0qfjASW3nmqnzVC3p6A@mail.gmail.com
* Fix the missing table sync due to improper invalidation handling.Amit Kapila2024-04-25
| | | | | | | | | | | | | | | | | | | We missed performing table sync if the invalidation happened while the non-ready tables list was being prepared. This occurs because the sync state was set to valid at the end of non-ready table list preparation irrespective of the invalidations processed while the list is being prepared. Fix it by changing the boolean variable to a tri-state enum and by setting table state to valid only if no invalidations have occurred while the list is being prepared. Reprted-by: Alexander Lakhin Diagnosed-by: Alexander Lakhin Author: Vignesh C Reviewed-by: Hou Zhijie, Alexander Lakhin, Ajin Cherian, Amit Kapila Backpatch-through: 15 Discussion: https://postgr.es/m/711a6afe-edb7-1211-cc27-1bef8239eec7@gmail.com
* Fix typos and duplicate wordsDaniel Gustafsson2024-04-18
| | | | | | | | | | | | This fixes various typos, duplicated words, and tiny bits of whitespace mainly in code comments but also in docs. Author: Daniel Gustafsson <daniel@yesql.se> Author: Heikki Linnakangas <hlinnaka@iki.fi> Author: Alexander Lakhin <exclusion@gmail.com> Author: David Rowley <dgrowleyml@gmail.com> Author: Nazir Bilal Yavuz <byavuz81@gmail.com> Discussion: https://postgr.es/m/3F577953-A29E-4722-98AD-2DA9EFF2CBB8@yesql.se
* Fix the review comments and a bug in the slot sync code.Amit Kapila2024-04-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | Ensure that when updating the catalog_xmin of the synced slots, it is first written to disk before changing the in-memory value (effective_catalog_xmin). This is to prevent a scenario where the in-memory value change triggers a vacuum to remove catalog tuples before the catalog_xmin is written to disk. In the event of a crash before the catalog_xmin is persisted, we would not know that some required catalog tuples have been removed and the synced slot would be invalidated. Change the sanity check to ensure that remote_slot's confirmed_flush LSN can't precede the local/synced slot during slot sync. Note that the restart_lsn of the synced/local slot can be ahead of remote_slot. This can happen when slot advancing machinery finds a running xacts record after reaching the consistent state at a later point than the primary where it serializes the snapshot and updates the restart_lsn. Make the check to sync slots robust by allowing to sync only when the confirmed_lsn, restart_lsn, or catalog_xmin of the remote slot is ahead of the synced/local slot. Reported-by: Amit Kapila and Shveta Malik Author: Hou Zhijie, Shveta Malik Reviewed-by: Amit Kapila, Bertrand Drouvot Discussion: https://postgr.es/m/OS0PR01MB57162B67D3CB01B2756FBA6D94062@OS0PR01MB5716.jpnprd01.prod.outlook.com Discussion: https://postgr.es/m/CAJpy0uCSS5zmdyUXhvw41HSdTbRqX1hbYqkOfHNj7qQ+2zn0AQ@mail.gmail.com
* Use correct datatype for xmin variables in slot.cMichael Paquier2024-04-11
| | | | | | | | | | | | Two variables storing a slot's effective_xmin and effective_catalog_xmin were saved as XLogRecPtr, which is incorrect as these should be TransactionIds. Oversight in 818fefd8fd44. Author: Bharath Rupireddy Discussion: https://postgr.es/m/CALj2ACVPSB74mrDTFezz-LV3Oi6F3SN71QA0oUHvndzi5dwTNg@mail.gmail.com Backpatch-through: 16
* Revert indexed and enlargable binary heap implementation.Masahiko Sawada2024-04-11
| | | | | | | | | | | This reverts commit b840508644 and bcb14f4abc. These commits were made for commit 5bec1d6bc5 (Improve eviction algorithm in ReorderBuffer using max-heap for many subtransactions). However, per discussion, commit efb8acc0d0 replaced binary heap + index with pairing heap, and made these commits unnecessary. Reported-by: Jeff Davis Discussion: https://postgr.es/m/12747c15811d94efcc5cda72d6b35c80d7bf3443.camel%40j-davis.com
* Replace binaryheap + index with pairingheap in reorderbuffer.cMasahiko Sawada2024-04-11
| | | | | | | | | | | | | | | | | | | A pairing heap can perform the same operations as the binary heap + index, with as good or better algorithmic complexity, and that's an existing data structure so that we don't need to invent anything new compared to v16. This commit makes the new binaryheap functionality that was added in commits b840508644 and bcb14f4abc unnecessary, but they will be reverted separately. Remove the optimization to only build and maintain the heap when the amount of memory used is close to the limit, becuase the bookkeeping overhead with the pairing heap seems to be small enough that it doesn't matter in practice. Reported-by: Jeff Davis Author: Heikki Linnakangas Reviewed-by: Michael Paquier, Hayato Kuroda, Masahiko Sawada Discussion: https://postgr.es/m/12747c15811d94efcc5cda72d6b35c80d7bf3443.camel%40j-davis.com
* Fixup various StringInfo function usagesDavid Rowley2024-04-10
| | | | | | | | | | | | | | | This adjusts various appendStringInfo* function calls to use a more appropriate and efficient function with the same behavior. For example, use appendStringInfoChar() when appending a single character rather than appendStringInfo() and appendStringInfoString() when no formatting is required rather than using appendStringInfo(). All adjustments made here are in code that's new to v17, so it makes sense to fix these now rather than wait a few years and make backpatching harder. Discussion: https://postgr.es/m/CAApHDvojY2UvMiO+9_55ArTj10P1LBNJyyoGB+C65BLDNT0GsQ@mail.gmail.com Reviewed-by: Nathan Bossart, Tom Lane
* Support retrieval of results in chunks with libpq.Tom Lane2024-04-06
| | | | | | | | | | | | | | | | | | | | | | This patch generalizes libpq's existing single-row mode to allow individual partial-result PGresults to contain up to N rows, rather than always one row. This reduces malloc overhead compared to plain single-row mode, and it is very useful for psql's FETCH_COUNT feature, since otherwise we'd have to add code (and cycles) to either merge single-row PGresults into a bigger one or teach psql's results-printing logic to accept arrays of PGresults. To avoid API breakage, PQsetSingleRowMode() remains the same, and we add a new function PQsetChunkedRowsMode() to invoke the more general case. Also, PGresults obtained the old way continue to carry the PGRES_SINGLE_TUPLE status code, while if PQsetChunkedRowsMode() is used then their status code is PGRES_TUPLES_CHUNK. The underlying logic is the same either way, though. Daniel Vérité, reviewed by Laurenz Albe and myself (and whacked around a bit by me, so any remaining bugs are my fault) Discussion: https://postgr.es/m/CAKZiRmxsVTkO928CM+-ADvsMyePmU3L9DQCa9NwqjvLPcEe5QA@mail.gmail.com
* Allow synced slots to have their inactive_since.Amit Kapila2024-04-05
| | | | | | | | | | | | | | | | | | | | This commit does two things: 1) Maintains inactive_since for sync slots whenever the slot is released just like any other regular slot. 2) Ensures the value is set to the current timestamp during the promotion of standby to help correctly interpret the time after promotion. We don't want the slots to appear inactive for a long time after promotion if they haven't been synchronized recently. This would also avoid the invalidation of such slots immediately after promotion if tomorrow we have a feature that invalidates slots based on their inactivity time. Whoever acquires the slot i.e. makes the slot active will reset it to NULL. Author: Bharath Rupireddy Reviewed-by: Bertrand Drouvot, Amit Kapila, Shveta Malik, Masahiko Sawada Discussion: https://postgr.es/m/CAA4eK1KrPGwfZV9LYGidjxHeW+rxJ=E2ThjXvwRGLO=iLNuo=Q@mail.gmail.com Discussion: https://postgr.es/m/CALj2ACW4aUe-_uFQOjdWCEN-xXoLGhmvRFnL8SNw_TZ5nJe+aw@mail.gmail.com Discussion: https://postgr.es/m/CA+Tgmob_Ta-t2ty8QrKHBGnNLrf4ZYcwhGHGFsuUoFrAEDw4sA@mail.gmail.com
* Ensure that the sync slots reach a consistent state after promotion without ↵Amit Kapila2024-04-03
| | | | | | | | | | | | | | | | | | | | | | | | | | losing data. We were directly copying the LSN locations while syncing the slots on the standby. Now, it is possible that at some particular restart_lsn there are some running xacts, which means if we start reading the WAL from that location after promotion, we won't reach a consistent snapshot state at that point. However, on the primary, we would have already been in a consistent snapshot state at that restart_lsn so we would have just serialized the existing snapshot. To avoid this problem we will use the advance_slot functionality unless the snapshot already exists at the synced restart_lsn location. This will help us to ensure that snapbuilder/slot statuses are updated properly without generating any changes. Note that the synced slot will remain as RS_TEMPORARY till the decoding from corresponding restart_lsn can reach a consistent snapshot state after which they will be marked as RS_PERSISTENT. Per buildfarm Author: Hou Zhijie Reviewed-by: Bertrand Drouvot, Shveta Malik, Bharath Rupireddy, Amit Kapila Discussion: https://postgr.es/m/OS0PR01MB5716B3942AE49F3F725ACA92943B2@OS0PR01MB5716.jpnprd01.prod.outlook.com
* Improve eviction algorithm in ReorderBuffer using max-heap for many ↵Masahiko Sawada2024-04-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | subtransactions. Previously, when selecting the transaction to evict during logical decoding, we check all transactions to find the largest transaction. This could lead to a significant replication lag especially in the case where there are many subtransactions. This commit improves the eviction algorithm in ReorderBuffer using the max-heap with transaction size as the key to efficiently find the largest transaction. The max-heap starts with empty. While the max-heap is empty, we don't do anything for the max-heap when updating the memory counter. Therefore, we get the largest transaction in O(N) time, where N is the number of transactions including top-level transactions and subtransactions. We build the max-heap just before selecting the largest transactions if the number of transactions being decoded is higher than the threshold, MAX_HEAP_TXN_COUNT_THRESHOLD. After building the max-heap, we also update the max-heap when updating the memory counter. The intention is to efficiently find the largest transaction in O(1) time instead of incurring the cost of memory counter updates (O(log N)). Once the number of transactions got lower than the threshold, we reset the max-heap. The performance benchmark results showed significant speed up (more than x30 speed up on my machine) in decoding a transaction with 100k subtransactions, whereas there is no visible overhead in other cases. Reviewed-by: Amit Kapila, Hayato Kuroda, Vignesh C, Ajin Cherian, Tomas Vondra, Shubham Khanna, Peter Smith, Álvaro Herrera, Euler Taveira Discussion: https://postgr.es/m/CAD21AoAfKTgrBrLq96GcTv9d6k97zaQcDM-rxfKEt4GSe0qnaQ%40mail.gmail.com
* Add functions to binaryheap for efficient key removal and update.Masahiko Sawada2024-04-03
| | | | | | | | | | | | | | | | | | | | | | | Previously, binaryheap didn't support updating a key and removing a node in an efficient way. For example, in order to remove a node from the binaryheap, the caller had to pass the node's position within the array that the binaryheap internally has. Removing a node from the binaryheap is done in O(log n) but searching for the key's position is done in O(n). This commit adds a hash table to binaryheap in order to track the position of each nodes in the binaryheap. That way, by using newly added functions such as binaryheap_update_up() etc., both updating a key and removing a node can be done in O(1) on an average and O(log n) in worst case. This is known as the indexed binary heap. The caller can specify to use the indexed binaryheap by passing indexed = true. The current code does not use the new indexing logic, but it will be used by an upcoming patch. Reviewed-by: Vignesh C, Peter Smith, Hayato Kuroda, Ajin Cherian, Tomas Vondra, Shubham Khanna Discussion: https://postgr.es/m/CAD21AoDffo37RC-eUuyHJKVEr017V2YYDLyn1xF_00ofptWbkg%40mail.gmail.com
* Change last_inactive_time to inactive_since in pg_replication_slots.Amit Kapila2024-03-27
| | | | | | | | | | | | | Commit a11f330b55 added last_inactive_time to show the last time the slot was inactive. But, it tells the last time that a currently-inactive slot previously *WAS* active. This could be unclear, so we changed the name to inactive_since. Reported-by: Robert Haas Author: Bharath Rupireddy Reviewed-by: Bertrand Drouvot, Shveta Malik, Amit Kapila Discussion: https://postgr.es/m/CA+Tgmob_Ta-t2ty8QrKHBGnNLrf4ZYcwhGHGFsuUoFrAEDw4sA@mail.gmail.com Discussion: https://postgr.es/m/CALj2ACUXS0SfbHzsX8bqo+7CZhocsV52Kiu7OWGb5HVPAmJqnA@mail.gmail.com
* Fix indentation from a11f330b5Daniel Gustafsson2024-03-25
| | | | Per buildfarm animal koel
* Merge prune, freeze and vacuum WAL record formatsHeikki Linnakangas2024-03-25
| | | | | | | | | | | | | | | | | | | | | | | The new combined WAL record is now used for pruning, freezing and 2nd pass of vacuum. This is in preparation for changing VACUUM to write a combined prune+freeze record per page, instead of separate two records. The new WAL record format now supports that, but the code still always writes separate records for pruning and freezing. This reserves separate XLOG_HEAP2_* info codes for when the pruning record is emitted for on-access pruning or VACUUM, per Peter Geoghegan's suggestion. The record format is identical, but having separate info codes makes it easier analyze pruning and vacuuming with pg_waldump. The function to emit the new WAL record, log_heap_prune_and_freeze(), is in pruneheap.c. The existing heap_log_freeze_plan() and its subroutines are moved to pruneheap.c without changes, to keep them together with log_heap_prune_and_freeze(). Author: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://www.postgresql.org/message-id/CAAKRu_azf-zH%3DDgVbquZ3tFWjMY1w5pO8m-TXJaMdri8z3933g@mail.gmail.com Discussion: https://www.postgresql.org/message-id/CAAKRu_b2oE4GL%3Dq4g9mcByS9yT7wTQvEH9OLpabj28e%2BWKFi2A@mail.gmail.com
* Track last_inactive_time in pg_replication_slots.Amit Kapila2024-03-25
| | | | | | | | | | | | | | | | | | | | | This commit adds a new property called last_inactive_time for slots. It is set to 0 whenever a slot is made active/acquired and set to the current timestamp whenever the slot is inactive/released or restored from the disk. Note that we don't set the last_inactive_time for the slots currently being synced from the primary to the standby because such slots are typically inactive as decoding is not allowed on those. The 'last_inactive_time' will be useful on production servers to debug and analyze inactive replication slots. It will also help to know the lifetime of a replication slot - one can know how long a streaming standby, logical subscriber, or replication slot consumer is down. The 'last_inactive_time' will also be useful to implement inactive timeout-based replication slot invalidation in a future commit. Author: Bharath Rupireddy Reviewed-by: Bertrand Drouvot, Amit Kapila, Shveta Malik Discussion: https://www.postgresql.org/message-id/CALj2ACW4aUe-_uFQOjdWCEN-xXoLGhmvRFnL8SNw_TZ5nJe+aw@mail.gmail.com
* Track invalidation_reason in pg_replication_slots.Amit Kapila2024-03-22
| | | | | | | | | | | | | | | | | | | | | | | | Till now, the reason for replication slot invalidation is not tracked directly in pg_replication_slots. A recent commit 007693f2a3 added 'conflict_reason' to show the reasons for slot conflict/invalidation, but only for logical slots. This commit adds a new column 'invalidation_reason' to show invalidation reasons for both physical and logical slots. And, this commit also turns 'conflict_reason' text column to 'conflicting' boolean column (effectively reverting commit 007693f2a3). The 'conflicting' column is true for invalidation reasons 'rows_removed' and 'wal_level_insufficient' because those make the slot conflict with recovery. When 'conflicting' is true, one can now look at the new 'invalidation_reason' column for the reason for the logical slot's conflict with recovery. The new 'invalidation_reason' column will also be useful to track other invalidation reasons in the future commit. Author: Bharath Rupireddy Reviewed-by: Bertrand Drouvot, Amit Kapila, Shveta Malik Discussion: https://www.postgresql.org/message-id/ZfR7HuzFEswakt/a%40ip-10-97-1-34.eu-west-3.compute.internal Discussion: https://www.postgresql.org/message-id/CALj2ACW4aUe-_uFQOjdWCEN-xXoLGhmvRFnL8SNw_TZ5nJe+aw@mail.gmail.com
* Fix misleading commentsPeter Eisentraut2024-03-19
| | | | To match code changes in 229fb58d4f.
* Refactor postmaster child process launchingHeikki Linnakangas2024-03-18
| | | | | | | | | | | | | | | | Introduce new postmaster_child_launch() function that deals with the differences in EXEC_BACKEND mode. Refactor the mechanism of passing information from the parent to child process. Instead of using different command-line arguments when launching the child process in EXEC_BACKEND mode, pass a variable-length blob of startup data along with all the global variables. The contents of that blob depend on the kind of child process being launched. In !EXEC_BACKEND mode, we use the same blob, but it's simply inherited from the parent to child process. Reviewed-by: Tristan Partin, Andres Freund Discussion: https://www.postgresql.org/message-id/7a59b073-5b5b-151e-7ed3-8b01ff7ce9ef@iki.fi
* Fix typos in reorderbuffer.c.Amit Kapila2024-03-14
| | | | | Author: Kyotaro Horiguchi Discussion: https://postgr.es/m/20240314.132817.1496502692848380820.horikyota.ntt@gmail.com
* Make the order of the header file includes consistentPeter Eisentraut2024-03-13
| | | | | | | | Similar to commit 7e735035f20. Author: Richard Guo <guofenglinux@gmail.com> Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAMbWs4-WhpCFMbXCjtJ%2BFzmjfPrp7Hw1pk4p%2BZpU95Kh3ofZ1A%40mail.gmail.com
* Keep replication slot statistics on invalidationMichael Paquier2024-03-12
| | | | | | | | | | | | | | | | | | | | | | | | | The code path in charge of invalidating a replication slot includes a call to pgstat_drop_replslot(), which would result in removing the statistics of the slot once invalidated. However, there is no need to remove the statistics of an invalidated slot as one could still be interested in looking at them to understand the activity of the slot until its actual removal. The initial design of the feature committed in be87200efd used the approach to drop the slots, which is likely why the statistics were still removed during the invalidation. Another problem with this operation is that it was done without holding ReplicationSlotAllocationLock, leaving it unprotected on concurrent activity. This part is arguably a bug, but that's a limited problem in practice so no backpatch is done. In passing, this commit adds a test to check this behavior. The only remaining code path where slot statistics are dropped now related to the slot getting dropped. Author: Bertrand Drouvot Discussion: https://postgr.es/m/ZermH08Eq6YydHpO@ip-10-97-1-34.eu-west-3.compute.internal
* Remove redundant fetch of the recent flush pointer in WalSndWaitForWal.Amit Kapila2024-03-12
| | | | | | | | | | In WalSndWaitForWal(), we fetch a recent flush pointer both outside the loop and inside the loop. But we start using RecentFlushPtr only after we fetch it inside the loop. So we can remove one outside the loop. Author: Shveta Malik Reviewed-by: Bertrand Drouvot, Matthias van de Meent, Amit Kapila Discussion: https://postgr.es/m/CAJpy0uBSCQz1yMD-WiEthzEe23dti2-Kr_pitVb7vAPFbFKm=A@mail.gmail.com
* Admit deferrable PKs into rd_pkindex, but flag them as suchAlvaro Herrera2024-03-08
| | | | | | | | | | | | | | | | | | | ... and in particular don't return them as replica identity. The motivation for this change is letting the primary keys be seen by code that derives NOT NULL constraints from them, when creating inheritance children; before this change, if you had a deferrable PK, pg_dump would not recreate the attnotnull marking properly, because the column would not be considered as having anything to back said marking after dropping the throwaway NOT NULL constraint. The reason we don't want these PKs as replica identities is that replication can corrupt data, if the uniqueness constraint is transiently broken. Reported-by: Amul Sul <sulamul@gmail.com> Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com> Discussion: https://postgr.es/m/CAAJ_b94QonkgsbDXofakHDnORQNgafd1y3Oa5QXfpQNJyXyQ7A@mail.gmail.com
* Introduce a new GUC 'standby_slot_names'.Amit Kapila2024-03-08
| | | | | | | | | | | | | | | | | | | | | | This patch provides a way to ensure that physical standbys that are potential failover candidates have received and flushed changes before the primary server making them visible to subscribers. Doing so guarantees that the promoted standby server is not lagging behind the subscribers when a failover is necessary. The logical walsender now guarantees that all local changes are sent and flushed to the standby servers corresponding to the replication slots specified in 'standby_slot_names' before sending those changes to the subscriber. Additionally, the SQL functions pg_logical_slot_get_changes, pg_logical_slot_peek_changes and pg_replication_slot_advance are modified to ensure that they process changes for failover slots only after physical slots specified in 'standby_slot_names' have confirmed WAL receipt for those. Author: Hou Zhijie and Shveta Malik Reviewed-by: Masahiko Sawada, Peter Smith, Bertrand Drouvot, Ajin Cherian, Nisha Moond, Amit Kapila Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com
* Revert "Add recovery TAP test for race condition with slot invalidations"Michael Paquier2024-03-07
| | | | | | | | | | | | This reverts commit 08a52ab151ca, due to some sporadic instability in the test. Getting the test right should require some redesign with a second injection point, but let's revert it for now to avoid these issues in the CI as a lot of patches are under discussion in this last commit fest. Per buildfarm members hachi and gokiburi. Discussion: https://postgr.es/m/ZekQQHCrIqLVpGz5@paquier.xyz
* Add recovery TAP test for race condition with slot invalidationsMichael Paquier2024-03-06
| | | | | | | | | | | | This commit adds a recovery test to provide coverage for the bug fixed in 818fefd8fd, using an injection point to wait just after the process of an active slot is killed. The trick is to give enough time for effective_xmin and effective_catalog_xmin to advance so as the slot invalidation robustness can be checked since the active process is killed without holding its slot's mutex for a short time. Author: Bertrand Drouvot Discussion: https://postgr.es/m/ZdyZya4YrNapWKqz@ip-10-97-1-34.eu-west-3.compute.internal
* Remove unused #include's from backend .c filesPeter Eisentraut2024-03-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | as determined by include-what-you-use (IWYU) While IWYU also suggests to *add* a bunch of #include's (which is its main purpose), this patch does not do that. In some cases, a more specific #include replaces another less specific one. Some manual adjustments of the automatic result: - IWYU currently doesn't know about includes that provide global variable declarations (like -Wmissing-variable-declarations), so those includes are being kept manually. - All includes for port(ability) headers are being kept for now, to play it safe. - No changes of catalog/pg_foo.h to catalog/pg_foo_d.h, to keep the patch from exploding in size. Note that this patch touches just *.c files, so nothing declared in header files changes in hidden ways. As a small example, in src/backend/access/transam/rmgr.c, some IWYU pragma annotations are added to handle a special case there. Discussion: https://www.postgresql.org/message-id/flat/af837490-6b2f-46df-ba05-37ea6a6653fc%40eisentraut.org
* Remove unused 'countincludesself' argument to pq_sendcountedtext()Heikki Linnakangas2024-03-04
| | | | It has been unused since we removed support for protocol version 2.
* Use MyBackendType in more places to check what process this isHeikki Linnakangas2024-03-04
| | | | | | | | | | Remove IsBackgroundWorker, IsAutoVacuumLauncherProcess(), IsAutoVacuumWorkerProcess(), and IsLogicalSlotSyncWorker() in favor of new Am*Process() macros that use MyBackendType. For consistency with the existing Am*Process() macros. Reviewed-by: Andres Freund Discussion: https://www.postgresql.org/message-id/f3ecd4cb-85ee-4e54-8278-5fabfb3a4ed0@iki.fi
* Replace BackendIds with 0-based ProcNumbersHeikki Linnakangas2024-03-03
| | | | | | | | | | | | | | | | | | Now that BackendId was just another index into the proc array, it was redundant with the 0-based proc numbers used in other places. Replace all usage of backend IDs with proc numbers. The only place where the term "backend id" remains is in a few pgstat functions that expose backend IDs at the SQL level. Those IDs are now in fact 0-based ProcNumbers too, but the documentation still calls them "backend ids". That term still seems appropriate to describe what the numbers are, so I let it be. One user-visible effect is that pg_temp_0 is now a valid temp schema name, for backend with ProcNumber 0. Reviewed-by: Andres Freund Discussion: https://www.postgresql.org/message-id/8171f1aa-496f-46a6-afc3-c46fe7a9b407@iki.fi
* Fixups for commit 93db6cbda0.Amit Kapila2024-02-29
| | | | | | | | | | | | | | | Ensure to set always-secure search path for both local and remote connections during slot synchronization, so that malicious users can't redirect user code (e.g. operators). In the passing, improve the name of define, remove spurious return statement, and a minor change in one of the comments. Author: Bertrand Drouvot and Shveta Malik Reviewed-by: Amit Kapila, Peter Smith Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com Discussion: https://postgr.es/m/ZdcejBDCr+wlVGnO@ip-10-97-1-34.eu-west-3.compute.internal Discussion: https://postgr.es/m/CAJpy0uBNP=nrkNJkJSfF=jSocEh8vU2Owa8Rtpi=63fG=SvfVQ@mail.gmail.com
* Add helper functions for dshash tables with string keys.Nathan Bossart2024-02-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Presently, string keys are not well-supported for dshash tables. The dshash code always copies key_size bytes into new entries' keys, and dshash.h only provides compare and hash functions that forward to memcmp() and tag_hash(), both of which do not stop at the first NUL. This means that callers must pad string keys so that the data beyond the first NUL does not adversely affect the results of copying, comparing, and hashing the keys. To better support string keys in dshash tables, this commit does a couple things: * A new copy_function field is added to the dshash_parameters struct. This function pointer specifies how the key should be copied into new table entries. For example, we only want to copy up to the first NUL byte for string keys. A dshash_memcpy() helper function is provided and used for all existing in-tree dshash tables without string keys. * A set of helper functions for string keys are provided. These helper functions forward to strcmp(), strcpy(), and string_hash(), all of which ignore data beyond the first NUL. This commit also adjusts the DSM registry's dshash table to use the new helper functions for string keys. Reviewed-by: Andy Fan Discussion: https://postgr.es/m/20240119215941.GA1322079%40nathanxps13
* Use NULL instead of 0 for 'arg' argument in dshash_create() calls.Nathan Bossart2024-02-26
| | | | | | | | | A couple of dshash_create() callers provide 0 for the 'void *arg' argument, which might give readers the incorrect impression that this is some sort of "flags" parameter. Reviewed-by: Andy Fan Discussion: https://postgr.es/m/20240119215941.GA1322079%40nathanxps13
* Make GetSlotInvalidationCause() return RS_INVAL_NONE on unexpected inputMichael Paquier2024-02-22
| | | | | | | | | | | | | 943f7ae1c869 has changed GetSlotInvalidationCause() so as it would return the last element of SlotInvalidationCauses[] when an incorrect conflict reason name is given by a caller, but this should return RS_INVAL_NONE in such cases, even if such a state should never be reached in practice. Per gripe from Peter Smith. Reviewed-by: Bharath Rupireddy Discussion: https://postgr.es/m/CAHut+PtsrSWxczpGkSaSVtJo+BXrvJ3Hwp5gES14bbL-G+HL7A@mail.gmail.com
* Add a new slot sync worker to synchronize logical slots.Amit Kapila2024-02-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | By enabling slot synchronization, all the failover logical replication slots on the primary (assuming configurations are appropriate) are automatically created on the physical standbys and are synced periodically. The slot sync worker on the standby server pings the primary server at regular intervals to get the necessary failover logical slots information and create/update the slots locally. The slots that no longer require synchronization are automatically dropped by the worker. The nap time of the worker is tuned according to the activity on the primary. The slot sync worker waits for some time before the next synchronization, with the duration varying based on whether any slots were updated during the last cycle. A new parameter sync_replication_slots enables or disables this new process. On promotion, the slot sync worker is shut down by the startup process to drop any temporary slots acquired by the slot sync worker and to prevent the worker from trying to fetch the failover slots. A functionality to allow logical walsenders to wait for the physical will be done in a subsequent commit. Author: Shveta Malik, Hou Zhijie based on design inputs by Masahiko Sawada and Amit Kapila Reviewed-by: Masahiko Sawada, Bertrand Drouvot, Peter Smith, Dilip Kumar, Ajin Cherian, Nisha Moond, Kuroda Hayato, Amit Kapila Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com
* Improve ERROR/LOG messages added by commits ddd5f4f54a and 7a424ece48.Amit Kapila2024-02-22
| | | | | | | | | | | | | | Additionally, in slotsync.c, replace one StringInfoData variable usage with a constant string to avoid palloc/pfree. Also, replace the inclusion of "logical.h" with "slot.h" to prevent the exposure of unnecessary implementation details. Reported-by: Kyotaro Horiguchi, Masahiko Sawada Author: Shveta Malik based on suggestions by Robert Haas and Amit Kapila Reviewed-by: Kyotaro Horiguchi, Amit Kapila Discussion: https://postgr.es/m/20240214.162652.773291409747353211.horikyota.ntt@gmail.com Discussion: https://postgr.es/m/20240219.134015.1888940527023074780.horikyota.ntt@gmail.com Discussion: https://postgr.es/m/CAD21AoCYXhDYOQDAS-rhGasC2T+tYbV=8Y18o94sB=5AxcW+yA@mail.gmail.com