aboutsummaryrefslogtreecommitdiff
path: root/src/backend/access
Commit message (Collapse)AuthorAge
...
* Fix typo in multixact.cMichael Paquier2022-02-10
| | | | | | | Introduced in aa64f23. Author: Nathan Bossart Discussion: https://postgr.es/m/20220209175338.GB1627503@nathanxps13
* Remove MaxBackends variable in favor of GetMaxBackends() function.Robert Haas2022-02-08
| | | | | | | | | | | | | | Previously, it was really easy to write code that accessed MaxBackends before we'd actually initialized it, especially when coding up an extension. To make this less error-prune, introduce a new function GetMaxBackends() which should be used to obtain the correct value. This will ERROR if called too early. Demote the global variable to a file-level static, so that nobody can peak at it directly. Nathan Bossart. Idea by Andres Freund. Review by Greg Sabino Mullane, by Michael Paquier (who had doubts about the approach), and by me. Discussion: http://postgr.es/m/20210802224204.bckcikl45uezv5e4@alap3.anarazel.de
* Reduce non-leaf keys overlap in GiST indexes produced by a sorted buildAlexander Korotkov2022-02-07
| | | | | | | | | | | | | | | | | | The GiST sorted build currently chooses split points according to the only page space utilization. That may lead to higher non-leaf keys overlap and, in turn, slower search query answers. This commit makes the sorted build use the opclass's picksplit method. Once four pages at the level are accumulated, the picksplit method is applied until each split partition fits the page. Some of our split algorithms could show significant performance degradation while processing 4-times more data at once. But those opclasses haven't received the sorted build support and shouldn't receive it before their split algorithms are improved. Discussion: https://postgr.es/m/CAHqSB9jqtS94e9%3D0vxqQX5dxQA89N95UKyz-%3DA7Y%2B_YJt%2BVW5A%40mail.gmail.com Author: Aliaksandr Kalenik, Sergei Shoulbakov, Andrey Borodin Reviewed-by: Björn Harrtell, Darafei Praliaskouski, Andres Freund Reviewed-by: Alexander Korotkov
* Allow archiving via loadable modules.Robert Haas2022-02-03
| | | | | | | | | | | | | | Running a shell command for each file to be archived has a lot of overhead and may not offer as much error checking as you want, or the exact semantics that you want. So, offer the option to call a loadable module for each file to be archived, rather than running a shell command. Also, add a 'basic_archive' contrib module as an example implementation that archives to a local directory. Nathan Bossart, with a little bit of kibitzing by me. Discussion: http://postgr.es/m/20220202224433.GA1036711@nathanxps13
* Add UNIQUE null treatment optionPeter Eisentraut2022-02-03
| | | | | | | | | | | | | | | | | | | | | | | | | The SQL standard has been ambiguous about whether null values in unique constraints should be considered equal or not. Different implementations have different behaviors. In the SQL:202x draft, this has been formalized by making this implementation-defined and adding an option on unique constraint definitions UNIQUE [ NULLS [NOT] DISTINCT ] to choose a behavior explicitly. This patch adds this option to PostgreSQL. The default behavior remains UNIQUE NULLS DISTINCT. Making this happen in the btree code is pretty easy; most of the patch is just to carry the flag around to all the places that need it. The CREATE UNIQUE INDEX syntax extension is not from the standard, it's my own invention. I named all the internal flags, catalog columns, etc. in the negative ("nulls not distinct") so that the default PostgreSQL behavior is the default if the flag is false. Reviewed-by: Maxim Orlov <orlovmg@gmail.com> Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/84e5ee1b-387e-9a54-c326-9082674bde78@enterprisedb.com
* Remove xloginsert.h from xlog.hAlvaro Herrera2022-01-30
| | | | | | | | | xlog.h is directly and indirectly #included in a lot of places. With this change, xloginsert.h is no longer unnecessarily included in the large number of them that don't need it. Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Discussion: https://postgr.es/m/CALj2ACVe-W+WM5P44N7eG9C2_FmaeM8Dq5aCnD3fHt0Ba=WR6w@mail.gmail.com
* vacuumlazy.c: Rename state field for consistency.Peter Geoghegan2022-01-28
| | | | | Rename pages_removed to removed_pages, for consistency with nearby vacrel fields.
* Improve errors related to incorrect TLI on checkpoint record replayMichael Paquier2022-01-25
| | | | | | | | | | | | | | WAL replay would cause a hard crash if the timeline expected by a XLOG_END_OF_RECOVERY, a XLOG_CHECKPOINT_ONLINE, or a XLOG_CHECKPOINT_SHUTDOWN record is not the same as the timeline being replayed, using the same error message for all three of them. This commit changes those error messages to use different wordings, adapted to each record type, which is useful when it comes to the debugging of an issue in this area. Author: Amul Sul Reviewed-by: Nathan Bossart, Robert Haas Discussion: https://postgr.es/m/CAAJ_b97i1ZerYC_xW6o_AiDSW5n+sGi8k91Yc8KS8bKWKxjqwQ@mail.gmail.com
* Fix various typos, grammar and code style in comments and docsMichael Paquier2022-01-25
| | | | | | | | | This fixes a set of issues that have accumulated over the past months (or years) in various code areas. Most fixes are related to some recent additions, as of the development of v15. Author: Justin Pryzby Discussion: https://postgr.es/m/20220124030001.GQ23027@telsasoft.com
* fsync pg_logical/mappings in CheckPointLogicalRewriteHeap().Andres Freund2022-01-21
| | | | | | | | | | | While individual logical rewrite files were synced to disk, the directory was not. On some filesystems that could lead to loosing directory entries after a crash. Reported-By: Tom Lane <tgl@sss.pgh.pa.us> Author: Nathan Bossart <bossartn@amazon.com> Discussion: https://postgr.es/m/867F2E29-2782-4869-970E-B984C6D35A8F@amazon.com Backpatch: 10-
* Fix one-off bug causing missing commit timestamps for subtransactionsMichael Paquier2022-01-21
| | | | | | | | | | | | | | | | | | | The logic in charge of writing commit timestamps (enabled with track_commit_timestamp) for subtransactions had a one-bug bug, where it would be possible that commit timestamps go missing for the last subtransaction committed. While on it, simplify a bit the iteration logic in the loop writing the commit timestamps, as per suggestions from Kyotaro Horiguchi and Tom Lane, so as some variable initializations are not part of the loop itself. Issue introduced in 73c986a. Analyzed-by: Alex Kingsborough Author: Alex Kingsborough, Kyotaro Horiguchi Discussion: https://postgr.es/m/73A66172-4050-4F2A-B7F1-13508EDA2144@amazon.com Backpatch-through: 10
* Call pg_newlocale_from_collation() also with default collationPeter Eisentraut2022-01-20
| | | | | | | | | | | | | | | | Previously, callers of pg_newlocale_from_collation() did not call it if the collation was DEFAULT_COLLATION_OID and instead proceeded with a pg_locale_t of 0. Instead, now we call it anyway and have it return 0 if the default collation was passed. It already did this, so we just have to adjust the callers. This simplifies all the call sites and also makes future enhancements easier. After discussion and testing, the previous comment in pg_locale.c about avoiding this for performance reasons may have been mistaken since it was testing a very different patch version way back when. Reviewed-by: Julien Rouhaud <rjuju123@gmail.com> Discussion: https://www.postgresql.org/message-id/ed3baa81-7fac-7788-cc12-41e3f7917e34@enterprisedb.com
* Make logical decoding a part of the rmgr.Jeff Davis2022-01-19
| | | | | | | | | | | Add a new rmgr method, rm_decode, and use that rather than a switch statement. In preparation for rmgr extensibility. Reviewed-by: Julien Rouhaud Discussion: https://postgr.es/m/ed1fb2e22d15d3563ae0eb610f7b61bb15999c0a.camel%40j-davis.com Discussion: https://postgr.es/m/20220118095332.6xtlcjoyxobv6cbk@jrouhaud
* heap pruning: Only call BufferGetBlockNumber() once.Andres Freund2022-01-17
| | | | | | | | BufferGetBlockNumber() is not that cheap and obviously cannot change during one heap_prune_page(), so only call it once. We might be able to do better and pass the block number from the caller, but that'd be a larger change... Discussion: https://postgr.es/m/20211211045710.ljtuu4gfloh754rs@alap3.anarazel.de
* Unify VACUUM VERBOSE and autovacuum logging.Peter Geoghegan2022-01-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The log_autovacuum_min_duration instrumentation used its own dedicated code for logging, which was not reused by VACUUM VERBOSE. This was highly duplicative, and sometimes led to each code path using slightly different accounting for essentially the same information. Clean things up by making VACUUM VERBOSE reuse the same instrumentation code. This code restructuring changes the structure of the VACUUM VERBOSE output itself, but that seems like an overall improvement. The most noticeable change in VACUUM VERBOSE output is that it no longer outputs a distinct message per index per round of index vacuuming. Most of the same information (about each index) is now shown in its new per-operation summary message. This is far more legible. A few details are no longer displayed by VACUUM VERBOSE, but that's no real loss in practice, especially in the common case where we don't need multiple index scans/rounds of vacuuming. This super fine-grained information is still available via DEBUG2 messages, which might still be useful in debugging scenarios. VACUUM VERBOSE now shows new instrumentation, which is typically very useful: all of the log_autovacuum_min_duration instrumentation that it missed out on before now. This includes information about WAL overhead, buffers hit/missed/dirtied information, and I/O timing information. VACUUM VERBOSE still retains a few INFO messages of its own. This is limited to output concerning the progress of heap rel truncation, as well as some basic information about parallel workers. These details are still potentially quite useful. They aren't a good fit for the log output, which must summarize the whole operation. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CAH2-WzmW4Me7_qR4X4ka7pxP-jGmn7=Npma_-Z-9Y1eD0MQRLw@mail.gmail.com
* Assert redirect pointers are sensible after heap_page_prune().Andres Freund2022-01-13
| | | | | | | | | | | | | | | | | Corruption of redirect item pointers often only becomes visible well after being corrupted, as e.g. bug #17255 shows: In the original reproducer, gigabyte of WAL were between the source of the corruption and the corruption becoming visible. To make it easier to find / prevent such bugs, verify whether redirect pointers are sensible at the end of heap_page_prune_execute(). 5cd7eb1f1c32 introduced related assertions while modifying the page, but they can't easily detect marking the target of an existing redirect as unused. Sometimes the corruption will be detected later, but that's harder to diagnose. Author: Andres Freund <andres@andres@anarazel.de> Reviewed-By: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/20211122175914.ayk6gg6nvdwuhrzb@alap3.anarazel.de
* Fix possible HOT corruption when RECENTLY_DEAD changes to DEAD while pruning.Andres Freund2022-01-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since dc7420c2c92 the horizon used for pruning is determined "lazily". A more accurate horizon is built on-demand, rather than in GetSnapshotData(). If a horizon computation is triggered between two HeapTupleSatisfiesVacuum() calls for the same tuple, the result can change from RECENTLY_DEAD to DEAD. heap_page_prune() can process the same tid multiple times (once following an update chain, once "directly"). When the result of HeapTupleSatisfiesVacuum() of a tuple changes from RECENTLY_DEAD during the first access, to DEAD in the second, the "tuple is DEAD and doesn't chain to anything else" path in heap_prune_chain() can end up marking the target of a LP_REDIRECT ItemId unused. Initially not easily visible, Once the target of a LP_REDIRECT ItemId is marked unused, a new tuple version can reuse it. At that point the corruption may become visible, as index entries pointing to the "original" redirect item, now point to a unrelated tuple. To fix, compute HTSV for all tuples on a page only once. This fixes the entire class of problems of HTSV changing inside heap_page_prune(). However, visibility changes can obviously still occur between HTSV checks inside heap_page_prune() and outside (e.g. in lazy_scan_prune()). The computation of HTSV is now done in bulk, in heap_page_prune(), rather than on-demand in heap_prune_chain(). Besides being a bit simpler, it also is faster: Memory accesses can happen sequentially, rather than in the order of HOT chains. There are other causes of HeapTupleSatisfiesVacuum() results changing between two visibility checks for the same tuple, even before dc7420c2c92. E.g. HEAPTUPLE_INSERT_IN_PROGRESS can change to HEAPTUPLE_DEAD when a transaction aborts between the two checks. None of the these other visibility status changes are known to cause corruption, but heap_page_prune()'s approach makes it hard to be confident. A patch implementing a more fundamental redesign of heap_page_prune(), which fixes this bug and simplifies pruning substantially, has been proposed by Peter Geoghegan in https://postgr.es/m/CAH2-WzmNk6V6tqzuuabxoxM8HJRaWU6h12toaS-bqYcLiht16A@mail.gmail.com However, that redesign is larger change than desirable for backpatching. As the new design still benefits from the batched visibility determination introduced in this commit, it makes sense to commit this narrower fix to 14 and master, and then commit Peter's improvement in master. The precise sequence required to trigger the bug is complicated and hard to do exercise in an isolation test (until we have wait points). Due to that the isolation test initially posted at https://postgr.es/m/20211119003623.d3jusiytzjqwb62p%40alap3.anarazel.de and updated in https://postgr.es/m/20211122175914.ayk6gg6nvdwuhrzb%40alap3.anarazel.de isn't committable. A followup commit will introduce additional assertions, to detect problems like this more easily. Bug: #17255 Reported-By: Alexander Lakhin <exclusion@gmail.com> Debugged-By: Andres Freund <andres@anarazel.de> Debugged-By: Peter Geoghegan <pg@bowt.ie> Author: Andres Freund <andres@andres@anarazel.de> Reviewed-By: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/20211122175914.ayk6gg6nvdwuhrzb@alap3.anarazel.de Backpatch: 14-, the oldest branch containing dc7420c2c92
* vacuumlazy.c: fix "garbage tuples" reference.Peter Geoghegan2022-01-12
| | | | Another minor oversight in commit 4f8d9d12.
* Fix typo in rewriteheap.c.Amit Kapila2022-01-11
| | | | | Author: Bharath Rupireddy Discussion: https://postgr.es/m/CALj2ACW7SvfFW8r2uKH6oQm1kNpt8aQMG61kSBPK0S2PHhFbMw@mail.gmail.com
* Rename functions to avoid future conflictsPeter Eisentraut2022-01-10
| | | | | | | | | Rename range_serialize/range_deserialize to brin_range_serialize/brin_range_deserialize, since there are already public range_serialize/range_deserialize in rangetypes.h. Author: Paul A. Jungwirth <pj@illuminatedcomputing.com> Discussion: https://www.postgresql.org/message-id/CA+renyX0ipvY6A_jUOHeB1q9mL4bEYfAZ5FBB7G7jUo5bykjrA@mail.gmail.com
* Update copyright for 2022Bruce Momjian2022-01-07
| | | | Backpatch-through: 10
* Remove redundant initialization of BrinMemTuple.Tom Lane2022-01-04
| | | | | | | | | brin_new_memtuple already did this, so there's no need for initialize_brin_buildstate to do it again. Richard Guo, reviewed by Bharath Rupireddy Discussion: https://postgr.es/m/CAMbWs4-kYYpKNOdiWtsCZ3jbkFFj4nhOVH22JH7dsrMYX=aGjg@mail.gmail.com
* Fix silly mistake in AssertAlvaro Herrera2022-01-04
|
* Allow special SKIP LOCKED condition in Assert()Alvaro Herrera2022-01-04
| | | | | | | | | | | | | | | | | Under concurrency, it is possible for two sessions to be merrily locking and releasing a tuple and marking it again as HEAP_XMAX_INVALID all the while a third session attempts to lock it, miserably fails at it, and then contemplates life, the universe and everything only to eventually fail an assertion that said bit is not set. Before SKIP LOCKED that was indeed a reasonable expectation, but alas! commit df630b0dd5ea falsified it. This bug is as old as time itself, and even older, if you think time begins with the oldest supported branch. Therefore, backpatch to all supported branches. Author: Simon Riggs <simon.riggs@enterprisedb.com> Discussion: https://postgr.es/m/CANbhV-FeEwMnN8yuMyss7if1ZKjOKfjcgqB26n8pqu1e=q0ebg@mail.gmail.com
* Fix incorrect format placeholdersPeter Eisentraut2021-12-29
|
* Move parallel vacuum code to vacuumparallel.c.Amit Kapila2021-12-23
| | | | | | | | | | | | | | | This commit moves parallel vacuum related code to a new file commands/vacuumparallel.c so that any table AM supporting indexes can utilize parallel vacuum in order to call index AM callbacks (ambulkdelete and amvacuumcleanup) with parallel workers. Another reason for this refactoring is that the parallel vacuum isn't specific to heap so it doesn't make sense to keep this code in heap/vacuumlazy.c. Author: Masahiko Sawada, based on suggestion from Andres Freund Reviewed-by: Hou Zhijie, Amit Kapila, Haiying Tang Discussion: https://www.postgresql.org/message-id/20211030212101.ae3qcouatwmy7tbr%40alap3.anarazel.de
* Move index vacuum routines to vacuum.c.Amit Kapila2021-12-22
| | | | | | | | | | An upcoming patch moves parallel vacuum code out of vacuumlazy.c. This code restructuring will allow both lazy vacuum and parallel vacuum to use index vacuum functions. Author: Masahiko Sawada Reviewed-by: Hou Zhijie, Amit Kapila Discussion: https://www.postgresql.org/message-id/20211030212101.ae3qcouatwmy7tbr%40alap3.anarazel.de
* Change ProcSendSignal() to take pgprocno.Thomas Munro2021-12-16
| | | | | | | | | | | Instead of referring to target backends by pid, use pgprocno. This means that we don't have to scan the ProcArray and we can drop some special case code for dealing with the startup process. Discussion: https://postgr.es/m/CA%2BhUKGLYRyDaneEwz5Uya_OgFLMx5BgJfkQSD%3Dq9HmwsfRRb-w%40mail.gmail.com Reviewed-by: Soumyadeep Chakraborty <soumyadeep2007@gmail.com> Reviewed-by: Ashwin Agrawal <ashwinstar@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de>
* Improve parallel vacuum implementation.Amit Kapila2021-12-15
| | | | | | | | | | | | | | | | | | | | | | | | Previously, in parallel vacuum, we allocated shmem area of IndexBulkDeleteResult only for indexes where parallel index vacuuming is safe and had null-bitmap in shmem area to access them. This logic was too complicated with a small benefit of saving only a few bits per indexes. In this commit, we allocate a dedicated shmem area for the array of LVParallelIndStats that includes a parallel-safety flag, the index vacuum status, and IndexBulkdeleteResult. There is one array element for every index, even those indexes where parallel index vacuuming is unsafe or not worthwhile. This commit makes the code clear by removing all bitmap-related code. Also, add the check each index vacuum status after parallel index vacuum to make sure that all indexes have been processed. Finally, rename parallel vacuum functions to parallel_vacuum_* for consistency. Author: Masahiko Sawada, based on suggestions by Andres Freund Reviewed-by: Hou Zhijie, Amit Kapila Discussion: https://www.postgresql.org/message-id/20211030212101.ae3qcouatwmy7tbr%40alap3.anarazel.de
* Remove assertion for replication origins in PREPARE TRANSACTIONMichael Paquier2021-12-14
| | | | | | | | | | | | | | | | | When using replication origins, pg_replication_origin_xact_setup() is an optional choice to be able to set a LSN and a timestamp to mark the origin, which would be additionally added to WAL for transaction commits or aborts (including 2PC transactions). An assertion in the code path of PREPARE TRANSACTION assumed that this data should always be set, so it would trigger when using replication origins without setting up an origin LSN. Some tests are added to cover more this kind of scenario. Oversight in commit 1eb6d65. Per discussion with Amit Kapila and Masahiko Sawada. Discussion: https://postgr.es/m/YbbBfNSvMm5nIINV@paquier.xyz Backpatch-through: 11
* Remove InitXLOGAccess().Robert Haas2021-12-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | It's not great that RecoveryInProgress() calls InitXLOGAccess(), because a status inquiry function typically shouldn't have the side effect of performing initializations. We could fix that by calling InitXLOGAccess() from some other place, but instead, let's remove it altogether. One thing InitXLogAccess() did is initialize wal_segment_size, but it doesn't need to do that. In the postmaster, PostmasterMain() calls LocalProcessControlFile(), and all child processes will inherit that value -- except in EXEC_BACKEND bulds, but then each backend runs SubPostmasterMain() which also calls LocalProcessControlFile(). The other thing InitXLOGAccess() did is update RedoRecPtr and doPageWrites, but that's not critical, because all code that uses them will just retry if it turns out that they've changed. The only difference is that most code will now see an initial value that is definitely invalid instead of one that might have just been way out of date, but that will only happen once per backend lifetime, so it shouldn't be a big deal. Patch by me, reviewed by Nathan Bossart, Michael Paquier, Andres Freund, Heikki Linnakangas, and Álvaro Herrera. Discussion: http://postgr.es/m/CA+TgmoY7b65qRjzHN_tWUk8B4sJqk1vj1d31uepVzmgPnZKeLg@mail.gmail.com
* Default to log_checkpoints=on, log_autovacuum_min_duration=10mRobert Haas2021-12-13
| | | | | | | | | | | | | | | | | | | | | | | | | | The idea here is that when a performance problem is known to have occurred at a certain point in time, it's a good thing if there is some information available from the logs to help figure out what might have happened around that time. This change attracted an above-average amount of dissent, because it means that a server with default settings will produce some amount of log output even if nothing has gone wrong. However, by my count, the mailing list discussion had about twice as many people in favor of the change as opposed. The reasons for believing that the extra log output is not an issue in practice are: (1) the rate at which messages can be generated by this setting is bounded to one every few minutes on a properly-configured system and (2) production systems tend to have a lot more junk in the log from that due to failed connection attempts, ERROR messages generated by application activity, and the like. Bharath Rupireddy, reviewed by Fujii Masao and by me. Many other people commented on the thread, but as far as I can see that was discussion of the merits of the change rather than review of the patch. Discussion: https://postgr.es/m/CALj2ACX-rW_OeDcp4gqrFUAkf1f50Fnh138dmkd0JkvCNQRKGA@mail.gmail.com
* Improve description of some WAL records with transaction commandsMichael Paquier2021-12-13
| | | | | | | | | | | | | | | | This commit improves the description of some WAL records for the Transaction RMGR: - Track remote_apply for a transaction commit. This GUC is user-settable, so this information can be useful for debugging. - Add replication origin information for PREPARE TRANSACTION, with the origin ID, LSN and timestamp - Same as above, for ROLLBACK PREPARED. This impacts the format of pg_waldump or anything using these description routines, so no backpatch is done. Author: Masahiko Sawada, Michael Paquier Discussion: https://postgr.es/m/CAD21AoD2dJfgsdxk4_KciAZMZQoUiCvmV9sDpp8ZuKLtKCNXaA@mail.gmail.com
* Fix some typos with {a,an}Michael Paquier2021-12-09
| | | | | | | | One of the changes impacts the documentation, so backpatch. Author: Peter Smith Discussion: https://postgr.es/m/CAHut+Pu6+c+r3mY24VT7u+H+E_s6vMr5OdRiZ8NT3EOa-E5Lmw@mail.gmail.com Backpatch-through: 14
* Standardize cleanup lock terminology.Peter Geoghegan2021-12-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The term "super-exclusive lock" is a synonym for "buffer cleanup lock" that first appeared in nbtree many years ago. Standardize things by consistently using the term cleanup lock. This finishes work started by commit 276db875. There is no good reason to have two terms. But there is a good reason to only have one: to avoid confusion around why VACUUM acquires a full cleanup lock (not just an ordinary exclusive lock) in index AMs, during ambulkdelete calls. This has nothing to do with protecting the physical index data structure itself. It is needed to implement a locking protocol that ensures that TIDs pointing to the heap/table structure cannot get marked for recycling by VACUUM before it is safe (which is somewhat similar to how VACUUM uses cleanup locks during its first heap pass). Note that it isn't strictly necessary for index AMs to implement this locking protocol -- several index AMs use an MVCC snapshot as their sole interlock to prevent unsafe TID recycling. In passing, update the nbtree README. Cleanly separate discussion of the aforementioned index vacuuming locking protocol from discussion of the "drop leaf page pin" optimization added by commit 2ed5b87f. We now structure discussion of the latter by describing how individual index scans may safely opt out of applying the standard locking protocol (and so can avoid blocking progress by VACUUM). Also document why the optimization is not safe to apply during nbtree index-only scans. Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-WzngHgQa92tz6NQihf4nxJwRzCV36yMJO_i8dS+2mgEVKw@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzkHPgsBBvGWjz=8PjNhDefy7XRkDKiT5NxMs-n5ZCf2dA@mail.gmail.com
* Fix corruption of toast indexes with REINDEX CONCURRENTLYMichael Paquier2021-12-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | REINDEX CONCURRENTLY run on a toast index or a toast relation could corrupt the target indexes rebuilt, as a backend running in parallel that manipulates toast values would directly release the lock on the toast relation when its local operation is done, rather than releasing the lock once the transaction that manipulated the toast values committed. The fix done here is simple: we now hold a ROW EXCLUSIVE lock on the toast relation when saving or deleting a toast value until the transaction working on them is committed, so as a concurrent reindex happening in parallel would be able to wait for any activity and see any new rows inserted (or deleted). An isolation test is added to check after the case fixed here, which is a bit fancy by design as it relies on allow_system_table_mods to rename the toast table and its index to fixed names. This way, it is possible to reindex them directly without any dependency on the OID of the underlying relation. Note that this could not use a DO block either, as REINDEX CONCURRENTLY cannot be run in a transaction block. The test is backpatched down to 13, where it is possible, thanks to c4a7a39, to use allow_system_table_mods in a test suite. Reported-by: Alexey Ermakov Analyzed-by: Andres Freund, Noah Misch Author: Michael Paquier Reviewed-by: Nathan Bossart Discussion: https://postgr.es/m/17268-d2fb426e0895abd4@postgresql.org Backpatch-through: 12
* Remove mention of TimeLineID update from commentsDaniel Gustafsson2021-12-01
| | | | | | | | Commit 4a92a1c3d removed the TimeLineID update from RecoveryInProgress, update comments accordingly. Author: Amul Sul <sulamul@gmail.com> Discussion: https://postgr.es/m/CAAJ_b96wyzs8N45jc-kYd-bTE02hRWQieLZRpsUtNbhap7_PuQ@mail.gmail.com
* vacuumlazy.c: fix remaining "dead tuple" references.Peter Geoghegan2021-11-30
| | | | | | | Oversight in commit 4f8d9d12. Reported-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAD21AoDm38Em0bvRqeQKr4HPvOj65Y8cUgCP4idMk39iaLrxyw@mail.gmail.com
* Ignore BRIN indexes when checking for HOT udpatesTomas Vondra2021-11-30
| | | | | | | | | | | | | | | | When determining whether an index update may be skipped by using HOT, we can ignore attributes indexed only by BRIN indexes. There are no index pointers to individual tuples in BRIN, and the page range summary will be updated anyway as it relies on visibility info. This also removes rd_indexattr list, and replaces it with rd_attrsvalid flag. The list was not used anywhere, and a simple flag is sufficient. Patch by Josef Simanek, various fixes and improvements by me. Author: Josef Simanek Reviewed-by: Tomas Vondra, Alvaro Herrera Discussion: https://postgr.es/m/CAFp7QwpMRGcDAQumN7onN9HjrJ3u4X3ZRXdGFT0K5G2JWvnbWg%40mail.gmail.com
* Increase size of shared memory for pg_commit_tsAlvaro Herrera2021-11-30
| | | | | | | | | | Like 5364b357fb11 did for pg_commit, change the formula used to determine number of pg_commit_ts buffers, which helps performance with larger servers. Discussion: https://postgr.es/m/20210115220744.GA24457@alvherre.pgsql Reviewed-by: Noah Misch <noah@leadboat.com> Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
* vacuumlazy.c: Rename dead_tuples to dead_items.Peter Geoghegan2021-11-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
* Centralize timestamp computation of control file on updatesMichael Paquier2021-11-29
| | | | | | | | | | | | | | | | | | | This commit moves the timestamp computation of the control file within the routine of src/common/ in charge of updating the backend's control file, which is shared by multiple frontend tools (pg_rewind, pg_checksums and pg_resetwal) and the backend itself. This change has as direct effect to update the control file's timestamp when writing the control file in pg_rewind and pg_checksums, something that is helpful to keep track of control file updates for those operations, something also tracked by the backend at startup within its logs. This part is arguably a bug, as ControlFileData->time should be updated each time a new version of the control file is written, but this is a behavior change so no backpatch is done. Author: Amul Sul Reviewed-by: Nathan Bossart, Michael Paquier, Bharath Rupireddy Discussion: https://postgr.es/m/CAAJ_b97nd_ghRpyFV9Djf9RLXkoTbOUqnocq11WGq9TisX09Fw@mail.gmail.com
* Replace random(), pg_erand48(), etc with a better PRNG API and algorithm.Tom Lane2021-11-28
| | | | | | | | | | | | | | | | | | | Standardize on xoroshiro128** as our basic PRNG algorithm, eliminating a bunch of platform dependencies as well as fundamentally-obsolete PRNG code. In addition, this API replacement will ease replacing the algorithm again in future, should that become necessary. xoroshiro128** is a few percent slower than the drand48 family, but it can produce full-width 64-bit random values not only 48-bit, and it should be much more trustworthy. It's likely to be noticeably faster than the platform's random(), depending on which platform you are thinking about; and we can have non-global state vectors easily, unlike with random(). It is not cryptographically strong, but neither are the functions it replaces. Fabien Coelho, reviewed by Dean Rasheed, Aleksander Alekseev, and myself Discussion: https://postgr.es/m/alpine.DEB.2.22.394.2105241211230.165418@pseudo
* vacuumlazy.c: prefer the term "cleanup lock".Peter Geoghegan2021-11-27
| | | | | | | | The term "super-exclusive lock" is an acceptable synonym of "cleanup lock". Even still, switching from one term to the other in the same file is confusing. Standardize on "cleanup lock" within vacuumlazy.c. Per a complaint from Andres Freund.
* Update high level vacuumlazy.c comments.Peter Geoghegan2021-11-27
| | | | | | | | | | | | | | | | | Update vacuumlazy.c file header comments (as well as comments above the lazy_scan_heap function) that were largely written before the introduction of the HOT optimization, when lazy_scan_heap did far less, and didn't actually prune during its initial heap pass. Since lazy_scan_heap now outsources far more work to lower level functions, it makes sense to introduce the function by talking about the high level invariant that dictates the order in which each phase takes place. Also deemphasize the case where we run out of memory for TIDs, since delaying that discussion makes it easier to talk about issues of central importance. Finally, remove discussion of parallel VACUUM from header comments. These don't add much, and are in the wrong place.
* Go back to considering HOT on pages marked full.Peter Geoghegan2021-11-26
| | | | | | | | | | | | | | | | | | | | Commit 2fd8685e7f simplified the checking of modified attributes that takes place within heap_update(). This included a micro-optimization affecting pages marked PD_PAGE_FULL: don't even try to use HOT to save a few cycles on determining HOT safety. The assumption was that it won't work out this time around, since it can't have worked out last time around. Remove the micro-optimization. It could only ever save cycles that are consumed by the vast majority of heap_update() calls, which hardly seems worth the added complexity. It also seems quite possible that there are workloads that will do worse over time by repeated application of the micro-optimization, despite saving some cycles on average, in the short term. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/CAH2-WznU1L3+DMPr1F7o2eJBT7=3bAJoY6ZkWABAxNt+-afyTA@mail.gmail.com
* Fix determination of broken LSN in OVERWRITTEN_CONTRECORDAlvaro Herrera2021-11-26
| | | | | | | | | | | | | | In commit ff9f111bce24 I mixed up inconsistent definitions of the LSN of the first record in a page, when the previous record ends exactly at the page boundary. The correct LSN is adjusted to skip the WAL page header; I failed to use that when setting XLogReaderState->overwrittenRecPtr, so at WAL replay time VerifyOverwriteContrecord would refuse to let replay continue past that record. Backpatch to 10. 9.6 also contains this bug, but it's no longer being maintained. Discussion: https://postgr.es/m/45597.1637694259@sss.pgh.pa.us
* Update commentsPeter Eisentraut2021-11-26
| | | | | | | | Various places wanted to point out that tuple descriptors don't contain the variable-length fields of pg_attribute. This started when attacl was added, but more fields have been added since, and these comments haven't been kept up to date consistently. Reword so that the purpose is clearer and we don't have to keep updating them.
* Replace straggling uses of ReadRecPtr/EndRecPtr.Andres Freund2021-11-24
| | | | | | | d2ddfa681db removed ReadRecPtr/EndRecPtr, but two uses within an #ifdef WAL_DEBUG escaped. Discussion: https://postgr.es/m/20211124231206.gbadj5bblcljb6d5@alap3.anarazel.de
* xlog.c: Remove global variables ReadRecPtr and EndRecPtr.Robert Haas2021-11-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In most places, the variables necessarily store the same value as the eponymous members of the XLogReaderState that we use during WAL replay, because ReadRecord() assigns the values from the structure members to the global variables just after XLogReadRecord() returns. However, XLogBeginRead() adjusts the structure members but not the global variables, so after XLogBeginRead() and before the completion of XLogReadRecord() the values can differ. Otherwise, they must be identical. According to my analysis, the only place where either variable is referenced at a point where it might not have the same value as the structure member is the refrence to EndRecPtr within XLogPageRead. Therefore, at every other place where we are using the global variable, we can just switch to using the structure member instead, and remove the global variable. However, we can, and in fact should, do this in XLogPageRead() as well, because at that point in the code, the global variable will actually store the start of the record we want to read - either because it's where the last WAL record ended, or because the read position has been changed using XLogBeginRead since the last record was read. The structure member, on the other hand, will already have been updated to point to the end of the record we just read. Elsewhere, the latter is what we use as an argument to emode_for_corrupt_record(), so we should do the same here. This part of the patch is perhaps a bug fix, but I don't think it has any important consequences, so no back-patch. The point here is just to continue to whittle down the entirely excessive use of global variables in xlog.c. Discussion: http://postgr.es/m/CA+Tgmoao96EuNeSPd+hspRKcsCddu=b1h-QNRuKfY8VmfNQdfg@mail.gmail.com