aboutsummaryrefslogtreecommitdiff
path: root/src/backend/access
Commit message (Collapse)AuthorAge
* Fix corruption of toast indexes with REINDEX CONCURRENTLYMichael Paquier2021-12-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | REINDEX CONCURRENTLY run on a toast index or a toast relation could corrupt the target indexes rebuilt, as a backend running in parallel that manipulates toast values would directly release the lock on the toast relation when its local operation is done, rather than releasing the lock once the transaction that manipulated the toast values committed. The fix done here is simple: we now hold a ROW EXCLUSIVE lock on the toast relation when saving or deleting a toast value until the transaction working on them is committed, so as a concurrent reindex happening in parallel would be able to wait for any activity and see any new rows inserted (or deleted). An isolation test is added to check after the case fixed here, which is a bit fancy by design as it relies on allow_system_table_mods to rename the toast table and its index to fixed names. This way, it is possible to reindex them directly without any dependency on the OID of the underlying relation. Note that this could not use a DO block either, as REINDEX CONCURRENTLY cannot be run in a transaction block. The test is backpatched down to 13, where it is possible, thanks to c4a7a39, to use allow_system_table_mods in a test suite. Reported-by: Alexey Ermakov Analyzed-by: Andres Freund, Noah Misch Author: Michael Paquier Reviewed-by: Nathan Bossart Discussion: https://postgr.es/m/17268-d2fb426e0895abd4@postgresql.org Backpatch-through: 12
* Remove mention of TimeLineID update from commentsDaniel Gustafsson2021-12-01
| | | | | | | | Commit 4a92a1c3d removed the TimeLineID update from RecoveryInProgress, update comments accordingly. Author: Amul Sul <sulamul@gmail.com> Discussion: https://postgr.es/m/CAAJ_b96wyzs8N45jc-kYd-bTE02hRWQieLZRpsUtNbhap7_PuQ@mail.gmail.com
* vacuumlazy.c: fix remaining "dead tuple" references.Peter Geoghegan2021-11-30
| | | | | | | Oversight in commit 4f8d9d12. Reported-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAD21AoDm38Em0bvRqeQKr4HPvOj65Y8cUgCP4idMk39iaLrxyw@mail.gmail.com
* Ignore BRIN indexes when checking for HOT udpatesTomas Vondra2021-11-30
| | | | | | | | | | | | | | | | When determining whether an index update may be skipped by using HOT, we can ignore attributes indexed only by BRIN indexes. There are no index pointers to individual tuples in BRIN, and the page range summary will be updated anyway as it relies on visibility info. This also removes rd_indexattr list, and replaces it with rd_attrsvalid flag. The list was not used anywhere, and a simple flag is sufficient. Patch by Josef Simanek, various fixes and improvements by me. Author: Josef Simanek Reviewed-by: Tomas Vondra, Alvaro Herrera Discussion: https://postgr.es/m/CAFp7QwpMRGcDAQumN7onN9HjrJ3u4X3ZRXdGFT0K5G2JWvnbWg%40mail.gmail.com
* Increase size of shared memory for pg_commit_tsAlvaro Herrera2021-11-30
| | | | | | | | | | Like 5364b357fb11 did for pg_commit, change the formula used to determine number of pg_commit_ts buffers, which helps performance with larger servers. Discussion: https://postgr.es/m/20210115220744.GA24457@alvherre.pgsql Reviewed-by: Noah Misch <noah@leadboat.com> Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
* vacuumlazy.c: Rename dead_tuples to dead_items.Peter Geoghegan2021-11-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 8523492d simplified what it meant for an item to be considered "dead" to VACUUM: TIDs collected in memory (in preparation for index vacuuming) must always come from LP_DEAD stub line pointers in heap pages, found following pruning. This formalized the idea that index vacuuming (and heap vacuuming) are optional processes. Unlike pruning, they can be delayed indefinitely, without any risk of that violating fundamental invariants. For example, leaving LP_DEAD items behind clearly won't add to the risk of transaction ID wraparound. You can't have transaction ID wraparound without transaction IDs. Renaming anything that references DEAD tuples (tuples with storage) reinforces all this. Code outside vacuumlazy.c continues to fudge the distinction between dead/deleted tuples, and LP_DEAD items. This is necessary because autovacuum scheduling is still mostly driven by "dead items/tuples" statistics. In the future we may find it useful to replace this model with something more sophisticated, as a step towards teaching autovacuum to perform more frequent vacuuming that targeting individual indexes that happen to be more prone to becoming bloated through version churn. In passing, simplify some function signatures that deal with VACUUM's dead_items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
* Centralize timestamp computation of control file on updatesMichael Paquier2021-11-29
| | | | | | | | | | | | | | | | | | | This commit moves the timestamp computation of the control file within the routine of src/common/ in charge of updating the backend's control file, which is shared by multiple frontend tools (pg_rewind, pg_checksums and pg_resetwal) and the backend itself. This change has as direct effect to update the control file's timestamp when writing the control file in pg_rewind and pg_checksums, something that is helpful to keep track of control file updates for those operations, something also tracked by the backend at startup within its logs. This part is arguably a bug, as ControlFileData->time should be updated each time a new version of the control file is written, but this is a behavior change so no backpatch is done. Author: Amul Sul Reviewed-by: Nathan Bossart, Michael Paquier, Bharath Rupireddy Discussion: https://postgr.es/m/CAAJ_b97nd_ghRpyFV9Djf9RLXkoTbOUqnocq11WGq9TisX09Fw@mail.gmail.com
* Replace random(), pg_erand48(), etc with a better PRNG API and algorithm.Tom Lane2021-11-28
| | | | | | | | | | | | | | | | | | | Standardize on xoroshiro128** as our basic PRNG algorithm, eliminating a bunch of platform dependencies as well as fundamentally-obsolete PRNG code. In addition, this API replacement will ease replacing the algorithm again in future, should that become necessary. xoroshiro128** is a few percent slower than the drand48 family, but it can produce full-width 64-bit random values not only 48-bit, and it should be much more trustworthy. It's likely to be noticeably faster than the platform's random(), depending on which platform you are thinking about; and we can have non-global state vectors easily, unlike with random(). It is not cryptographically strong, but neither are the functions it replaces. Fabien Coelho, reviewed by Dean Rasheed, Aleksander Alekseev, and myself Discussion: https://postgr.es/m/alpine.DEB.2.22.394.2105241211230.165418@pseudo
* vacuumlazy.c: prefer the term "cleanup lock".Peter Geoghegan2021-11-27
| | | | | | | | The term "super-exclusive lock" is an acceptable synonym of "cleanup lock". Even still, switching from one term to the other in the same file is confusing. Standardize on "cleanup lock" within vacuumlazy.c. Per a complaint from Andres Freund.
* Update high level vacuumlazy.c comments.Peter Geoghegan2021-11-27
| | | | | | | | | | | | | | | | | Update vacuumlazy.c file header comments (as well as comments above the lazy_scan_heap function) that were largely written before the introduction of the HOT optimization, when lazy_scan_heap did far less, and didn't actually prune during its initial heap pass. Since lazy_scan_heap now outsources far more work to lower level functions, it makes sense to introduce the function by talking about the high level invariant that dictates the order in which each phase takes place. Also deemphasize the case where we run out of memory for TIDs, since delaying that discussion makes it easier to talk about issues of central importance. Finally, remove discussion of parallel VACUUM from header comments. These don't add much, and are in the wrong place.
* Go back to considering HOT on pages marked full.Peter Geoghegan2021-11-26
| | | | | | | | | | | | | | | | | | | | Commit 2fd8685e7f simplified the checking of modified attributes that takes place within heap_update(). This included a micro-optimization affecting pages marked PD_PAGE_FULL: don't even try to use HOT to save a few cycles on determining HOT safety. The assumption was that it won't work out this time around, since it can't have worked out last time around. Remove the micro-optimization. It could only ever save cycles that are consumed by the vast majority of heap_update() calls, which hardly seems worth the added complexity. It also seems quite possible that there are workloads that will do worse over time by repeated application of the micro-optimization, despite saving some cycles on average, in the short term. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/CAH2-WznU1L3+DMPr1F7o2eJBT7=3bAJoY6ZkWABAxNt+-afyTA@mail.gmail.com
* Fix determination of broken LSN in OVERWRITTEN_CONTRECORDAlvaro Herrera2021-11-26
| | | | | | | | | | | | | | In commit ff9f111bce24 I mixed up inconsistent definitions of the LSN of the first record in a page, when the previous record ends exactly at the page boundary. The correct LSN is adjusted to skip the WAL page header; I failed to use that when setting XLogReaderState->overwrittenRecPtr, so at WAL replay time VerifyOverwriteContrecord would refuse to let replay continue past that record. Backpatch to 10. 9.6 also contains this bug, but it's no longer being maintained. Discussion: https://postgr.es/m/45597.1637694259@sss.pgh.pa.us
* Update commentsPeter Eisentraut2021-11-26
| | | | | | | | Various places wanted to point out that tuple descriptors don't contain the variable-length fields of pg_attribute. This started when attacl was added, but more fields have been added since, and these comments haven't been kept up to date consistently. Reword so that the purpose is clearer and we don't have to keep updating them.
* Replace straggling uses of ReadRecPtr/EndRecPtr.Andres Freund2021-11-24
| | | | | | | d2ddfa681db removed ReadRecPtr/EndRecPtr, but two uses within an #ifdef WAL_DEBUG escaped. Discussion: https://postgr.es/m/20211124231206.gbadj5bblcljb6d5@alap3.anarazel.de
* xlog.c: Remove global variables ReadRecPtr and EndRecPtr.Robert Haas2021-11-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In most places, the variables necessarily store the same value as the eponymous members of the XLogReaderState that we use during WAL replay, because ReadRecord() assigns the values from the structure members to the global variables just after XLogReadRecord() returns. However, XLogBeginRead() adjusts the structure members but not the global variables, so after XLogBeginRead() and before the completion of XLogReadRecord() the values can differ. Otherwise, they must be identical. According to my analysis, the only place where either variable is referenced at a point where it might not have the same value as the structure member is the refrence to EndRecPtr within XLogPageRead. Therefore, at every other place where we are using the global variable, we can just switch to using the structure member instead, and remove the global variable. However, we can, and in fact should, do this in XLogPageRead() as well, because at that point in the code, the global variable will actually store the start of the record we want to read - either because it's where the last WAL record ended, or because the read position has been changed using XLogBeginRead since the last record was read. The structure member, on the other hand, will already have been updated to point to the end of the record we just read. Elsewhere, the latter is what we use as an argument to emode_for_corrupt_record(), so we should do the same here. This part of the patch is perhaps a bug fix, but I don't think it has any important consequences, so no back-patch. The point here is just to continue to whittle down the entirely excessive use of global variables in xlog.c. Discussion: http://postgr.es/m/CA+Tgmoao96EuNeSPd+hspRKcsCddu=b1h-QNRuKfY8VmfNQdfg@mail.gmail.com
* Fix corner-case failure to detect improper timeline switch.Robert Haas2021-11-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | rescanLatestTimeLine() contains a guard against switching to a timeline that forked off from the current one prior to the current recovery point, but that guard does not work if the timeline switch occurs before the first WAL recod (which must be the checkpoint record) is read. Without this patch, an improper timeline switch is therefore possible in such cases. This happens because rescanLatestTimeLine() relies on the global variable EndRecPtr to understand the current position of WAL replay. However, EndRecPtr at this point in the code contains the endpoint of the last-replayed record, not the startpoint or endpoint of the record being replayed now. Thus, before any records have been replayed, it's zero, which causes the sanity check to always pass. To fix, pass down the correct timeline explicitly. The EndRecPtr value we want is the one from the xlogreader, which will be the starting position of the record we're about to try to read, rather than the global variable, which is the ending position of the last record we successfully read. They're usually the same, but not in the corner case described here. No back-patch, because in v14 and earlier branhes, we were using the wrong TLI here as well as the wrong LSN. In master, that was fixed by commit 4a92a1c3d1c361ffb031ed05bf65b801241d7cdd, but that and it's prerequisite patches are too invasive to back-patch for such a minor issue. Patch by me, reviewed by Amul Sul. Discussion: http://postgr.es/m/CA+Tgmoao96EuNeSPd+hspRKcsCddu=b1h-QNRuKfY8VmfNQdfg@mail.gmail.com
* Be more specific about OOM in XLogReaderAllocateAlvaro Herrera2021-11-22
| | | | | | | | | | | A couple of spots can benefit from an added errdetail(), which matches what we were already doing in other places; and those that cannot withstand errdetail() can get a more descriptive primary message. Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Julien Rouhaud <rjuju123@gmail.com> Discussion: https://postgr.es/m/CALj2ACV+cX1eM03GfcA=ZMLXh5fSn1X1auJLz3yuS1duPSb9QA@mail.gmail.com
* Report wait events for local shell commands like archive_command.Fujii Masao2021-11-22
| | | | | | | | | This commit introduces new wait events for archive_command, archive_cleanup_command, restore_command and recovery_end_command. Author: Fujii Masao Reviewed-by: Bharath Rupireddy, Michael Paquier Discussion: https://postgr.es/m/4ca4f920-6b48-638d-08b2-93598356f5d3@oss.nttdata.com
* Remove lazy_scan_heap parallel VACUUM comment block.Peter Geoghegan2021-11-21
| | | | | | This doesn't belong next to very high level discussion of the tasks that lazy_scan_heap performs. There is already a similar, longer comment block at the top of vacuumlazy.c that mentions lazy_scan_heap directly.
* Fix SP-GiST scan initialization logic for binary-compatible cases.Tom Lane2021-11-20
| | | | | | | | | | | | | | | | | | | | | | | Commit ac9099fc1 rearranged the logic in spgGetCache() that determines the index's attType (nominal input data type) and leafType (actual type stored in leaf index tuples). Turns out this broke things for the case where (a) the actual input data type is different from the nominal type, (b) the opclass's config function leaves leafType defaulted, and (c) the opclass has no "compress" function. (b) caused us to assign the actual input data type as leafType, and then since that's not attType, we complained that a "compress" function is required. For non-polymorphic opclasses, condition (a) arises in binary-compatible cases, such as using SP-GiST text_ops for a varchar column, or using any opclass on a domain over its nominal input type. To fix, use attType for leafType when the index's declared column type is different from but binary-compatible with attType. Do this only in the defaulted-leafType case, to avoid overriding any explicit selection made by the opclass. Per bug #17294 from Ilya Anfimov. Back-patch to v14. Discussion: https://postgr.es/m/17294-8f6c7962ce877edc@postgresql.org
* Fix parallel operations that prevent oldest xmin from advancing.Amit Kapila2021-11-19
| | | | | | | | | | | | | | | | | While determining xid horizons, we skip over backends that are running Vacuum. We also ignore Create Index Concurrently, or Reindex Concurrently for the purposes of computing Xmin for Vacuum. But we were not setting the flags corresponding to these operations when they are performed in parallel which was preventing Xid horizon from advancing. The optimization related to skipping Create Index Concurrently, or Reindex Concurrently operations was implemented in PG-14 but the fix is the same for the Parallel Vacuum as well so back-patched till PG-13. Author: Masahiko Sawada Reviewed-by: Amit Kapila Backpatch-through: 13 Discussion: https://postgr.es/m/CAD21AoCLQqgM1sXh9BrDFq0uzd3RBFKi=Vfo6cjjKODm0Onr5w@mail.gmail.com
* Remove global variable "LastRec" in xlog.cMichael Paquier2021-11-17
| | | | | | | | | This variable is used only by StartupXLOG() now, so let's make it local to simplify the code. Author: Amul Sul Reviewed-by: Tom Lane, Michael Paquier Discussion: https://postgr.es/m/CAAJ_b96Qd023itERBRN9Z7P2saNDT3CYvGuMO8RXwndVNN6z7g@mail.gmail.com
* Move InitXLogInsert() call from InitXLOGAccess() to BaseInit().Robert Haas2021-11-16
| | | | | | | | | | | | | | | | | At present, there is an undocumented coding rule that you must call RecoveryInProgress(), or do something else that results in a call to InitXLogInsert(), before trying to write WAL. Otherwise, the WAL construction buffers won't be initialized, resulting in failures. Since it's not good to rely on a status inquiry function like RecoveryInProgress() having the side effect of initializing critical data structures, instead do the initialization eariler, when the backend first starts up. Patch by me. Reviewed by Nathan Bossart and Michael Paquier. Discussion: http://postgr.es/m/CA+TgmoY7b65qRjzHN_tWUk8B4sJqk1vj1d31uepVzmgPnZKeLg@mail.gmail.com
* Explain pruning pgstats accounting subtleties.Peter Geoghegan2021-11-12
| | | | | | | | | | | | | | | | Add a comment explaining why the pgstats accounting used during opportunistic heap pruning operations (to maintain the current number of dead tuples in the relation) needs to compensate by subtracting away the number of new LP_DEAD items. This is needed so it can avoid completely forgetting about tuples that become LP_DEAD items during pruning -- they should still count. It seems more natural to discuss this issue at the only relevant call site (opportunistic pruning), since the same issue does not apply to the only other caller (the VACUUM call site). Move everything there too. Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-Wzm7f+A6ej650gi_ifTgbhsadVW5cujAL3punpupHff5Yg@mail.gmail.com
* Report any XLogReadRecord() error in XlogReadTwoPhaseData().Noah Misch2021-11-11
| | | | | | | | | | Buildfarm members kittiwake and tadarida have witnessed errors at this site. The site discarded key facts. Back-patch to v10 (all supported versions). Reviewed by Michael Paquier and Tom Lane. Discussion: https://postgr.es/m/20211107013157.GB790288@rfd.leadboat.com
* Update heap_page_prune() free space map comments.Peter Geoghegan2021-11-11
| | | | | | | | It is up to the heap_page_prune() caller to decide what to do about updating the FSM for a page following pruning. Update old comments that address what we might want to do as if it was the responsibility of heap_page_prune() itself. heap_page_prune() doesn't have enough high-level context to make a sensible choice.
* Update another obsolete reference in vacuumlazy.c.Peter Geoghegan2021-11-11
| | | | Addresses an oversight in commit 7ab96cf6.
* Improve performance of pgarch_readyXlog() with many status files.Robert Haas2021-11-11
| | | | | | | | | | | | | | | | | | | Presently, the archive_status directory was scanned for each file to archive. When there are many status files, say because archive_command has been failing for a long time, these directory scans can get very slow. With this change, the archiver remembers several files to archive during each directory scan, speeding things up. To ensure timeline history files are archived as quickly as possible, XLogArchiveNotify() forces the archiver to do a new directory scan as soon as the .ready file for one is created. Nathan Bossart, per a long discussion involving many people. It is not clear to me exactly who out of all those people reviewed this particular patch. Discussion: http://postgr.es/m/CA+TgmobhAbs2yabTuTRkJTq_kkC80-+jw=pfpypdOJ7+gAbQbw@mail.gmail.com Discussion: http://postgr.es/m/620F3CE1-0255-4D66-9D87-0EADE866985A@amazon.com
* More cleanup of 'ThisTimeLineID'.Robert Haas2021-11-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | In XLogCtlData, rename the structure member ThisTimeLineID to InsertTimeLineID and update the comments to make clear that it's only expected to be set after recovery is complete. In StartupXLOG, replace the local variables ThisTimeLineID and PrevTimeLineID with new local variables replayTLI and newTLI. In the old scheme, ThisTimeLineID was the replay TLI until we created a new timeline, and after that the replay TLI was in PrevTimeLineID. Now, replayTLI is the TLI from which we last replayed WAL throughout the entire function, and newTLI is either that, or the new timeline created upon promotion. Remove some misleading comments from the comment block just above where recoveryTargetTimeLineGoal and friends are declared. It's become incorrect, not only because ThisTimeLineID as a variable is now gone, but also because the rmgr code does not care about ThisTimeLineID and has not since what used to be the TLI field in the page header was repurposed to store the page checksum. Add a comment GetFlushRecPtr that it's only supposed to be used in normal running, and an assertion to verify that this is so. Per some ideas from Michael Paquier and some of my own. Review by Michael Paquier also. Discussion: http://postgr.es/m/CA+TgmoY1a2d1AnVR3tJcKmGGkhj7GGrwiNwjtKr21dxOuLBzCQ@mail.gmail.com
* Silence uninitialized-variable warning.Tom Lane2021-11-07
| | | | | | | | Quite a few buildfarm animals are warning about this, and lapwing is actually failing (because -Werror). It's a false positive AFAICS, so no need to do more than zero the variable to start with. Discussion: https://postgr.es/m/YYXJnUxgw9dZKxlX@paquier.xyz
* Update obsolete reference in vacuumlazy.c.Peter Geoghegan2021-11-05
| | | | Oversight in commit 7ab96cf6.
* Fix handling of NaN values in BRIN minmax multiTomas Vondra2021-11-06
| | | | | | | | | | | | | | | | When calculating distance between float4/float8 values, we need to be a bit more careful about NaN values in order not to trigger assert. We consider NaN values to be equal (distace 0.0) and in infinite distance from all other values. On builds without asserts, this issue is mostly harmless - the ranges may be merged in less efficient order, but the index is still correct. Per report from Andreas Seltenreich. Backpatch to 14, where this new BRIN opclass was introduced. Reported-by: Andreas Seltenreich Discussion: https://postgr.es/m/87r1bw9ukm.fsf@credativ.de
* Update obsolete heap pruning comments.Peter Geoghegan2021-11-05
| | | | | | | | | | | | | Add new comments that spell out what VACUUM expects from heap pruning: pruning must never leave behind DEAD tuples that still have tuple storage. This has at least been the case since commit 8523492d, which established the principle that vacuumlazy.c doesn't have to deal with DEAD tuples that still have tuple storage directly, except perhaps by simply retrying pruning (to handle a rare corner case involving concurrent transaction abort). In passing, update some references to old symbol names that were missed by the snapshot scalability work (specifically commit dc7420c2c9).
* Change ThisTimeLineID from a global variable to a local variable.Robert Haas2021-11-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | StartupXLOG() still has ThisTimeLineID as a local variable, but the remaining code in xlog.c now needs to the relevant TimeLineID by some other means. Mostly, this means that we now pass it as a function parameter to a bunch of functions where we didn't previously. However, a few cases require special handling: - In functions that might be called by outside callers who wouldn't necessarily know what timeline to specify, we get the timeline ID from shared memory. XLogCtl->ThisTimeLineID can be used in most cases since recovery is known to have completed by the time those functions are called. In xlog_redo(), we can use XLogCtl->replayEndTLI. - XLogFileClose() needs to know the TLI of the open logfile. Do that with a new global variable openLogTLI. While someone could argue that this is just trading one global variable for another, the new one has a far more narrow purposes and is referenced in just a few places. - read_backup_label() now returns the TLI that it obtains by parsing the backup_label file. Previously, ReadRecord() could be called to parse the checkpoint record without ThisTimeLineID having been initialized. Now, the timeline is passed down, and I didn't want to pass an uninitialized variable; this change lets us avoid that. The old coding didn't seem to have any practical consequences that we need to worry about, but this is cleaner. - In BootstrapXLOG(), it's just a constant. Patch by me, reviewed and tested by Michael Paquier, Amul Sul, and Álvaro Herrera. Discussion: https://postgr.es/m/CA+TgmobfAAqhfWa1kaFBBFvX+5CjM=7TE=n4r4Q1o2bjbGYBpA@mail.gmail.com
* Remove all use of ThisTimeLineID global variable outside of xlog.cRobert Haas2021-11-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All such code deals with this global variable in one of three ways. Sometimes the same functions use it in more than one of these ways at the same time. First, sometimes it's an implicit argument to one or more functions being called in xlog.c or elsewhere, and must be set to the appropriate value before calling those functions lest they misbehave. In those cases, it is now passed as an explicit argument instead. Second, sometimes it's used to obtain the current timeline after the end of recovery, i.e. the timeline to which WAL is being written and flushed. Such code now calls GetWALInsertionTimeLine() or relies on the new out parameter added to GetFlushRecPtr(). Third, sometimes it's used during recovery to store the current replay timeline. That can change, so such code must generally update the value before each use. It can still do that, but must now use a local variable instead. The net effect of these changes is to reduce by a fair amount the amount of code that is directly accessing this global variable. That's good, because history has shown that we don't always think clearly about which timeline ID it's supposed to contain at any given point in time, or indeed, whether it has been or needs to be initialized at any given point in the code. Patch by me, reviewed and tested by Michael Paquier, Amul Sul, and Álvaro Herrera. Discussion: https://postgr.es/m/CA+TgmobfAAqhfWa1kaFBBFvX+5CjM=7TE=n4r4Q1o2bjbGYBpA@mail.gmail.com
* Add hardening to catch invalid TIDs in indexes.Peter Geoghegan2021-11-04
| | | | | | | | | | | | | | | | | | | Add hardening to the heapam index tuple deletion path to catch TIDs in index pages that point to a heap item that index tuples should never point to. The corruption we're trying to catch here is particularly tricky to detect, since it typically involves "extra" (corrupt) index tuples, as opposed to the absence of required index tuples in the index. For example, a heap TID from an index page that turns out to point to an LP_UNUSED item in the heap page has a good chance of being caught by one of the new checks. There is a decent chance that the recently fixed parallel VACUUM bug (see commit 9bacec15) would have been caught had that particular check been in place for Postgres 14. No backpatch of this extra hardening for now, though. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CAH2-Wzk-4_raTzawWGaiqNvkpwDXxv3y1AQhQyUeHfkU=tFCeA@mail.gmail.com
* Add various assertions to heap pruning code.Peter Geoghegan2021-11-04
| | | | | | | | | | | These assertions document (and verify) our high level assumptions about how pruning can and cannot affect existing items from target heap pages. For example, one of the new assertions verifies that pruning does not set a heap-only tuple to LP_DEAD. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CAH2-Wz=vhvBx1GjF+oueHh8YQcHoQYrMi0F0zFMHEr8yc4sCoA@mail.gmail.com
* Fix parallel amvacuumcleanup safety bug.Peter Geoghegan2021-11-02
| | | | | | | | | | | | | | Commit b4af70cb inverted the return value of the function parallel_processing_is_safe(), but missed the amvacuumcleanup test. Index AMs that don't support parallel cleanup at all were affected. The practical consequences of this bug were not very serious. Hash indexes are affected, but since they just return the number of blocks during hashvacuumcleanup anyway, it can't have had much impact. Author: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAD21AoA-Em+aeVPmBbL_s1V-ghsJQSxYL-i3JP8nTfPiD1wjKw@mail.gmail.com Backpatch: 14-, where commit b4af70cb appears.
* Don't overlook indexes during parallel VACUUM.Peter Geoghegan2021-11-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit b4af70cb, which simplified state managed by VACUUM, performed refactoring of parallel VACUUM in passing. Confusion about the exact details of the tasks that the leader process is responsible for led to code that made it possible for parallel VACUUM to miss a subset of the table's indexes entirely. Specifically, indexes that fell under the min_parallel_index_scan_size size cutoff were missed. These indexes are supposed to be vacuumed by the leader (alongside any parallel unsafe indexes), but weren't vacuumed at all. Affected indexes could easily end up with duplicate heap TIDs, once heap TIDs were recycled for new heap tuples. This had generic symptoms that might be seen with almost any index corruption involving structural inconsistencies between an index and its table. To fix, make sure that the parallel VACUUM leader process performs any required index vacuuming for indexes that happen to be below the size cutoff. Also document the design of parallel VACUUM with these below-size-cutoff indexes. It's unclear how many users might be affected by this bug. There had to be at least three indexes on the table to hit the bug: a smaller index, plus at least two additional indexes that themselves exceed the size cutoff. Cases with just one additional index would not run into trouble, since the parallel VACUUM cost model requires two larger-than-cutoff indexes on the table to apply any parallel processing. Note also that autovacuum was not affected, since it never uses parallel processing. Test case based on tests from a larger patch to test parallel VACUUM by Masahiko Sawada. Many thanks to Kamigishi Rei for her invaluable help with tracking this problem down. Author: Peter Geoghegan <pg@bowt.ie> Author: Masahiko Sawada <sawada.mshk@gmail.com> Reported-By: Kamigishi Rei <iijima.yun@koumakan.jp> Reported-By: Andrew Gierth <andrew@tao11.riddles.org.uk> Diagnosed-By: Andres Freund <andres@anarazel.de> Bug: #17245 Discussion: https://postgr.es/m/17245-ddf06aaf85735f36@postgresql.org Discussion: https://postgr.es/m/20211030023740.qbnsl2xaoh2grq3d@alap3.anarazel.de Backpatch: 14-, where the refactoring commit appears.
* Move MarkCurrentTransactionIdLoggedIfAny() out of the critical section.Amit Kapila2021-11-02
| | | | | | | | | | | We don't modify any shared state in this function which could cause problems for any concurrent session. This will make it look similar to the other updates for the same structure (TransactionState) which avoids confusion for future readers of code. Author: Dilip Kumar Reviewed-by: Amit Kapila Discussion: https://postgr.es/m/E1mSoYz-0007Fh-D9@gemulon.postgresql.org
* Replace XLOG_INCLUDE_XID flag with a more localized flag.Amit Kapila2021-11-02
| | | | | | | | | | | | | | | | Commit 0bead9af484c introduced XLOG_INCLUDE_XID flag to indicate that the WAL record contains subXID-to-topXID association. It uses that flag later to mark in CurrentTransactionState that top-xid is logged so that we should not try to log it again with the next WAL record in the current subtransaction. However, we can use a localized variable to pass that information. In passing, change the related function and variable names to make them consistent with what the code is actually doing. Author: Dilip Kumar Reviewed-by: Alvaro Herrera, Amit Kapila Discussion: https://postgr.es/m/E1mSoYz-0007Fh-D9@gemulon.postgresql.org
* Replace unicode characters in comments with asciiDaniel Gustafsson2021-11-01
| | | | | | | | | | | | | | | | | | | The unicode characters, while in comments and not code, caused MSVC to emit compiler warning C4819: The file contains a character that cannot be represented in the current code page (number). Save the file in Unicode format to prevent data loss. Fix by replacing the characters in print.c with descriptive comments containing the codepoints and symbol names, and remove the character in brin_bloom.c which was a footnote reference copied from the paper citation. Per report from hamerkop in the buildfarm. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/340E4118-0D0C-4E85-8141-8C40EB22DA3A@yesql.se
* Demote pg_unreachable() in heapam to an assertion.Peter Geoghegan2021-10-29
| | | | | | | | | | | | | | Commit d168b66682, which overhauled index deletion, added a pg_unreachable() to the end of a sort comparator used when sorting heap TIDs from an index page. This allows the compiler to apply optimizations that assume that the heap TIDs from the index AM must always be unique. That doesn't seem like a good idea now, given recent reports of corruption involving duplicate TIDs in indexes on Postgres 14. Demote to an assertion, just in case. Backpatch: 14-, where index deletion was overhauled.
* Remove obsolete nbtree LP_DEAD item comments.Peter Geoghegan2021-10-27
| | | | | | | | Comments above _bt_findinsertloc() that talk about LP_DEAD items are now out of place. We already discuss index tuple deletion at an earlier point in the same comment block. Oversight in commit d168b666.
* Fix typos in commentsDaniel Gustafsson2021-10-27
| | | | | Author: Peter Smith <smithpb2250@gmail.com> Discussion: https://postgr.es/m/CAHut+PsN_gmKu-KfeEb9NDARoTPbs4AN4PPu=6LZXFZRJ13SEw@mail.gmail.com
* Fix ordering of items in nbtree error message.Peter Geoghegan2021-10-27
| | | | | | Oversight in commit a5213adf. Backpatch: 13-, just like commit a5213adf.
* Further harden nbtree posting split code.Peter Geoghegan2021-10-27
| | | | | | | | | | | Add more defensive checks around posting list split code. These should detect corruption involving duplicate table TIDs earlier and more reliably than any existing check. Follow up to commit 8f72bbac. Discussion: https://postgr.es/m/CAH2-WzkrSY_kjyd1_M5xJK1uM0govJXMxPn8JUSvwcUOiHuWVw@mail.gmail.com Backpatch: 13-, where nbtree deduplication was introduced.
* Initialize variable to placate compiler.Robert Haas2021-10-25
| | | | | | Per Nathan Bossart. Discussion: http://postgr.es/m/FECEE7FC-CB74-45A9-BB24-89FEE52A9585@amazon.com
* Report progress of startup operations that take a long time.Robert Haas2021-10-25
| | | | | | | | | | | | | | | | | | | | | | | | Users sometimes get concerned whe they start the server and it emits a few messages and then doesn't emit any more messages for a long time. Generally, what's happening is either that the system is taking a long time to apply WAL, or it's taking a long time to reset unlogged relations, or it's taking a long time to fsync the data directory, but it's not easy to tell which is the case. To fix that, add a new 'log_startup_progress_interval' setting, by default 10s. When an operation that is known to be potentially long-running takes more than this amount of time, we'll log a status update each time this interval elapses. To avoid undesirable log chatter, don't log anything about WAL replay when in standby mode. Nitin Jadhav and Robert Haas, reviewed by Amul Sul, Bharath Rupireddy, Justin Pryzby, Michael Paquier, and Álvaro Herrera. Discussion: https://postgr.es/m/CA+TgmoaHQrgDFOBwgY16XCoMtXxsrVGFB2jNCvb7-ubuEe1MGg@mail.gmail.com Discussion: https://postgr.es/m/CAMm1aWaHF7VE69572_OLQ+MgpT5RUiUDgF1x5RrtkJBLdpRj3Q@mail.gmail.com
* StartupXLOG: Don't repeatedly disable/enable local xlog insertion.Robert Haas2021-10-25
| | | | | | | | | | | | | | | | | | | All the code that runs in the startup process to write WAL records before that's allowed generally is now consecutive, so there's no reason to shut the facility to write WAL locally off and then turn it on again three times in a row. Unfortunately, this requires a slight kludge in the checkpointer, which needs to separately enable writing WAL in order to write the checkpoint record. Because that code might run in the same process as StartupXLOG() if we are in single-user mode, we must save/restore the state of the LocalXLogInsertAllowed flag. Hopefully, we'll be able to eliminate this wart in further refactoring, but it's not too bad anyway. Amul Sul, with modifications by me. Discussion: http://postgr.es/m/CAAJ_b97fysj6sRSQEfOHj-y8Jfd5uPqOgO74qast89B4WfD+TA@mail.gmail.com