aboutsummaryrefslogtreecommitdiff
path: root/src/backend/access/transam
Commit message (Collapse)AuthorAge
* More -Wshadow=compatible-local warning fixesDavid Rowley2022-08-26
| | | | | | | | | | | | In a similar effort to f01592f91, here we're targetting fixing the warnings where we've deemed the shadowing variable to serve a close enough purpose to the shadowed variable just to reuse the shadowed version and not declare the shadowing variable at all. By my count, this takes the warning count from 106 down to 71. Author: Justin Pryzby Discussion: https://postgr.es/m/20220825020839.GT2342@telsasoft.com
* Allow parallel workers to retrieve some data from PortMichael Paquier2022-08-24
| | | | | | | | | | | | | | | | | | | | | | | | | This commit moves authn_id into a new global structure called ClientConnectionInfo (mapping to a MyClientConnectionInfo for each backend) which is intended to hold all the client information that should be shared between the backend and any of its parallel workers, access for extensions and triggers being the primary use case. There is no need to push all the data of Port to the workers, and authn_id is quite a generic concept so using a separate structure provides the best balance (the name of the structure has been suggested by Robert Haas). While on it, and per discussion as this would be useful for a potential SYSTEM_USER that can be accessed through parallel workers, a second field is added for the authentication method, copied directly from Port. ClientConnectionInfo is serialized and restored using a new parallel key and a structure tracks the length of the authn_id, making the addition of more fields straight-forward. Author: Jacob Champion Reviewed-by: Bertrand Drouvot, Stephen Frost, Robert Haas, Tom Lane, Michael Paquier, Julien Rouhaud Discussion: https://postgr.es/m/793d990837ae5c06a558d58d62de9378ab525d83.camel@vmware.com
* Remove empty statementJohn Naylor2022-08-23
| | | | | | Peter Smith Discussion: https://www.postgresql.org/message-id/CAHut%2BPtRGVuj8Q_GpHHxZyk7fGwdYDG8_s4GSfKoc_4Yd9vR-w%40mail.gmail.com
* Adjust assertion in XLogDecodeNextRecord.Robert Haas2022-08-18
| | | | | | | | | | | | | | | | | As written, if you use XLogBeginRead() to position an xlogreader at the beginning of a WAL page and then try to read WAL, this assertion will fail. However, the header comment for XLogBeginRead() claims that positioning an xlogreader at the beginning of a page is valid, and the code here is perfectly able to cope with it. It's only the assertion that causes trouble. So relax it. This is formally a bug in all supported branches, but as it doesn't seem to have any consequences for current uses of the xlogreader facility, no back-patch, at least for now. Dilip Kumar and Robert Haas Discussion: http://postgr.es/m/CA+TgmoaJSs2_7WHW2GzFYe9+zfPtxBKvT3GW47+x=ptUE=cULw@mail.gmail.com
* Use SetInstallXLogFileSegmentActive() in more places in xlog.cMichael Paquier2022-08-17
| | | | | | | | | | This reduces the code paths where XLogCtl->InstallXLogFileSegmentActive is directly touched, and this wrapper function does the same thing as the original code replaced by the function call. Author: Bharath Rupireddy Reviewed-by: Nathan Bossart Discussion: https://postgr.es/m/CALj2ACVhkf-bC5CX-=6iBUfkO5GqmBntQH+m=HpY0iQ=-g1pRg@mail.gmail.com
* Move basebackup code to new directory src/backend/backupRobert Haas2022-08-10
| | | | | | Reviewed by David Steele and Justin Pryzby Discussion: http://postgr.es/m/CA+TgmoafqboATDSoXHz8VLrSwK_MDhjthK4hEpYjqf9_1Fmczw%40mail.gmail.com
* Fix obsolete comment in commit_ts.c.Thomas Munro2022-08-09
| | | | | | | | Commit 08aa89b removed COMMIT_TS_SETTS, but left a reference in a comment. Author: Nathan Bossart <nathandbossart@gmail.com> Discussion: https://postgr.es/m/20220726173343.GA154110%40nathanxps13
* Remove configure probe for fdatasync.Thomas Munro2022-08-05
| | | | | | | | | | | | | | | | | fdatasync() is in SUSv2, and all targeted Unix systems have it. We have a replacement function for Windows. We retain the probe for the function declaration, which allows us to supply the mysteriously missing declaration for macOS, and also for Windows. No need to keep a HAVE_FDATASYNC macro around. Also rename src/port/fdatasync.c to win32fdatasync.c since it's only for Windows. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CA+hUKGJ3LHeP9w5Fgzdr4G8AnEtJ=z=p6hGDEm4qYGEUX5B6fQ@mail.gmail.com Discussion: https://postgr.es/m/CA%2BhUKGJZJVO%3DiX%2Beb-PXi2_XS9ZRqnn_4URh0NUQOwt6-_51xQ%40mail.gmail.com
* Remove dead pread and pwrite replacement code.Thomas Munro2022-08-05
| | | | | | | | | | | | | | | | | | | | | | | pread() and pwrite() are in SUSv2, and all targeted Unix systems have them. Previously, we defined pg_pread and pg_pwrite to emulate these function with lseek() on old Unixen. The names with a pg_ prefix were a reminder of a portability hazard: they might change the current file position. That hazard is gone, so we can drop the prefixes. Since the remaining replacement code is Windows-only, move it into src/port/win32p{read,write}.c, and move the declarations into src/include/port/win32_port.h. No need for vestigial HAVE_PREAD, HAVE_PWRITE macros as they were only used for declarations in port.h which have now moved into win32_port.h. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Greg Stark <stark@mit.edu> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CA+hUKGJ3LHeP9w5Fgzdr4G8AnEtJ=z=p6hGDEm4qYGEUX5B6fQ@mail.gmail.com
* Remove configure probes for symlink/readlink, and dead code.Thomas Munro2022-08-05
| | | | | | | | | | | | | | | | | | | symlink() and readlink() are in SUSv2 and all targeted Unix systems have them. We have partial emulation on Windows. Code that raised runtime errors on systems without it has been dead for years, so we can remove that and also references to such systems in the documentation. Define HAVE_READLINK and HAVE_SYMLINK macros on Unix. Our Windows replacement functions based on junction points can't be used for relative paths or for non-directories, so the macros can be used to check for full symlink support. The places that deal with tablespaces can just use symlink functions without checking the macros. (If they did check the macros, they'd need to provide an #else branch with a runtime or compile time error, and it'd be dead code.) Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CA+hUKGJ3LHeP9w5Fgzdr4G8AnEtJ=z=p6hGDEm4qYGEUX5B6fQ@mail.gmail.com
* Rephrase comments to make them clearerDaniel Gustafsson2022-08-04
| | | | | | | | | | | | The use of "we" when referring to the active backend might be misunderstood, so rephrase to make it clearer who is performing the actions discussed in the comment. Author: Junwang Zhao <zhjwpku@gmail.com> Reviewed-by: Erikjan Rijkers <er@xs4all.nl> Reviewed-by: Robert Treat <rob@xzilla.net> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CAEG8a3LRSMqkvjiURiJoSi4aGWORpiXUmUfQQK5PaD6WfPzu3w@mail.gmail.com
* Fix replay of create database records on standbyAlvaro Herrera2022-07-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Crash recovery on standby may encounter missing directories when replaying database-creation WAL records. Prior to this patch, the standby would fail to recover in such a case; however, the directories could be legitimately missing. Consider the following sequence of commands: CREATE DATABASE DROP DATABASE DROP TABLESPACE If, after replaying the last WAL record and removing the tablespace directory, the standby crashes and has to replay the create database record again, crash recovery must be able to continue. A fix for this problem was already attempted in 49d9cfc68bf4, but it was reverted because of design issues. This new version is based on Robert Haas' proposal: any missing tablespaces are created during recovery before reaching consistency. Tablespaces are created as real directories, and should be deleted by later replay. CheckRecoveryConsistency ensures they have disappeared. The problems detected by this new code are reported as PANIC, except when allow_in_place_tablespaces is set to ON, in which case they are WARNING. Apart from making tests possible, this gives users an escape hatch in case things don't go as planned. Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Author: Asim R Praveen <apraveen@pivotal.io> Author: Paul Guo <paulguo@gmail.com> Reviewed-by: Anastasia Lubennikova <lubennikovaav@gmail.com> (older versions) Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com> (older versions) Reviewed-by: Michaƫl Paquier <michael@paquier.xyz> Diagnosed-by: Paul Guo <paulguo@gmail.com> Discussion: https://postgr.es/m/CAEET0ZGx9AvioViLf7nbR_8tH9-=27DN5xWJ2P9-ROH16e4JUA@mail.gmail.com
* Add overflow protection for block-related data in WAL recordsMichael Paquier2022-07-27
| | | | | | | | | | | | | | | | | | | | | | | | | | XLogRecordBlockHeader, the header holding the information for the data related to a block, tracks the length of the data appended to the WAL record with data_length (uint16). This limitation in size was not enforced by the public routine in charge of registering the data assembled later to form the WAL record inserted, XLogRegisterBufData(). Incorrectly used, it could lead to the generation of records with some of its data overflowed. This commit adds some safeguards to prevent that for the block data, complaining immediately if attempting to add to a record block information with a size larger than UINT16_MAX, which is the limit implied by the internal logic. Note that this also adjusts XLogRegisterData() and XLogRegisterBufData() so as the length of the WAL record data given by the caller is unsigned, matching with what gets stored in XLogRecData->len. Extracted from a larger patch by the same author. The original patch includes more protections when assembling a record in full that will be looked at separately later. Author: Matthias van de Meent Reviewed-by: Andres Freund, Heikki Linnakangas, Michael Paquier, David Zhang Discussion: https://postgr.es/m/CAEze2WgGiw+LZt+vHf8tWqB_6VxeLsMeoAuod0N=ij1q17n5pw@mail.gmail.com
* Force immediate commit after CREATE DATABASE etc in extended protocol.Tom Lane2022-07-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have a few commands that "can't run in a transaction block", meaning that if they complete their processing but then we fail to COMMIT, we'll be left with inconsistent on-disk state. However, the existing defenses for this are only watertight for simple query protocol. In extended protocol, we didn't commit until receiving a Sync message. Since the client is allowed to issue another command instead of Sync, we're in trouble if that command fails or is an explicit ROLLBACK. In any case, sitting in an inconsistent state while waiting for a client message that might not come seems pretty risky. This case wasn't reachable via libpq before we introduced pipeline mode, but it's always been an intended aspect of extended query protocol, and likely there are other clients that could reach it before. To fix, set a flag in PreventInTransactionBlock that tells exec_execute_message to force an immediate commit. This seems to be the approach that does least damage to existing working cases while still preventing the undesirable outcomes. While here, add some documentation to protocol.sgml that explicitly says how to use pipelining. That's latent in the existing docs if you know what to look for, but it's better to spell it out; and it provides a place to document this new behavior. Per bug #17434 from Yugo Nagata. It's been wrong for ages, so back-patch to all supported branches. Discussion: https://postgr.es/m/17434-d9f7a064ce2a88a3@postgresql.org
* Remove useless arguments in ReadCheckpointRecord().Fujii Masao2022-07-25
| | | | | | | | | | | | | | | | | | | | | | | | | This commit removes two arguments "report" and "whichChkpt" in ReadCheckpointRecord(). "report" is obviously useless because it's always true, i.e., there are two callers of the function and they always specify true as "report". Commit 1d919de5eb removed the only call with "report" = false. "whichChkpt" indicated where the specified checkpoint location came from, pg_control or backup_label. This information was used to report different error messages depending on where the invalid checkpoint record came from, when it was found. But ReadCheckpointRecord() doesn't need to do that because its callers already do that and users can still identify where the invalid checkpoint record came from, by reading such log messages. Also when "whichChkpt" was 0, the word "primary checkpoint" was used in the log message and could confuse users because the concept of primary and secondary checkpoints was already removed before. These are why this commit removes "whichChkpt" argument. Author: Fujii Masao Reviewed-by: Bharath Rupireddy, Kyotaro Horiguchi Discussion: https://postgr.es/m/fa2e12eb-81c3-0717-0272-755f8a81c8f2@oss.nttdata.com
* Remove unnecessary Windows-specific basebackup code.Thomas Munro2022-07-22
| | | | | | | | | | | | | | | | | Commit c6f2f016 added an explicit check for a Windows "junction point". That turned out to be needed only because get_dirent_type() was busted on Windows. It's been fixed by commit 9d3444dc, so remove it. Add a TAP-test to demonstrate that in-place tablespaces are copied by pg_basebackup. This exercises the codepath that would fail before c6f2f016 on Windows, and shows that it still doesn't fail now that we're using get_dirent_type() on both Windows and Unix. Back-patch to 15, where in-place tablespaces arrived and caused this problem (ie directories where previously only symlinks were expected). Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CA%2BhUKGLzLK4PUPx0_AwXEWXOYAejU%3D7XpxnYE55Y%2Be7hB2N3FA%40mail.gmail.com
* Remove O_FSYNC and associated macros.Thomas Munro2022-07-22
| | | | | | | | | | | | | | | | | | O_FSYNC was a pre-POSIX way of spelling O_SYNC, supported since commit 9d645fd84c3 for non-conforming operating systems of the time. It's not needed on any modern system. We can just use standard O_SYNC directly if it exists (= all targeted systems except Windows), and get rid of our OPEN_SYNC_FLAG macro. Similarly for standard O_DSYNC, we can just use that directly if it exists (= all targeted systems except DragonFlyBSD), and get rid of our OPEN_DATASYNC_FLAG macro. We still avoid choosing open_datasync as a default value for wal_sync_method if O_DSYNC has the same value as O_SYNC (= only OpenBSD), so there is no change in default behavior. Discussion: https://postgr.es/m/CA%2BhUKGJE7y92NY7FG2ftUbZUaqohBU65_Ys_7xF5mUHo4wirTQ%40mail.gmail.com
* Fix assertion failure and segmentation fault in backup code.Fujii Masao2022-07-20
| | | | | | | | | | | | | | | | | | | | | | When a non-exclusive backup is canceled, do_pg_abort_backup() is called and resets some variables set by pg_backup_start (pg_start_backup in v14 or before). But previously it forgot to reset the session state indicating whether a non-exclusive backup is in progress or not in this session. This issue could cause an assertion failure when the session running BASE_BACKUP is terminated after it executed pg_backup_start and pg_backup_stop (pg_stop_backup in v14 or before). Also it could cause a segmentation fault when pg_backup_stop is called after BASE_BACKUP in the same session is canceled. This commit fixes the issue by making do_pg_abort_backup reset that session state. Back-patch to all supported branches. Author: Fujii Masao Reviewed-by: Kyotaro Horiguchi, Masahiko Sawada, Michael Paquier, Robert Haas Discussion: https://postgr.es/m/3374718f-9fbf-a950-6d66-d973e027f44c@oss.nttdata.com
* Replace many MemSet calls with struct initializationPeter Eisentraut2022-07-16
| | | | | | | | | | | | | | This replaces all MemSet() calls with struct initialization where that is easily and obviously possible. (For example, some cases have to worry about padding bits, so I left those.) (The same could be done with appropriate memset() calls, but this patch is part of an effort to phase out MemSet(), so it doesn't touch memset() calls.) Reviewed-by: Ranier Vilela <ranier.vf@gmail.com> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://www.postgresql.org/message-id/9847b13c-b785-f4e2-75c3-12ec77a3b05c@enterprisedb.com
* Add checkpoint and REDO LSN to log_checkpoints message.Fujii Masao2022-07-07
| | | | | | | | | | | | | It is useful for debugging purposes to report the checkpoint LSN and REDO LSN in log_checkpoints message. It can give more context while analyzing checkpoint-related issues. pg_controldata reports the last checkpoint LSN and REDO LSN, but having this information alongside the log message helps analyze issues that happened previously, connect the dots and identify the root cause. Author: Bharath Rupireddy, Kyotaro Horiguchi Reviewed-by: Michael Paquier, Julien Rouhaud, Nathan Bossart, Fujii Masao, Greg Stark Discussion: https://postgr.es/m/CALj2ACWt6kqriAHrO+AJj+OmP=suwbktHT5JoYAn-nqZe2gd2g@mail.gmail.com
* Change internal RelFileNode references to RelFileNumber or RelFileLocator.Robert Haas2022-07-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have been using the term RelFileNode to refer to either (1) the integer that is used to name the sequence of files for a certain relation within the directory set aside for that tablespace/database combination; or (2) that value plus the OIDs of the tablespace and database; or occasionally (3) the whole series of files created for a relation based on those values. Using the same name for more than one thing is confusing. Replace RelFileNode with RelFileNumber when we're talking about just the single number, i.e. (1) from above, and with RelFileLocator when we're talking about all the things that are needed to locate a relation's files on disk, i.e. (2) from above. In the places where we refer to (3) as a relfilenode, instead refer to "relation storage". Since there is a ton of SQL code in the world that knows about pg_class.relfilenode, don't change the name of that column, or of other SQL-facing things that derive their name from it. On the other hand, do adjust closely-related internal terminology. For example, the structure member names dbNode and spcNode appear to be derived from the fact that the structure itself was called RelFileNode, so change those to dbOid and spcOid. Likewise, various variables with names like rnode and relnode get renamed appropriately, according to how they're being used in context. Hopefully, this is clearer than before. It is also preparation for future patches that intend to widen the relfilenumber fields from its current width of 32 bits. Variables that store a relfilenumber are now declared as type RelFileNumber rather than type Oid; right now, these are the same, but that can now more easily be changed. Dilip Kumar, per an idea from me. Reviewed also by Andres Freund. I fixed some whitespace issues, changed a couple of words in a comment, and made one other minor correction. Discussion: http://postgr.es/m/CA+TgmoamOtXbVAQf9hWFzonUo6bhhjS6toZQd7HZ-pmojtAmag@mail.gmail.com Discussion: http://postgr.es/m/CA+Tgmobp7+7kmi4gkq7Y+4AM9fTvL+O1oQ4-5gFTT+6Ng-dQ=g@mail.gmail.com Discussion: http://postgr.es/m/CAFiTN-vTe79M8uDH1yprOU64MNFE+R3ODRuA+JWf27JbhY4hJw@mail.gmail.com
* Replace durable_rename_excl() by durable_rename(), take twoMichael Paquier2022-07-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | durable_rename_excl() attempts to avoid overwriting any existing files by using link() and unlink(), and it falls back to rename() on some platforms (aka WIN32), which offers no such overwrite protection. Most callers use durable_rename_excl() just in case there is an existing file, but in practice there shouldn't be one (see below for more details). Furthermore, failures during durable_rename_excl() can result in multiple hard links to the same file. As per Nathan's tests, it is possible to end up with two links to the same file in pg_wal after a crash just before unlink() during WAL recycling. Specifically, the test produced links to the same file for the current WAL file and the next one because the half-recycled WAL file was re-recycled upon restarting, leading to WAL corruption. This change replaces all the calls of durable_rename_excl() to durable_rename(). This removes the protection against accidentally overwriting an existing file, but some platforms are already living without it and ordinarily there shouldn't be one. The function itself is left around in case any extensions are using it. It will be removed on HEAD via a follow-up commit. Here is a summary of the existing callers of durable_rename_excl() (see second discussion link at the bottom), replaced by this commit. First, basic_archive used it to avoid overwriting an archive concurrently created by another server, but as mentioned above, it will still overwrite files on some platforms. Second, xlog.c uses it to recycle past WAL segments, where an overwrite should not happen (origin of the change at f0e37a8) because there are protections about the WAL segment to select when recycling an entry. The third and last area is related to the write of timeline history files. writeTimeLineHistory() will write a new timeline history file at the end of recovery on promotion, so there should be no such files for the same timeline. What remains is writeTimeLineHistoryFile(), that can be used in parallel by a WAL receiver and the startup process, and some digging of the buildfarm shows that EEXIST from a WAL receiver can happen with an error of "could not link file \"pg_wal/xlogtemp.NN\" to \"pg_wal/MM.history\", which would cause an automatic restart of the WAL receiver as it is promoted to FATAL, hence this should improve the stability of the WAL receiver as rename() would overwrite an existing TLI history file already fetched by the startup process at recovery. This is a bug fix, but knowing the unlikeliness of the problem involving one or more crashes at an exceptionally bad moment, no backpatch is done. Also, I want to be careful with such changes (aaa3aed did the opposite of this change by removing HAVE_WORKING_LINK so as Windows would do a link() rather than a rename() but this was not concurrent-safe). A backpatch could be revisited in the future. This is the second time this change is attempted, ccfbd92 being the first one, but this time no assertions are added for the case of a TLI history file written concurrently by the WAL receiver or the startup process because we can expect one to exist (some of the TAP tests are able to trigger with a proper timing). Author: Nathan Bossart Reviewed-by: Robert Haas, Kyotaro Horiguchi, Michael Paquier Discussion: https://postgr.es/m/20220407182954.GA1231544@nathanxps13 Discussion: https://postgr.es/m/Ym6GZbqQdlalSKSG@paquier.xyz
* Fix code comments still referring to pg_start/stop_backup()Michael Paquier2022-07-01
| | | | | | | | | pg_start_backup() and pg_stop_backup() have been respectively renamed to pg_backup_start() and pg_backup_stop() as of 39969e2, but a few comments did not get the call. Reviewed-by: Kyotaro Horiguchi, David Steele Discussion: https://postgr.es/m/YrqGlj1+4DF3dbZ/@paquier.xyz
* pgindent run prior to branching v15.Tom Lane2022-06-30
| | | | pgperltidy and reformat-dat-files too. Not many changes.
* Fix visibility check when XID is committed in CLOG but not in procarray.Heikki Linnakangas2022-06-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | TransactionIdIsInProgress had a fast path to return 'false' if the single-item CLOG cache said that the transaction was known to be committed. However, that was wrong, because a transaction is first marked as committed in the CLOG but doesn't become visible to others until it has removed its XID from the proc array. That could lead to an error: ERROR: t_xmin is uncommitted in tuple to be updated or for an UPDATE to go ahead without blocking, before the previous UPDATE on the same row was made visible. The window is usually very short, but synchronous replication makes it much wider, because the wait for synchronous replica happens in that window. Another thing that makes it hard to hit is that it's hard to get such a commit-in-progress transaction into the single item CLOG cache. Normally, if you call TransactionIdIsInProgress on such a transaction, it determines that the XID is in progress without checking the CLOG and without populating the cache. One way to prime the cache is to explicitly call pg_xact_status() on the XID. Another way is to use a lot of subtransactions, so that the subxid cache in the proc array is overflown, making TransactionIdIsInProgress rely on pg_subtrans and CLOG checks. This has been broken ever since it was introduced in 2008, but the race condition is very hard to hit, especially without synchronous replication. There were a couple of reports of the error starting from summer 2021, but no one was able to find the root cause then. TransactionIdIsKnownCompleted() is now unused. In 'master', remove it, but I left it in place in backbranches in case it's used by extensions. Also change pg_xact_status() to check TransactionIdIsInProgress(). Previously, it only checked the CLOG, and returned "committed" before the transaction was actually made visible to other queries. Note that this also means that you cannot use pg_xact_status() to reproduce the bug anymore, even if the code wasn't fixed. Report and analysis by Konstantin Knizhnik. Patch by Simon Riggs, with the pg_xact_status() change added by me. Author: Simon Riggs Reviewed-by: Andres Freund Discussion: https://www.postgresql.org/message-id/flat/4da7913d-398c-e2ad-d777-f752cf7f0bbb%40garret.ru
* Be more careful about GucSource for internally-driven GUC settings.Tom Lane2022-06-08
| | | | | | | | | | | | | | | | | | | | | | | The original advice for hard-wired SetConfigOption calls was to use PGC_S_OVERRIDE, particularly for PGC_INTERNAL GUCs. However, that's really overkill for PGC_INTERNAL GUCs, since there is no possibility that we need to override a user-provided setting. Instead use PGC_S_DYNAMIC_DEFAULT in most places, so that the value will appear with source = 'default' in pg_settings and thereby not be shown by psql's new \dconfig command. The one exception is that when changing in_hot_standby in a hot-standby session, we still use PGC_S_OVERRIDE, because people felt that seeing that in \dconfig would be a good thing. Similarly use PGC_S_DYNAMIC_DEFAULT for the auto-tune value of wal_buffers (if possible, that is if wal_buffers wasn't explicitly set to -1), and for the typical 2MB value of max_stack_depth. In combination these changes remove four not-very-interesting entries from the typical output of \dconfig, all of which people fingered as "why is that showing up?" in the discussion thread. Discussion: https://postgr.es/m/3118455.1649267333@sss.pgh.pa.us
* Revert "Add single-item cache when looking at topmost XID of a subtrans XID"Michael Paquier2022-05-28
| | | | | | | | | | | | | | | | | | This reverts commit 06f5295 as per issues with this approach, both in terms of efficiency impact and stability. First, contrary to the single-item cache for transaction IDs in transam.c, the cache may finish by not be hit for a long time, and without an invalidation mechanism to clear it, it would cause inconsistent results on wraparound for example. Second, the use of SubTransGetTopmostTransaction() for the caching has a limited impact on performance. SubTransGetParent() could have more impact, though the benchmarking of the single-item approach still needs to be proved, particularly under the conditions where SLRU lookups are stressed in parallel with overflowed snapshots (aka more than 64 subxids generated, for example). After discussion with Andres Freund. Discussion: https://postgr.es/m/20220524235250.gtt3uu5zktfkr4hv@alap3.anarazel.de
* Add 'static' to file-local variables missing it.Andres Freund2022-05-12
| | | | | | | | Noticed when comparing the set of exported symbols without / with -fvisibility=hidden after adding PGDLLIMPORT to intentionally exported symbols. Discussion: https://postgr.es/m/20220512164513.vaheofqp2q24l65r@alap3.anarazel.de
* Pre-beta mechanical code beautification.Tom Lane2022-05-12
| | | | | Run pgindent, pgperltidy, and reformat-dat-files. I manually fixed a couple of comments that pgindent uglified.
* Fix control file update done in restartpoints still running after promotionMichael Paquier2022-05-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a cluster is promoted (aka the control file shows a state different than DB_IN_ARCHIVE_RECOVERY) while CreateRestartPoint() is still processing, this function could miss an update of the control file for "checkPoint" and "checkPointCopy" but still do the recycling and/or removal of the past WAL segments, assuming that the to-be-updated LSN values should be used as reference points for the cleanup. This causes a follow-up restart attempting crash recovery to fail with a PANIC on a missing checkpoint record if the end-of-recovery checkpoint triggered by the promotion did not complete while the cluster abruptly stopped or crashed before the completion of this checkpoint. The PANIC would be caused by the redo LSN referred in the control file as located in a segment already gone, recycled by the previous restartpoint with "checkPoint" out-of-sync in the control file. This commit fixes the update of the control file during restartpoints so as "checkPoint" and "checkPointCopy" are updated even if the cluster has been promoted while a restartpoint is running, to be on par with the set of WAL segments actually recycled in the end of CreateRestartPoint(). This problem exists in all the stable branches. However, commit 7ff23c6, by removing the last call of CreateCheckPoint() from the startup process, has made this bug much easier to reason about as concurrent checkpoints are not possible anymore. No backpatch is done yet, mostly out of caution from me as a point release is close by, but we need to think harder about the case of concurrent checkpoints at promotion if the bgwriter is not considered as running by the startup process in ~v14, so this change is done only on HEAD for the moment. Reported-by: Fujii Masao, Rui Zhao Author: Kyotaro Horiguchi Reviewed-by: Nathan Bossart, Michael Paquier Discussion: https://postgr.es/m/20220316.102444.2193181487576617583.horikyota.ntt@gmail.com
* pg_walinspect: fix case where flush LSN is in the middle of a record.Jeff Davis2022-04-30
| | | | | | | | | | | | | | | | | | | | | Instability in the test for pg_walinspect revealed that pg_get_wal_records_info_till_end_of_wal(x) would try to decode all the records with a start LSN earlier than the flush LSN, even though that might include a partial record at the end of the range. In that case, read_local_xlog_page_no_wait() would return NULL when it tried to read past the flush LSN, which would be interpreted as an error by the caller. That caused a test failure only on a BF animal that had been restarted recently, but could be expected to happen in the wild quite easily depending on the alignment of various parameters. Fix by using private data in read_local_xlog_page_no_wait() to signal end-of-wal to the caller, so that it can be properly distinguished from a real error. Discussion: https://postgr.es/m/Ymd/e5eeZMNAkrXo%40paquier.xyz Discussion: https://postgr.es/m/111657.1650910309@sss.pgh.pa.us Authors: Thomas Munro, Bharath Rupireddy.
* Revert recent changes with durable_rename_excl()Michael Paquier2022-04-28
| | | | | | | | | | | | | | | This reverts commits 2c902bb and ccfbd92. Per buildfarm members kestrel, rorqual and calliphoridae, the assertions checking that a TLI history file should not exist when created by a WAL receiver have been failing, and switching to durable_rename() over durable_rename_excl() would cause the newest TLI history file to overwrite the existing one. We need to think harder about such cases, so revert the new logic for now. Note that all the failures have been reported in the test 025_stuck_on_old_timeline. Discussion: https://postgr.es/m/511362.1651116498@sss.pgh.pa.us
* Replace existing durable_rename_excl() calls with durable_rename()Michael Paquier2022-04-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | durable_rename_excl() attempts to avoid overwriting any existing files by using link() and unlink(), falling back to rename() on some platforms (e.g., Windows where link() followed by unlink() is not concurrent-safe, see 909b449). Most callers of durable_rename_excl() use it just in case there is an existing file, but it happens that for all of them we never expect a target file to exist (WAL segment recycling, creation of timeline history file and basic_archive). basic_archive used durable_rename_excl() to avoid overwriting an archive concurrently created by another server. Now, there is a stat() call to avoid overwriting an existing archive a couple of lines above, so note that this change opens a small TOCTOU window in this module between the stat() call and durable_rename(). Furthermore, as mentioned in the top comment of durable_rename_excl(), this routine can result in multiple hard links to the same file and data corruption, with two or more links to the same file in pg_wal/ if a crash happens before the unlink() call during WAL recycling. Specifically, this would produce links to the same file for the current WAL file and the next one because the half-recycled WAL file was re-recycled during crash recovery of a follow-up cluster restart. This change replaces all calls to durable_rename_excl() with durable_rename(). This removes the protection against accidentally overwriting an existing file, but some platforms are already living without it, and all those code paths never expect an existing file (a couple of assertions are added to check after that, in case). This is a bug fix, but knowing the unlikeliness of the problem involving one of more crashes at an exceptionally bad moment, no backpatch is done. This could be revisited in the future. Author: Nathan Bossart Reviewed-by: Robert Haas, Kyotaro Horiguchi, Michael Paquier Discussion: https://postgr.es/m/20220407182954.GA1231544@nathanxps13
* Rethink method for assigning OIDs to the template0 and postgres DBs.Tom Lane2022-04-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit aa0105141 assigned fixed OIDs to template0 and postgres in a very ad-hoc way. Notably, instead of teaching Catalog.pm about these OIDs, the unused_oids script was just hacked to not show them as unused. That's problematic since, for example, duplicate_oids wouldn't report any future conflict. Hence, invent a macro DECLARE_OID_DEFINING_MACRO() that can be used to define an OID that is known to Catalog.pm and will participate in duplicate-detection as well as renumbering by renumber_oids.pl. (We don't anticipate renumbering these particular OIDs, but we might as well build out all the Catalog.pm infrastructure while we're here.) Another issue is that aa0105141 neglected to touch IsPinnedObject, with the result that it now claimed template0 and postgres are pinned. The right thing to do there seems to be to teach it that no database is pinned, since in fact DROP DATABASE doesn't check for pinned-ness (and at least for these cases, that is an intentional choice). It's not clear whether this wrong answer had any visible effect, but perhaps it could have resulted in erroneous management of dependency entries. In passing, rename the TemplateDbOid macro to Template1DbOid to reduce confusion (likely we should have done that way back when we invented template0, but we didn't), and rename the OID macros for template0 and postgres to have a similar style. There are no changes to postgres.bki here, so no need for a catversion bump. Discussion: https://postgr.es/m/2935358.1650479692@sss.pgh.pa.us
* Don't retry restore_command while reading ahead.Thomas Munro2022-04-17
| | | | | | | | | | | | | | | | | | | | | | | | | | Suppress further attempts to read ahead in the WAL if we run out of data, until the records already decoded have been replayed. This restores the traditional behavior for continuous archive recovery, which is to retry the failing restore_command only every 5 seconds. With the coding in 5dc0418f, we would start retrying every time through the recovery loop when our WAL decoding window hit the end of the current segment and we tried to look ahead into a not-yet-available next file. That was very slow. Also change the no_readahead_until mechanism to use <= rather than <, which seems more useful. Otherwise we'd either get one extra unwanted retry of restore_command, or we'd need to add 1 to an LSN. No change in behavior for regular streaming. That was already limited by the flushedUpto variable, which won't be updated until we replay what we have already. Reported by Andres Freund while analyzing the failure of a TAP test on build farm animal skink (investigation ongoing but probably due to otherwise unrelated timing bugs triggered by this slowness magnified by valgrind). Discussion: https://postgr.es/m/20220409005910.alw46xqmmgny2sgr%40alap3.anarazel.de
* Remove extraneous blank lines before block-closing bracesAlvaro Herrera2022-04-13
| | | | | | | | | These are useless and distracting. We wouldn't have written the code with them to begin with, so there's no reason to keep them. Author: Justin Pryzby <pryzby@telsasoft.com> Discussion: https://postgr.es/m/20220411020336.GB26620@telsasoft.com Discussion: https://postgr.es/m/attachment/133167/0016-Extraneous-blank-lines.patch
* Revert the addition of GetMaxBackends() and related stuff.Robert Haas2022-04-12
| | | | | | | | | | | | This reverts commits 0147fc7, 4567596, aa64f23, and 5ecd018. There is no longer agreement that introducing this function was the right way to address the problem. The consensus now seems to favor trying to make a correct value for MaxBackends available to mdules executing their _PG_init() functions. Nathan Bossart Discussion: http://postgr.es/m/20220323045229.i23skfscdbvrsuxa@jrouhaud
* Make XLogRecGetBlockTag() throw error if there's no such block.Tom Lane2022-04-11
| | | | | | | | | | | | | | | | | | | | | | | | All but a few existing callers assume without checking that this function succeeds. While it probably will, that's a poor excuse for not checking. Let's make it return void and instead throw an error if it doesn't find the block reference. Callers that actually need to handle the no-such-block case must now use the underlying function XLogRecGetBlockTagExtended. In addition to being a bit less error-prone, this should also serve to suppress some Coverity complaints about XLogRecGetBlockRefInfo. While at it, clean up some inconsistency about use of the XLogRecHasBlockRef macro: make XLogRecGetBlockTagExtended use that instead of open-coding the same condition, and avoid calling XLogRecHasBlockRef twice in relevant code paths. (That is, calling XLogRecHasBlockRef followed by XLogRecGetBlockTag is now deprecated: use XLogRecGetBlockTagExtended instead.) Patch HEAD only; this doesn't seem to have enough value to consider a back-branch API break. Discussion: https://postgr.es/m/425039.1649701221@sss.pgh.pa.us
* Remove dead code in do_pg_backup_start().Tom Lane2022-04-11
| | | | | | | | | | As of commit 39969e2a1, no caller of do_pg_backup_start() passes NULL for labelfile or tblspcmapfile, nor is it plausible that any would do so in the future. Remove the code that coped with that case, as (a) it's dead and (b) it causes Coverity to bleat about possibly leaked storage. While here, do some janitorial work on the function's header comment.
* Fix various typos and spelling mistakes in code commentsDavid Rowley2022-04-11
| | | | | Author: Justin Pryzby Discussion: https://postgr.es/m/20220411020336.GB26620@telsasoft.com
* Rename delayChkpt to delayChkptFlags.Robert Haas2022-04-08
| | | | | | | | | | | | | | | | | | Before commit 412ad7a55639516f284cd0ef9757d6ae5c7abd43, delayChkpt was a Boolean. Now it's an integer. Extensions using it need to be appropriately updated, so let's rename the field to make sure that a hard compilation failure occurs. Replacing delayChkpt with delayChkptFlags made a few comments extend past 80 characters, so I reflowed them and changed some wording very slightly. The back-branches will need a different change to restore compatibility with existing minor releases; this is just for master. Per suggestion from Tom Lane. Discussion: http://postgr.es/m/a7880f4d-1d74-582a-ada7-dad168d046d1@enterprisedb.com
* Check XLogRecHasBlockRef() before XLogRecHasBlockImage().Jeff Davis2022-04-08
| | | | Trial fix of buildfarm failures on kestrel and tamandua.
* Add contrib/pg_walinspect.Jeff Davis2022-04-08
| | | | | | | | | Provides similar functionality to pg_waldump, but from a SQL interface rather than a separate utility. Author: Bharath Rupireddy Reviewed-by: Greg Stark, Kyotaro Horiguchi, Andres Freund, Ashutosh Sharma, Nitin Jadhav, RKN Sai Krishna Discussion: https://postgr.es/m/CALj2ACUGUYXsEQdKhEdsBzhGEyF3xggvLdD8C0VT72TNEfOiog%40mail.gmail.com
* Fix typo in xlogrecovery.c code commentDaniel Gustafsson2022-04-07
| | | | | Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Discussion: https://postgr.es/m/CALj2ACUoPtnReT=yAQMcWLtcCpk7p83xjeA8tiRX8Q0_sjh8kw@mail.gmail.com
* Prefetch data referenced by the WAL, take II.Thomas Munro2022-04-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce a new GUC recovery_prefetch. When enabled, look ahead in the WAL and try to initiate asynchronous reading of referenced data blocks that are not yet cached in our buffer pool. For now, this is done with posix_fadvise(), which has several caveats. Since not all OSes have that system call, "try" is provided so that it can be enabled where available. Better mechanisms for asynchronous I/O are possible in later work. Set to "try" for now for test coverage. Default setting to be finalized before release. The GUC wal_decode_buffer_size limits the distance we can look ahead in bytes of decoded data. The existing GUC maintenance_io_concurrency is used to limit the number of concurrent I/Os allowed, based on pessimistic heuristics used to infer that I/Os have begun and completed. We'll also not look more than maintenance_io_concurrency * 4 block references ahead. Reviewed-by: Julien Rouhaud <rjuju123@gmail.com> Reviewed-by: Tomas Vondra <tomas.vondra@2ndquadrant.com> Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com> (earlier version) Reviewed-by: Andres Freund <andres@anarazel.de> (earlier version) Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> (earlier version) Tested-by: Tomas Vondra <tomas.vondra@2ndquadrant.com> (earlier version) Tested-by: Jakub Wartak <Jakub.Wartak@tomtom.com> (earlier version) Tested-by: Dmitry Dolgov <9erthalion6@gmail.com> (earlier version) Tested-by: Sait Talha Nisanci <Sait.Nisanci@microsoft.com> (earlier version) Discussion: https://postgr.es/m/CA%2BhUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq%3DAovOddfHpA%40mail.gmail.com
* Fix warning introduced in 5c279a6d350.Jeff Davis2022-04-07
| | | | | | | | | Change two macros to be static inline functions instead to keep the data type consistent. This avoids a "comparison is always true" warning that was occurring with -Wtype-limits. In the process, change the names to look less like macros. Discussion: https://postgr.es/m/20220407063505.njnnrmbn4sxqfsts@alap3.anarazel.de
* Fix compilation with WAL_DEBUG.Andres Freund2022-04-06
| | | | | | Broke with 5c279a6d350. But looks like it had been half-broken since 70e81861fad, because 'rmid' didn't refer to the current record's rmid anymore, but to rmid from "Initialize resource managers" - a constant.
* Custom WAL Resource Managers.Jeff Davis2022-04-06
| | | | | | | | | | | | | Allow extensions to specify a new custom resource manager (rmgr), which allows specialized WAL. This is meant to be used by a Table Access Method or Index Access Method. Prior to this commit, only Generic WAL was available, which offers support for recovery and physical replication but not logical replication. Reviewed-by: Julien Rouhaud, Bharath Rupireddy, Andres Freund Discussion: https://postgr.es/m/ed1fb2e22d15d3563ae0eb610f7b61bb15999c0a.camel%40j-davis.com
* Add single-item cache when looking at topmost XID of a subtrans XIDMichael Paquier2022-04-07
| | | | | | | | | | | | This change affects SubTransGetTopmostTransaction(), used to find the topmost transaction ID of a given transaction ID. The cache is able to store one value, so as we can save the backend from unnecessary lookups at pg_subtrans/ on repetitive calls of this routine. There is a similar practice in transam.c, for example. Author: Simon Riggs Reviewed-by: Andrey Borodin, Julien Rouhaud Discussion: https://postgr.es/m/CANbhV-G8Co=yq4v4BkW7MJDqVt68K_8A48nAZ_+8UQS7LrwLEQ@mail.gmail.com
* pgstat: store statistics in shared memory.Andres Freund2022-04-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously the statistics collector received statistics updates via UDP and shared statistics data by writing them out to temporary files regularly. These files can reach tens of megabytes and are written out up to twice a second. This has repeatedly prevented us from adding additional useful statistics. Now statistics are stored in shared memory. Statistics for variable-numbered objects are stored in a dshash hashtable (backed by dynamic shared memory). Fixed-numbered stats are stored in plain shared memory. The header for pgstat.c contains an overview of the architecture. The stats collector is not needed anymore, remove it. By utilizing the transactional statistics drop infrastructure introduced in a prior commit statistics entries cannot "leak" anymore. Previously leaked statistics were dropped by pgstat_vacuum_stat(), called from [auto-]vacuum. On systems with many small relations pgstat_vacuum_stat() could be quite expensive. Now that replicas drop statistics entries for dropped objects, it is not necessary anymore to reset stats when starting from a cleanly shut down replica. Subsequent commits will perform some further code cleanup, adapt docs and add tests. Bumps PGSTAT_FILE_FORMAT_ID. Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Author: Andres Freund <andres@anarazel.de> Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-By: Andres Freund <andres@anarazel.de> Reviewed-By: Thomas Munro <thomas.munro@gmail.com> Reviewed-By: Justin Pryzby <pryzby@telsasoft.com> Reviewed-By: "David G. Johnston" <david.g.johnston@gmail.com> Reviewed-By: Tomas Vondra <tomas.vondra@2ndquadrant.com> (in a much earlier version) Reviewed-By: Arthur Zakirov <a.zakirov@postgrespro.ru> (in a much earlier version) Reviewed-By: Antonin Houska <ah@cybertec.at> (in a much earlier version) Discussion: https://postgr.es/m/20220303021600.hs34ghqcw6zcokdh@alap3.anarazel.de Discussion: https://postgr.es/m/20220308205351.2xcn6k4x5yivcxyd@alap3.anarazel.de Discussion: https://postgr.es/m/20210319235115.y3wz7hpnnrshdyv6@alap3.anarazel.de