path: root/src/include/access
Commit message / Author / Age
* Change xl_hash_vacuum_one_page.ntuples from int to uint16. (Amit Kapila, 2023-02-27)

  This will create two bytes of padding space in xl_hash_vacuum_one_page which can be used for future patches. This makes the datatype of xl_hash_vacuum_one_page.ntuples the same as gistxlogDelete.ntodelete, which is advisable as both are used for the same purpose.

  Author: Bertrand Drouvot
  Reviewed-by: Nathan Bossart
  Discussion: https://postgr.es/m/b0e20c40-cb7a-fc1c-c607-2a78dac5021e@gmail.com

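A minimal standalone sketch (not the actual PostgreSQL struct, field names invented) of why shrinking an int counter to uint16 leaves alignment padding that a future field can reuse without growing the record:

    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical "before": a 4-byte count after a 4-byte field. */
    typedef struct BeforeRecord
    {
        uint32_t flags;     /* stand-in for the preceding WAL record field */
        int      ntuples;   /* 4 bytes */
    } BeforeRecord;

    /* Hypothetical "after": a 2-byte count leaves 2 bytes of implicit
     * padding, available to a later patch without changing the size. */
    typedef struct AfterRecord
    {
        uint32_t flags;
        uint16_t ntuples;   /* 2 bytes, matching a uint16 counter elsewhere */
        /* 2 bytes of padding here */
    } AfterRecord;

    int
    main(void)
    {
        printf("before: size=%zu, ntuples at offset %zu\n",
               sizeof(BeforeRecord), offsetof(BeforeRecord, ntuples));
        printf("after:  size=%zu, ntuples at offset %zu\n",
               sizeof(AfterRecord), offsetof(AfterRecord, ntuples));
        return 0;
    }

On typical platforms both structs print the same total size, which is the point: the narrower counter frees two bytes inside the existing footprint.
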
* Remove stray duplicated comment in heapam.h (David Rowley, 2023-02-08)

  This is just the same as what's written under the rs_numblocks field.

  Reported-by: Melanie Plageman
  Discussion: https://postgr.es/m/20230207204127.7vs6krqjqn5farr7@liskov

* Revert refactoring of restore command code to shell_restore.c (Michael Paquier, 2023-02-06)

  This reverts commits 24c35ec and 57169ad. PreRestoreCommand() and PostRestoreCommand() need to be put closer to the system() call calling a restore_command, as they enable in_restore_command for the startup process, which would in turn trigger an immediate proc_exit() in the SIGTERM handler. Perhaps we could get rid of this behavior entirely, but 24c35ec has made the window where the flag is enabled much larger than it was, and any Postgres-like actions (palloc, etc.) taken by code paths while the flag is enabled could lead to more severe issues in the shutdown processing.

  Note that curculio has shown that there are many more problems in this area, unrelated to this change; those issues had better be addressed first. Keeping the code of HEAD in line with the stable branches should make that a bit easier.

  Per discussion with Andres Freund and Nathan Bossart.
  Discussion: https://postgr.es/m/Y979NR3U5VnWrTwB@paquier.xyz

* Further refactor of heapgettup and heapgettup_pagemode (David Rowley, 2023-02-03)

  Backward and forward scans share much of the same page acquisition code. Here we consolidate that code to reduce some duplication.

  Additionally, add a new rs_coffset field to HeapScanDescData to track the offset of the current tuple. The new field fits nicely into the padding between a bool and BlockNumber field and saves having to look at the last returned tuple to figure out which offset we should be looking at for the current tuple.

  Author: Melanie Plageman
  Reviewed-by: David Rowley
  Discussion: https://postgr.es/m/CAAKRu_bvkhka0CZQun28KTqhuUh5ZqY=_T8QEqZqOL02rpi2bw@mail.gmail.com

* Remove dead NoMovementScanDirection code (David Rowley, 2023-02-01)

  Here we remove some dead code from heapgettup() and heapgettup_pagemode() which was trying to support NoMovementScanDirection scans. This code can never be reached, as standard_ExecutorRun() never calls ExecutePlan with NoMovementScanDirection.

  Additionally, plans which were scanning an unordered index would use NoMovementScanDirection rather than ForwardScanDirection. There was no real need for this, so here we adjust this so we use ForwardScanDirection for unordered index scans. A comment in pathnodes.h claimed that NoMovementScanDirection was used for PathKey reasons, but if that was true, it no longer is, per code in build_index_paths().

  This does change the non-text format of the EXPLAIN output so that unordered index scans now have a "Forward" scan direction rather than "NoMovement". The text format of EXPLAIN has not changed.

  Author: Melanie Plageman
  Reviewed-by: Tom Lane, David Rowley
  Discussion: https://postgr.es/m/CAAKRu_bvkhka0CZQun28KTqhuUh5ZqY=_T8QEqZqOL02rpi2bw@mail.gmail.com

* Refactor code in charge of running shell-based recovery commands (Michael Paquier, 2023-01-16)

  The code specific to the execution of archive_cleanup_command, recovery_end_command and restore_command is moved to a new file named shell_restore.c. The code is split into three functions:
  - shell_restore(), that attempts the execution of a shell-based restore_command.
  - shell_archive_cleanup(), for archive_cleanup_command.
  - shell_recovery_end(), for recovery_end_command.

  This introduces no functional changes, with failure patterns and logs generated in consequence being the same as before (one case actually generates one less DEBUG2 message "could not restore" when a restore command succeeds but the follow-up stat() to check the size fails, but that only matters with an elevel high enough).

  This is preparatory work for allowing recovery modules, a facility similar to archive modules, with callbacks shaped similarly to the functions introduced here.

  Author: Nathan Bossart
  Reviewed-by: Andres Freund, Michael Paquier
  Discussion: https://postgr.es/m/20221227192449.GA3672473@nathanxps13

* Remove function declarations from headers for some undefined functions (Michael Paquier, 2023-01-11)

  The functions whose declarations are removed here have been removed in the past, but their respective headers never got the matching cleanup.

  Author: Justin Pryzby
  Discussion: https://postgr.es/m/20230110045722.GD9837@telsasoft.com

* New header varatt.h split off from postgres.h (Peter Eisentraut, 2023-01-10)

  This new header contains all the variable-length data types support (TOAST support) from postgres.h, which isn't needed by large parts of the backend code.

  Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
  Discussion: https://www.postgresql.org/message-id/flat/ddcce239-0f29-6e62-4b47-1f8ca742addf%40enterprisedb.com

* Delay commit status checks until freezing executes. (Peter Geoghegan, 2023-01-03)

  pg_xact lookups are relatively expensive. Move the xmin/xmax commit status checks from the point that freeze plans are prepared to the point that they're actually executed. Otherwise we'll repeat many commit status checks whenever multiple successive VACUUM operations scan the same pages and decide against freezing each time, which is a waste of cycles.

  Oversight in commit 1de58df4, which added page-level freezing.

  Author: Peter Geoghegan <pg@bowt.ie>
  Discussion: https://postgr.es/m/CAH2-WzkZpe4K6qMfEt8H4qYJCKc2R7TPvKsBva7jc9w7iGXQSw@mail.gmail.com

* Refine the definition of page-level freezing. (Peter Geoghegan, 2023-01-03)

  Improve comments added by commit 1de58df4 which describe the lazy_scan_prune "freeze the page" path. These newly revised comments are based on suggestions from Jeff Davis.

  In passing, remove nearby visibility_cutoff_xid comments left over from commit 6daeeb1f.

  Author: Peter Geoghegan <pg@bowt.ie>
  Reviewed-By: Jeff Davis <pgsql@j-davis.com>
  Discussion: https://postgr.es/m/ebc857107fe3edd422ef8a65191ca4a8da568b9b.camel@j-davis.com

* Update copyright for 2023 (Bruce Momjian, 2023-01-02)

  Backpatch-through: 11

* Add page-level freezing to VACUUM. (Peter Geoghegan, 2022-12-28)

  Teach VACUUM to decide on whether or not to trigger freezing at the level of whole heap pages. Individual XIDs and MXIDs fields from tuple headers now trigger freezing of whole pages, rather than independently triggering freezing of each individual tuple header field.

  Managing the cost of freezing over time now significantly influences when and how VACUUM freezes. The overall amount of WAL written is the single most important freezing related cost, in general. Freezing each page's tuples together in batch allows VACUUM to take full advantage of the freeze plan WAL deduplication optimization added by commit 9e540599.

  Also teach VACUUM to trigger page-level freezing whenever it detects that heap pruning generated an FPI. We'll have already written a large amount of WAL just to do that much, so it's very likely a good idea to get freezing out of the way for the page early. This only happens in cases where it will directly lead to marking the page all-frozen in the visibility map.

  In most cases "freezing a page" removes all XIDs < OldestXmin, and all MXIDs < OldestMxact. It doesn't quite work that way in certain rare cases involving MultiXacts, though. It is convenient to define "freeze the page" in a way that gives FreezeMultiXactId the leeway to put off the work of processing an individual tuple's xmax whenever it happens to be a MultiXactId that would require an expensive second pass to process aggressively (allocating a new multi is especially worth avoiding here). FreezeMultiXactId is eager when processing is cheap (as it usually is), and lazy in the event of an individual multi that happens to require expensive second pass processing. This avoids regressions related to processing of multis that page-level freezing might otherwise cause.

  Author: Peter Geoghegan <pg@bowt.ie>
  Reviewed-By: Jeff Davis <pgsql@j-davis.com>
  Reviewed-By: Andres Freund <andres@anarazel.de>
  Discussion: https://postgr.es/m/CAH2-WzkFok_6EAHuK39GaW4FjEFQsY=3J0AAd6FXk93u-Xq3Fg@mail.gmail.com

* Refactor how VACUUM passes around its XID cutoffs. (Peter Geoghegan, 2022-12-22)

  Use a dedicated struct for the XID/MXID cutoffs used by VACUUM, such as FreezeLimit and OldestXmin. This state is initialized in vacuum.c, and then passed around by code from vacuumlazy.c to heapam.c freezing related routines. The new convention is that everybody works off of the same cutoff state, which is passed around via pointers to const.

  Also simplify some of the logic for dealing with frozen xmin in heap_prepare_freeze_tuple: add dedicated "xmin_already_frozen" state to clearly distinguish xmin XIDs that we're going to freeze from those that were already frozen from before. That way the routine's xmin handling code is symmetrical with the existing xmax handling code. This is preparation for an upcoming commit that will add page level freezing.

  Also refactor the control flow within FreezeMultiXactId(), while adding stricter sanity checks. We now test OldestXmin directly, instead of using FreezeLimit as an inexact proxy for OldestXmin. This is further preparation for the page level freezing work, which will make the function's caller cede control of page level freezing to the function where appropriate (where heap_prepare_freeze_tuple sees a tuple that happens to contain a MultiXactId in its xmax).

  Author: Peter Geoghegan <pg@bowt.ie>
  Reviewed-By: Jeff Davis <pgsql@j-davis.com>
  Discussion: https://postgr.es/m/CAH2-WznS9TxXmz2_=SY+SyJyDFbiOftKofM9=aDo68BbXNBUMA@mail.gmail.com

* Static assertions cleanup (Peter Eisentraut, 2022-12-15)

  Because we added StaticAssertStmt() first before StaticAssertDecl(), some uses as well as the instructions in c.h are now a bit backwards from the "native" way static assertions are meant to be used in C. This updates the guidance and moves some static assertions to better places.

  Specifically, since the addition of StaticAssertDecl(), we can put static assertions at the file level. This moves a number of static assertions out of function bodies, where they might have been stuck out of necessity, to perhaps better places at the file level or in header files.

  Also, when the static assertion appears in a position where a declaration is allowed, then using StaticAssertDecl() is more native than StaticAssertStmt().

  Reviewed-by: John Naylor <john.naylor@enterprisedb.com>
  Discussion: https://www.postgresql.org/message-id/flat/941a04e7-dd6f-c0e4-8cdf-a33b3338cbda%40enterprisedb.com

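A standalone illustration of the file-level versus in-function distinction, using the bare C11 keyword rather than the c.h wrappers; the struct and names here are made up for the example:

    #include <stdint.h>
    #include <stdio.h>

    /* File-level ("declaration") form: the invariant is checked right next
     * to the definition it protects, with no function body required. */
    typedef struct ExampleHeader
    {
        uint16_t length;
        uint16_t flags;
    } ExampleHeader;

    _Static_assert(sizeof(ExampleHeader) == 4,
                   "ExampleHeader must stay 4 bytes for the on-disk format");

    /* Statement form: only needed when the assertion has to live inside a
     * function body. */
    static inline uint16_t
    example_get_flags(const ExampleHeader *hdr)
    {
        _Static_assert(sizeof(hdr->flags) == 2,
                       "flags is expected to be 16 bits");
        return hdr->flags;
    }

    int
    main(void)
    {
        ExampleHeader hdr = {.length = 8, .flags = 1};

        printf("%u\n", example_get_flags(&hdr));
        return 0;
    }
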
* Rethink handling of [Prevent|Is]InTransactionBlock in pipeline mode. (Tom Lane, 2022-12-13)

  Commits f92944137 et al. made IsInTransactionBlock() set the XACT_FLAGS_NEEDIMMEDIATECOMMIT flag before returning "false", on the grounds that that kept its API promises equivalent to those of PreventInTransactionBlock(). This turns out to be a bad idea though, because it allows an ANALYZE in a pipelined series of commands to cause an immediate commit, which is unexpected. Furthermore, if we return "false" then we have another issue, which is that ANALYZE will decide it's allowed to do internal commit-and-start-transaction sequences, thus possibly unexpectedly committing the effects of previous commands in the pipeline.

  To fix the latter situation, invent another transaction state flag XACT_FLAGS_PIPELINING, which explicitly records the fact that we have executed some extended-protocol command and not yet seen a commit for it. Then, require that flag to not be set before allowing IsInTransactionBlock() to return "false". Having done that, we can remove its setting of NEEDIMMEDIATECOMMIT without fear of causing problems. This means that the API guarantees of IsInTransactionBlock now diverge from PreventInTransactionBlock, which is mildly annoying, but it seems OK given the very limited usage of IsInTransactionBlock. (In any case, a caller preferring the old behavior could always set NEEDIMMEDIATECOMMIT for itself.)

  For consistency also require XACT_FLAGS_PIPELINING to not be set in PreventInTransactionBlock. This too is meant to prevent commands such as CREATE DATABASE from silently committing previous commands in a pipeline.

  Per report from Peter Eisentraut. As before, back-patch to all supported branches (which sadly no longer includes v10).

  Discussion: https://postgr.es/m/65a899dd-aebc-f667-1d0a-abb89ff3abf8@enterprisedb.com

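A rough, heavily simplified sketch of the flag-gating idea. The real logic lives in xact.c and postgres.c and differs in detail; the helper function below is invented purely for illustration:

    #include <stdbool.h>
    #include <stdio.h>

    /* Simplified stand-ins for transaction status flag bits (the real set
     * is defined in src/include/access/xact.h). */
    #define XACT_FLAGS_NEEDIMMEDIATECOMMIT  (1U << 0)
    #define XACT_FLAGS_PIPELINING           (1U << 1)

    static unsigned int MyXactFlags = 0;

    /* Invented helper: once an extended-protocol command has run with no
     * commit yet, refuse to report "not in a transaction block", so a
     * caller like ANALYZE won't start internal commits that would also
     * commit the earlier pipelined work. */
    static bool
    is_in_transaction_block_sketch(void)
    {
        if (MyXactFlags & XACT_FLAGS_PIPELINING)
            return true;
        return false;
    }

    int
    main(void)
    {
        printf("before pipelined command: %d\n", is_in_transaction_block_sketch());
        MyXactFlags |= XACT_FLAGS_PIPELINING;   /* an Execute message arrived */
        printf("after pipelined command:  %d\n", is_in_transaction_block_sketch());
        return 0;
    }
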
* Generalize ri_RootToPartitionMap to use for non-partition children (Alvaro Herrera, 2022-12-02)

  ri_RootToPartitionMap is currently only initialized for tuple routing target partitions, though a future commit will need the ability to use it even for the non-partition child tables, so make adjustments to decouple it from the partitioning code. Also, make it lazily initialized via ExecGetRootToChildMap(), making that function its preferred access path. Existing third-party code accessing it directly should no longer do so; consequently, it's been renamed to ri_RootToChildMap, which also makes it consistent with ri_ChildToRootMap.

  ExecGetRootToChildMap() houses the logic of setting the map appropriately depending on whether a given child relation is a partition or not.

  To support this, also add a separate entry point for TupleConversionMap creation that receives an AttrMap. No new code here, just split an existing function in two.

  Author: Amit Langote <amitlangote09@gmail.com>
  Discussion: https://postgr.es/m/CA+HiwqEYUhDXSK5BTvG_xk=eaAEJCD4GS3C6uH7ybBvv+Z_Tmg@mail.gmail.com

* Add 'missing_ok' argument to build_attrmap_by_name (Alvaro Herrera, 2022-11-29)

  When it's given as true, return a 0 in the position of the missing column rather than raising an error.

  This is currently unused, but it allows us to reimplement column permission checking in a subsequent commit. It seems worth breaking into a separate commit because it affects unrelated code.

  Author: Amit Langote <amitlangote09@gmail.com>
  Discussion: https://postgr.es/m/CA+HiwqFfiai=qBxPDTjaio_ZcaqUKh+FC=prESrB8ogZgFNNNQ@mail.gmail.com

* Remove promote_trigger_file. (Thomas Munro, 2022-11-29)

  Previously, an idle startup (recovery) process would wake up every 5 seconds to have a chance to poll for promote_trigger_file, even if that GUC was not configured. That promotion triggering mechanism was effectively superseded by pg_ctl promote and pg_promote() a long time ago. There probably aren't many users left and it's very easy to change to the modern mechanisms, so we agreed to remove the feature.

  This is part of a campaign to reduce wakeups on idle systems.

  Author: Simon Riggs <simon.riggs@enterprisedb.com>
  Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
  Reviewed-by: Robert Haas <robertmhaas@gmail.com>
  Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
  Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
  Reviewed-by: Ian Lawrence Barwick <barwick@gmail.com>
  Discussion: https://postgr.es/m/CANbhV-FsjnzVOQGBpQ589%3DnWuL1Ex0Ykn74Nh1hEjp2usZSR5g%40mail.gmail.com

* Speedup hash index builds by skipping needless binary searches (David Rowley, 2022-11-24)

  When building hash indexes using the spool method, tuples are added to the index page in hashkey order. Because of this, we can safely skip performing the binary search on the existing tuples on the page to find the location to insert the tuple based on its hashkey value. For this case, we can just always put the tuple at the end of the item array as the tuples will always arrive in hashkey order.

  Testing has shown that this can improve hash index build speeds by 5-15% with a unique set of integer values.

  Author: Simon Riggs
  Reviewed-by: David Rowley
  Tested-by: David Zhang, Tomas Vondra
  Discussion: https://postgr.es/m/CANbhV-GBc5JoG0AneUGPZZW3o4OK5LjBGeKe_icpC3R1McrZWQ@mail.gmail.com

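A toy standalone model of the optimization, using ordinary arrays rather than real hash index pages (all names invented): when the caller can promise keys arrive in ascending hashkey order, the insert position is always the end of the item array and the per-insert binary search can be skipped.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define MAX_ITEMS 16

    typedef struct PageSketch
    {
        uint32_t hashkeys[MAX_ITEMS];
        int      nitems;
    } PageSketch;

    /* Ordinary path: binary search for the insert position. */
    static int
    find_insert_pos(const PageSketch *page, uint32_t key)
    {
        int lo = 0, hi = page->nitems;

        while (lo < hi)
        {
            int mid = (lo + hi) / 2;

            if (page->hashkeys[mid] <= key)
                lo = mid + 1;
            else
                hi = mid;
        }
        return lo;
    }

    /* Sorted-build path: keys arrive in ascending order, so the correct
     * position is always the end of the array. */
    static void
    page_add_item(PageSketch *page, uint32_t key, bool sorted_build)
    {
        int pos = sorted_build ? page->nitems : find_insert_pos(page, key);

        for (int i = page->nitems; i > pos; i--)    /* shift only when needed */
            page->hashkeys[i] = page->hashkeys[i - 1];
        page->hashkeys[pos] = key;
        page->nitems++;
    }

    int
    main(void)
    {
        PageSketch page = {.nitems = 0};

        for (uint32_t k = 10; k <= 50; k += 10)
            page_add_item(&page, k, true);          /* spool feeds keys in order */
        for (int i = 0; i < page.nitems; i++)
            printf("%u ", page.hashkeys[i]);
        printf("\n");
        return 0;
    }
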
* Standardize rmgrdesc recovery conflict XID output. (Peter Geoghegan, 2022-11-17)

  Standardize on the name snapshotConflictHorizon for all XID fields from WAL records that generate recovery conflicts when in hot standby mode. This supersedes the previous latestRemovedXid naming convention.

  The new naming convention places emphasis on how the values are actually used by REDO routines. How the values are generated during original execution (details of which vary by record type) is deemphasized. Users of tools like pg_waldump can now grep for snapshotConflictHorizon to see all potential sources of recovery conflicts in a standardized way, without necessarily having to consider which specific record types might be involved.

  Also bring a couple of WAL record types that didn't follow any kind of naming convention into line. These are heapam's VISIBLE record type and SP-GiST's VACUUM_REDIRECT record type. Now every WAL record whose REDO routine calls ResolveRecoveryConflictWithSnapshot() passes through the snapshotConflictHorizon field from its WAL record. This is follow-up work to the refactoring from commit 9e540599 that made FREEZE_PAGE WAL records use a standard snapshotConflictHorizon style XID cutoff.

  No bump in XLOG_PAGE_MAGIC, since the underlying format of affected WAL records doesn't change.

  Author: Peter Geoghegan <pg@bowt.ie>
  Reviewed-By: Andres Freund <andres@anarazel.de>
  Discussion: https://postgr.es/m/CAH2-Wzm2CQUmViUq7Opgk=McVREHSOorYaAjR1ZpLYkRN7_dPw@mail.gmail.com

* Variable renaming in preparation for refactoring (Peter Eisentraut, 2022-11-16)

  Rename page -> block and dp -> page where appropriate. The old naming mixed up block and page in confusing ways.

  Author: Melanie Plageman <melanieplageman@gmail.com>
  Discussion: https://www.postgresql.org/message-id/flat/CAAKRu_YSOnhKsDyFcqJsKtBSrd32DP-jjXmv7hL0BPD-z0TGXQ@mail.gmail.com

* Turn HeapKeyTest macro into inline function (Peter Eisentraut, 2022-11-16)

  It is easier to read as a function.

  Author: Melanie Plageman <melanieplageman@gmail.com>
  Reviewed-by: Andres Freund <andres@anarazel.de>
  Discussion: https://www.postgresql.org/message-id/flat/CAAKRu_YSOnhKsDyFcqJsKtBSrd32DP-jjXmv7hL0BPD-z0TGXQ@mail.gmail.com

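A generic before-and-after of this kind of conversion, using a made-up predicate far simpler than the real HeapKeyTest: the inline function gains typed parameters, single evaluation of its arguments, and readable control flow.

    #include <stdbool.h>
    #include <stdio.h>

    /* Macro version: arguments are untyped and may be evaluated more than
     * once, and the logic is harder to follow. */
    #define KEY_MATCHES_MACRO(value, lo, hi) \
        ((value) >= (lo) && (value) <= (hi))

    /* Inline-function version: typed parameters, ordinary statements, and
     * equivalent generated code. */
    static inline bool
    key_matches(int value, int lo, int hi)
    {
        return value >= lo && value <= hi;
    }

    int
    main(void)
    {
        printf("%d %d\n", KEY_MATCHES_MACRO(5, 1, 10), key_matches(5, 1, 10));
        return 0;
    }
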
* Mark argument of RegisterCustomRmgr() as const. (Jeff Davis, 2022-11-15)

* Deduplicate freeze plans in freeze WAL records. (Peter Geoghegan, 2022-11-15)

  Make heapam WAL records that describe freezing performed by VACUUM more space efficient by storing each distinct "freeze plan" once, alongside an array of associated page offset numbers (one per freeze plan). The freeze plans required for most heap pages tend to naturally have a great deal of redundancy, so this technique is very effective in practice. It often leads to freeze WAL records that are less than 20% of the size of equivalent WAL records generated using the previous approach.

  The freeze plan concept was introduced by commit 3b97e6823b, which fixed bugs in VACUUM's handling of MultiXacts. We retain the concept of freeze plans, but go back to using page offset number arrays. There is no loss of generality here because deduplication is an additive process that gets applied mechanically when FREEZE_PAGE WAL records are built.

  More than anything else, freeze plan deduplication is an optimization that reduces the marginal cost of freezing additional tuples on pages that will need to have at least one or two tuples frozen in any case. Ongoing work that adds page-level freezing to VACUUM will take full advantage of the improved cost profile through batching.

  Also refactor some of the details surrounding recovery conflicts needed to REDO freeze records in passing: make original execution responsible for generating a standard latestRemovedXid cutoff, rather than working backwards to get the same cutoff in the REDO routine. Bugfix commit 66fbcb0d2e did it the other way around, which is equivalent but obscures what's going on.

  Also rename the cutoff field from the WAL record/struct (rename the field cutoff_xid to latestRemovedXid to match similar WAL records). Processing of conflicts by REDO routines is already completely uniform, so tools like pg_waldump should present the information driving the process uniformly. There are two remaining WAL record types that still don't quite follow this convention (heapam's VISIBLE record type and SP-GiST's VACUUM_REDIRECT record type). They can be brought into line by later work that totally standardizes how the cutoffs are presented.

  Bump XLOG_PAGE_MAGIC.

  Author: Peter Geoghegan <pg@bowt.ie>
  Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
  Reviewed-By: Nathan Bossart <nathandbossart@gmail.com>
  Reviewed-By: Justin Pryzby <pryzby@telsasoft.com>
  Discussion: https://postgr.es/m/CAH2-Wz=XytErMnb8FAyFd+OQEbiipB0Q2FmFdXrggPL4VBnRYQ@mail.gmail.com

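A toy sketch of the deduplication idea only, with invented and much-simplified types (the real freeze plans and WAL layout in heapam carry more state): each distinct plan is stored once, and the page offset numbers that share it are accumulated in a per-plan array.

    #include <stdint.h>
    #include <stdio.h>

    #define MAX_TUPLES 8
    #define MAX_PLANS  8

    /* Invented stand-in for a per-tuple "freeze plan": whatever decides how
     * the tuple header will be rewritten. */
    typedef struct FreezePlanSketch
    {
        uint32_t xmax;          /* value to store in the tuple's xmax */
        uint16_t infomask;      /* header bits to set */
    } FreezePlanSketch;

    typedef struct DedupedFreeze
    {
        FreezePlanSketch plans[MAX_PLANS];
        uint16_t offsets[MAX_PLANS][MAX_TUPLES];    /* page offsets per plan */
        int      noffsets[MAX_PLANS];
        int      nplans;
    } DedupedFreeze;

    /* Record one (offset, plan) pair, storing each distinct plan only once. */
    static void
    add_frozen_tuple(DedupedFreeze *d, uint16_t offset, FreezePlanSketch plan)
    {
        for (int i = 0; i < d->nplans; i++)
        {
            if (d->plans[i].xmax == plan.xmax &&
                d->plans[i].infomask == plan.infomask)
            {
                d->offsets[i][d->noffsets[i]++] = offset;
                return;
            }
        }
        d->plans[d->nplans] = plan;
        d->offsets[d->nplans][0] = offset;
        d->noffsets[d->nplans] = 1;
        d->nplans++;
    }

    int
    main(void)
    {
        DedupedFreeze d = {.nplans = 0};
        FreezePlanSketch shared = {.xmax = 0, .infomask = 0x0800};

        for (uint16_t off = 1; off <= 5; off++)
            add_frozen_tuple(&d, off, shared);  /* five tuples, one shared plan */

        printf("plans stored: %d, offsets under plan 0: %d\n",
               d.nplans, d.noffsets[0]);
        return 0;
    }
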
* Remove redundant declaration for XidInMVCCSnapshot (Alvaro Herrera, 2022-11-09)

  This was added for no good reason by c91560defc57, after b7eda3e0e334 had just moved the prototype from utils/tqual.h to utils/snapmgr.h.

  Author: Japin Li <japinli@hotmail.com>
  Discussion: https://postgr.es/m/MEYP282MB16693A409F3282A9DB287BADB63E9@MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM

* Reduce xlog.h inclusion footprint (Alvaro Herrera, 2022-10-12)

  This file needs xlogreader.h only for the XLogReaderState typedef; but we can dodge that by forward-declaring it. Many files use xlog.h for reasons other than reading WAL, and it's not good to force all those files to include xlogreader.h, so take it out.

  Surprisingly, there is no fallout in core code from making this change. Perhaps external code will have to start including xlogreader.h.

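The general trick, shown in a self-contained form with invented names (in the tree it is the XLogReaderState typedef that gets forward-declared): a header that only passes pointers around can declare the struct tag itself instead of pulling in the heavyweight header that defines it.

    /* --- example_header.h (sketch) ---------------------------------------
     * Only pointers to the reader state cross this interface, so a forward
     * declaration is enough; callers that actually touch its fields include
     * the full header themselves. */
    struct ExampleReaderState;              /* forward declaration, no #include */

    extern void example_process(struct ExampleReaderState *state);

    /* --- example.c (sketch) -----------------------------------------------
     * Only the implementation needs the complete type. */
    #include <stdio.h>

    struct ExampleReaderState
    {
        int records_read;
    };

    void
    example_process(struct ExampleReaderState *state)
    {
        state->records_read++;
    }

    int
    main(void)
    {
        struct ExampleReaderState st = {0};

        example_process(&st);
        printf("%d\n", st.records_read);
        return 0;
    }
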
* Restore pg_pread and friends. (Thomas Munro, 2022-09-29)

  Commits cf112c12 and a0dc8271 were a little too hasty in getting rid of the pg_ prefixes where we use pread(), pwrite() and vectored variants.

  We dropped support for ancient Unixes where we needed to use lseek() to implement replacements for those, but it turns out that Windows also changes the current position even when you pass in an offset to ReadFile() and WriteFile() if the file handle is synchronous, despite its documentation saying otherwise. Switching to asynchronous file handles would fix that, but have other complications. For now let's just put back the pg_ prefix and add some comments to highlight the non-standard side-effect, which we can now describe as Windows-only.

  Reported-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
  Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
  Discussion: https://postgr.es/m/20220923202439.GA1156054%40nathanxps13

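A POSIX-only sketch (minimal error handling, temp file path chosen arbitrarily) of the guarantee the pg_ prefix is flagging: on Unix, pread() does not disturb the file position, whereas the commit notes that Windows' synchronous ReadFile()/WriteFile() do, which is why the wrappers keep the pg_ name as a warning.

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int
    main(void)
    {
        char tmpl[] = "/tmp/pread_demo_XXXXXX";   /* throwaway temp file */
        int  fd = mkstemp(tmpl);
        char buf[6] = {0};

        if (fd < 0)
            return 1;
        unlink(tmpl);                             /* drop the name; fd stays usable */

        (void) write(fd, "hello world", 11);      /* file position is now 11 */
        (void) pread(fd, buf, 5, 6);              /* read "world" at offset 6 */

        printf("read \"%s\", position still %lld\n",
               buf, (long long) lseek(fd, 0, SEEK_CUR));
        close(fd);
        return 0;
    }
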
* Revert 56-bit relfilenode change and follow-up commits. (Robert Haas, 2022-09-28)

  There are still some alignment-related failures in the buildfarm, which might or might not be able to be fixed quickly, but I've also just realized that it increased the size of many WAL records by 4 bytes because a block reference contains a RelFileLocator. The effect of that hasn't been studied or discussed, so revert for now.

* Convert *GetDatum() and DatumGet*() macros to inline functions (Peter Eisentraut, 2022-09-27)

  The previous macro implementations just cast the argument to a target type but did not check whether the input type was appropriate. The function implementation can do better type checking of the input type.

  For the *GetDatumFast() macros, converting to an inline function doesn't work in the !USE_FLOAT8_BYVAL case, but we can use AssertVariableIsOfTypeMacro() to get a similar level of type checking.

  Reviewed-by: Aleksander Alekseev <aleksander@timescale.com>
  Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
  Discussion: https://www.postgresql.org/message-id/flat/8528fb7e-0aa2-6b54-85fb-0c0886dbd6ed%40enterprisedb.com

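An abridged illustration of the difference, using a stand-in Datum typedef and invented function names rather than the real postgres.h definitions:

    #include <stdint.h>
    #include <stdio.h>

    typedef uintptr_t Datum;        /* simplified stand-in for the real Datum */

    /* Old style: a bare cast. It silently accepts a pointer, a float, or
     * anything else that happens to cast to an integer type. */
    #define Int32GetDatumMacro(x)  ((Datum) (x))

    /* New style: the parameter type is part of the prototype, so passing
     * the wrong type is at least a warning, often an error. */
    static inline Datum
    Int32GetDatumInline(int32_t x)
    {
        return (Datum) x;
    }

    static inline int32_t
    DatumGetInt32Inline(Datum d)
    {
        return (int32_t) d;
    }

    int
    main(void)
    {
        Datum d = Int32GetDatumInline(42);

        printf("%d\n", DatumGetInt32Inline(d));
        /* Int32GetDatumInline("oops") would now be flagged by the compiler,
         * whereas Int32GetDatumMacro("oops") compiled silently. */
        return 0;
    }
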
* Increase width of RelFileNumbers from 32 bits to 56 bits. (Robert Haas, 2022-09-27)

  RelFileNumbers are now assigned using a separate counter, instead of being assigned from the OID counter. This counter never wraps around: if all 2^56 possible RelFileNumbers are used, an internal error occurs. As the cluster is limited to 2^64 total bytes of WAL, this limitation should not cause a problem in practice.

  If the counter were 64 bits wide rather than 56 bits wide, we would need to increase the width of the BufferTag, which might adversely impact buffer lookup performance. Also, this lets us use bigint for pg_class.relfilenode and other places where these values are exposed at the SQL level without worrying about overflow.

  This should remove the need to keep "tombstone" files around until the next checkpoint when relations are removed. We do that to keep RelFileNumbers from being recycled, but now that won't happen anyway. However, this patch doesn't actually change anything in this area; it just makes it possible for a future patch to do so.

  Dilip Kumar, based on an idea from Andres Freund, who also reviewed some earlier versions of the patch. Further review and some wordsmithing by me. Also reviewed at various points by Ashutosh Sharma, Vignesh C, Amul Sul, Álvaro Herrera, and Tom Lane.

  Discussion: http://postgr.es/m/CA+Tgmobp7+7kmi4gkq7Y+4AM9fTvL+O1oQ4-5gFTT+6Ng-dQ=g@mail.gmail.com

* Mark ParallelMessagePending as sig_atomic_t (Michael Paquier, 2022-09-27)

  ParallelMessagePending was previously marked as a boolean, which should be fine on modern platforms, but the C standard recommends the use of sig_atomic_t for variables manipulated in signal handlers.

  Author: Hayato Kuroda
  Discussion: https://postgr.es/m/TYAPR01MB58667C15A95A234720F4F876F5529@TYAPR01MB5866.jpnprd01.prod.outlook.com

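A minimal, standalone version of the pattern (generic POSIX signal handling, not the actual parallel-query code): the flag written by the handler is declared volatile sig_atomic_t, and the handler does nothing else.

    #include <signal.h>
    #include <stdio.h>

    /* The C standard only guarantees that reads and writes of a volatile
     * sig_atomic_t are safe with respect to asynchronous signal delivery,
     * which is why flags touched in handlers are declared this way. */
    static volatile sig_atomic_t MessagePending = 0;

    static void
    handle_sigusr1(int signo)
    {
        (void) signo;
        MessagePending = 1;          /* the only thing the handler does */
    }

    int
    main(void)
    {
        signal(SIGUSR1, handle_sigusr1);
        raise(SIGUSR1);              /* pretend a parallel worker signalled us */

        if (MessagePending)
        {
            MessagePending = 0;
            printf("processing pending messages in the main loop\n");
        }
        return 0;
    }
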
* Remove dependency to StringInfo in xlogbackup.{c,h} (Michael Paquier, 2022-09-27)

  This was used as the returned result type of the generated contents for the backup_label and backup history files. This is replaced by a simple string, reducing the cleanup burden of all the callers of build_backup_content().

  Reviewed-by: Bharath Rupireddy
  Discussion: https://postgr.es/m/YzERvNPaZivHEKZJ@paquier.xyz

* Refactor creation of backup_label and backup history files (Michael Paquier, 2022-09-26)

  This change simplifies some of the logic related to the generation and creation of the backup_label and backup history files, which has become unnecessarily complicated since the removal of the exclusive backup mode in commit 39969e2. The code was previously generating the contents of these files as a string (start phase for the backup_label and stop phase for the backup history file), one problem being that the contents of the backup_label string were scanned to grab some of its internal contents at the stop phase.

  This commit changes the logic so that we store the data required to build these files in an intermediate structure named BackupState. The backup_label file and backup history file strings are generated when they are ready to be sent back to the client. Both files are now generated with the same code path. While on it, this commit renames some variables for clarity.

  Two new files named xlogbackup.{c,h} are introduced in this commit, to remove from xlog.c some of the logic around base backups. Note that more could be moved to this new set of files.

  Author: Bharath Rupireddy, Michael Paquier
  Reviewed-by: Fujii Masao
  Discussion: https://postgr.es/m/CALj2ACXWwTDgJqCjdaPyfR7djwm6SrybGcrZyrvojzcsmt4FFw@mail.gmail.com

* Harmonize parameter names in storage and AM code. (Peter Geoghegan, 2022-09-19)

  Make sure that function declarations use names that exactly match the corresponding names from function definitions in storage, catalog, access method, executor, and logical replication code, as well as in miscellaneous utility/library code.

  Like other recent commits that cleaned up function parameter names, this commit was written with help from clang-tidy. Later commits will do the same for other parts of the codebase.

  Author: Peter Geoghegan <pg@bowt.ie>
  Reviewed-By: David Rowley <dgrowleyml@gmail.com>
  Discussion: https://postgr.es/m/CAH2-WznJt9CMM9KJTMjJh_zbL5hD9oX44qdJ4aqZtjFi-zA3Tg@mail.gmail.com

* Harmonize heapam and tableam parameter names. (Peter Geoghegan, 2022-09-19)

  Make sure that function declarations use names that exactly match the corresponding names from function definitions. Having parameter names that are reliably consistent in this way will make it easier to reason about groups of related C functions from the same translation unit as a module. It will also make certain refactoring tasks easier.

  Like other recent commits that cleaned up function parameter names, this commit was written with help from clang-tidy. Later commits will do the same for other parts of the codebase.

  Author: Peter Geoghegan <pg@bowt.ie>
  Reviewed-By: David Rowley <dgrowleyml@gmail.com>
  Discussion: https://postgr.es/m/CAH2-WznJt9CMM9KJTMjJh_zbL5hD9oX44qdJ4aqZtjFi-zA3Tg@mail.gmail.com

* Split up guc.c for better build speed and ease of maintenance. (Tom Lane, 2022-09-13)

  guc.c has grown to be one of our largest .c files, making it a bottleneck for compilation. It's also acquired a bunch of knowledge that'd be better kept elsewhere, because of our not very good habit of putting variable-specific check hooks here. Hence, split it up along these lines:

  * guc.c itself retains just the core GUC housekeeping mechanisms.
  * New file guc_funcs.c contains the SET/SHOW interfaces and some SQL-accessible functions for GUC manipulation.
  * New file guc_tables.c contains the data arrays that define the built-in GUC variables, along with some already-exported constant tables.
  * GUC check/assign/show hook functions are moved to the variable's home module, whenever that's clearly identifiable. A few hard-to-classify hooks ended up in commands/variable.c, which was already a home for miscellaneous GUC hook functions.

  To avoid cluttering a lot more header files with #include "guc.h", I also invented a new header file utils/guc_hooks.h and put all the GUC hook functions' declarations there, regardless of their originating module. That allowed removal of #include "guc.h" from some existing headers. The fallout from that (hopefully all caught here) demonstrates clearly why such inclusions are best minimized: there are a lot of files that, for example, were getting array.h at two or more levels of remove, despite not having any connection at all to GUCs in themselves.

  There is some very minor code beautification here, such as renaming a couple of inconsistently-named hook functions and improving some comments. But mostly this just moves code from point A to point B and deals with the ensuing needs for #include adjustments and exporting a few functions that previously weren't exported.

  Patch by me, per a suggestion from Andres Freund; thanks also to Michael Paquier for the idea to invent guc_funcs.c.

  Discussion: https://postgr.es/m/587607.1662836699@sss.pgh.pa.us

* Revert "Convert *GetDatum() and DatumGet*() macros to inline functions" (Peter Eisentraut, 2022-09-12)

  This reverts commit 595836e99bf1ee6d43405b885fb69bb8c6d3ee23.

  It has problems when USE_FLOAT8_BYVAL is off.

* Convert *GetDatum() and DatumGet*() macros to inline functions (Peter Eisentraut, 2022-09-12)

  The previous macro implementations just cast the argument to a target type but did not check whether the input type was appropriate. The function implementation can do better type checking of the input type.

  Reviewed-by: Aleksander Alekseev <aleksander@timescale.com>
  Discussion: https://www.postgresql.org/message-id/flat/8528fb7e-0aa2-6b54-85fb-0c0886dbd6ed%40enterprisedb.com

* Fix recovery_prefetch with low maintenance_io_concurrency. (Thomas Munro, 2022-09-08)

  We should process completed IOs *before* trying to start more, so that it is always possible to decode one more record when the decoded record queue is empty, even if maintenance_io_concurrency is set so low that a single earlier WAL record might have saturated the IO queue.

  That bug was hidden because the effect of maintenance_io_concurrency was arbitrarily clamped to be at least 2. Fix the ordering, and also remove that clamp. We need a special case for 0, which is now treated the same as recovery_prefetch=off, but otherwise the number is used directly. This allows for testing with 1, which would have made the problem obvious in simple test scenarios.

  Also add an explicit error message for missing contrecords. It was a bit strange that we didn't report an error already, and became a latent bug with prefetching, since the internal state that tracks aborted contrecords would not survive retrying, as revealed by 026_overwrite_contrecord.pl with this adjustment. Reporting an error prevents that.

  Back-patch to 15.

  Reported-by: Justin Pryzby <pryzby@telsasoft.com>
  Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
  Discussion: https://postgr.es/m/20220831140128.GS31833%40telsasoft.com

* Fix cache invalidation bug in recovery_prefetch. (Thomas Munro, 2022-09-03)

  XLogPageRead() can retry internally after a pread() system call has succeeded, in the case of short reads, and page validation failures while in standby mode (see commit 0668719801). Due to an oversight in commit 3f1ce973, these cases could leave stale data in the internal cache of xlogreader.c without marking it invalid. The main defense against stale cached data on failure to read a page was in the error handling path of the calling function ReadPageInternal(), but that wasn't quite enough for errors handled internally by XLogPageRead()'s retry loop if we then exited with XLREAD_WOULDBLOCK.

  1. ReadPageInternal() now marks the cache invalid before calling the page_read callback, by setting state->readLen to 0. It'll be set to a non-zero value only after a successful read. It'll stay valid as long as the caller requests data in the cached range.

  2. XLogPageRead() no longer performs internal retries while reading ahead. While such retries should work, the general philosophy is that we should give up prefetching if anything unusual happens so we can handle it when recovery catches up, to reduce the complexity of the system. Let's do that here too.

  3. While here, a new function XLogReaderResetError() improves the separation between xlogrecovery.c and xlogreader.c, where the former previously clobbered the latter's internal error buffer directly. The new function makes this more explicit, and also clears a related flag, without which a standby would needlessly retry in the outer function.

  Thanks to Noah Misch for tracking down the conditions required for a rare build farm failure in src/bin/pg_ctl/t/003_promote.pl, and providing a reproducer.

  Back-patch to 15.

  Reported-by: Noah Misch <noah@leadboat.com>
  Discussion: https://postgr.es/m/20220807003627.GA4168930%40rfd.leadboat.com

* Update the comment in rmgrlist.h to match it to the code. (Amit Kapila, 2022-08-30)

  Author: Hayato Kuroda
  Reviewed-by: Amit Kapila
  Discussion: https://postgr.es/m/TYAPR01MB58665F20F412EDF27B0759CFF5769@TYAPR01MB5866.jpnprd01.prod.outlook.com

* Add missing parenthesis to max item size macro. (Peter Geoghegan, 2022-08-05)

  Oversight in commit 92f37505, per buildfarm.

* Remove configure probe for fdatasync. (Thomas Munro, 2022-08-05)

  fdatasync() is in SUSv2, and all targeted Unix systems have it. We have a replacement function for Windows.

  We retain the probe for the function declaration, which allows us to supply the mysteriously missing declaration for macOS, and also for Windows. No need to keep a HAVE_FDATASYNC macro around.

  Also rename src/port/fdatasync.c to win32fdatasync.c since it's only for Windows.

  Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
  Reviewed-by: Andres Freund <andres@anarazel.de>
  Discussion: https://postgr.es/m/CA+hUKGJ3LHeP9w5Fgzdr4G8AnEtJ=z=p6hGDEm4qYGEUX5B6fQ@mail.gmail.com
  Discussion: https://postgr.es/m/CA%2BhUKGJZJVO%3DiX%2Beb-PXi2_XS9ZRqnn_4URh0NUQOwt6-_51xQ%40mail.gmail.com

* Fix nbtree maximum item size macro. (Peter Geoghegan, 2022-08-04)

  Commit dd299df8189, which made heap TID a tiebreaker nbtree index column, introduced new rules on page space management to make suffix truncation safe for v4+ indexes. New pivot tuples (generated by suffix truncation during leaf page splits) sometimes require dedicated extra space at the end of a new leaf page high key/pivot to store a heap TID using a special representation (a representation only used in pivot tuples).

  The definition of "1/3 of a page" was reduced by a single MAXALIGN() quantum for v4 indexes to make sure that the final enlarged pivot tuple always fit, even with a split point whose firstright tuple happened to already be at the "1/3 of a page" limit (limit for non-pivot tuples). Internal pages (which only contain pivot tuples) stuck with the original "1/3 of a page" definition. This scheme made it impossible for any page split to fail to free enough space for its newitem, which is never okay.

  The macro that determines whether non-pivot tuples exceed their "1/3 of a leaf page" restriction was structured as if space was needed for all three tuples during a leaf page split (the new pivot plus two very large adjoining non-pivots that are separated by the split). This was subtly wrong, in that it accidentally relied on implementation details that could (at least in theory) change in the future. To fix, make the macro subtract a single MAXALIGN() quantum, once. The macro evaluates to exactly the same value as before in practice. But it no longer depends on the current layout of nbtree's special area struct.

  No backpatch, since this isn't a live bug.

  Author: Peter Geoghegan <pg@bowt.ie>
  Reported-By: Robert Haas <robertmhaas@gmail.com>
  Diagnosed-By: Robert Haas <robertmhaas@gmail.com>
  Discussion: https://postgr.es/m/CA+Tgmoa7UBxivM7f6Ocx_qbq4=ky3uXc+WZNOBcVX+kvJvWOEA@mail.gmail.com

* Remove dead pread and pwrite replacement code. (Thomas Munro, 2022-08-05)

  pread() and pwrite() are in SUSv2, and all targeted Unix systems have them.

  Previously, we defined pg_pread and pg_pwrite to emulate these functions with lseek() on old Unixen. The names with a pg_ prefix were a reminder of a portability hazard: they might change the current file position. That hazard is gone, so we can drop the prefixes.

  Since the remaining replacement code is Windows-only, move it into src/port/win32p{read,write}.c, and move the declarations into src/include/port/win32_port.h.

  No need for vestigial HAVE_PREAD, HAVE_PWRITE macros as they were only used for declarations in port.h which have now moved into win32_port.h.

  Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
  Reviewed-by: Greg Stark <stark@mit.edu>
  Reviewed-by: Robert Haas <robertmhaas@gmail.com>
  Reviewed-by: Andres Freund <andres@anarazel.de>
  Discussion: https://postgr.es/m/CA+hUKGJ3LHeP9w5Fgzdr4G8AnEtJ=z=p6hGDEm4qYGEUX5B6fQ@mail.gmail.com

* Fix inconsistent comments for some function declarations in headers (Michael Paquier, 2022-08-04)

  Some of the headers list a couple of function prototypes located in a different file than what is referred to. This fixes a couple of places where this inconsistency exists.

  Author: Richard Guo
  Discussion: https://postgr.es/m/CAMbWs4__RdcSNXPa7L62Ozvo_Q4LvT60o3Bnp8yrQ_m9y5CKvg@mail.gmail.com

* Add overflow protection for block-related data in WAL records (Michael Paquier, 2022-07-27)

  XLogRecordBlockHeader, the header holding the information for the data related to a block, tracks the length of the data appended to the WAL record with data_length (uint16). This limitation in size was not enforced by the public routine in charge of registering the data assembled later to form the WAL record inserted, XLogRegisterBufData(). Incorrectly used, it could lead to the generation of records with some of their data overflowed. This commit adds some safeguards to prevent that for the block data, complaining immediately if attempting to add to a record block information with a size larger than UINT16_MAX, which is the limit implied by the internal logic.

  Note that this also adjusts XLogRegisterData() and XLogRegisterBufData() so that the length of the WAL record data given by the caller is unsigned, matching with what gets stored in XLogRecData->len.

  Extracted from a larger patch by the same author. The original patch includes more protections when assembling a record in full that will be looked at separately later.

  Author: Matthias van de Meent
  Reviewed-by: Andres Freund, Heikki Linnakangas, Michael Paquier, David Zhang
  Discussion: https://postgr.es/m/CAEze2WgGiw+LZt+vHf8tWqB_6VxeLsMeoAuod0N=ij1q17n5pw@mail.gmail.com

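A simplified sketch of the kind of guard involved (names invented; the real check sits in xloginsert.c's registration code and reports the error through the backend's error machinery): accumulate the block data length in a wider type and reject anything that would not fit the on-disk uint16 field.

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define BLOCK_DATA_LIMIT  UINT16_MAX   /* what a uint16 data_length can hold */

    typedef struct BlockDataSketch
    {
        uint32_t total_len;                /* accumulate in a wider type ... */
    } BlockDataSketch;

    /* ... and complain before the caller can overflow the on-disk field. */
    static void
    register_buf_data(BlockDataSketch *blk, uint32_t len)
    {
        if ((uint64_t) blk->total_len + len > BLOCK_DATA_LIMIT)
        {
            fprintf(stderr, "too much WAL data for one block reference (%u + %u)\n",
                    blk->total_len, len);
            exit(EXIT_FAILURE);
        }
        blk->total_len += len;
    }

    int
    main(void)
    {
        BlockDataSketch blk = {0};

        register_buf_data(&blk, 60000);
        register_buf_data(&blk, 10000);    /* 70000 > 65535: caught here */
        return 0;
    }
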
* Force immediate commit after CREATE DATABASE etc in extended protocol. (Tom Lane, 2022-07-26)

  We have a few commands that "can't run in a transaction block", meaning that if they complete their processing but then we fail to COMMIT, we'll be left with inconsistent on-disk state. However, the existing defenses for this are only watertight for simple query protocol. In extended protocol, we didn't commit until receiving a Sync message. Since the client is allowed to issue another command instead of Sync, we're in trouble if that command fails or is an explicit ROLLBACK. In any case, sitting in an inconsistent state while waiting for a client message that might not come seems pretty risky.

  This case wasn't reachable via libpq before we introduced pipeline mode, but it's always been an intended aspect of extended query protocol, and likely there are other clients that could reach it before.

  To fix, set a flag in PreventInTransactionBlock that tells exec_execute_message to force an immediate commit. This seems to be the approach that does least damage to existing working cases while still preventing the undesirable outcomes.

  While here, add some documentation to protocol.sgml that explicitly says how to use pipelining. That's latent in the existing docs if you know what to look for, but it's better to spell it out; and it provides a place to document this new behavior.

  Per bug #17434 from Yugo Nagata. It's been wrong for ages, so back-patch to all supported branches.

  Discussion: https://postgr.es/m/17434-d9f7a064ce2a88a3@postgresql.org

* Remove O_FSYNC and associated macros. (Thomas Munro, 2022-07-22)

  O_FSYNC was a pre-POSIX way of spelling O_SYNC, supported since commit 9d645fd84c3 for non-conforming operating systems of the time. It's not needed on any modern system. We can just use standard O_SYNC directly if it exists (= all targeted systems except Windows), and get rid of our OPEN_SYNC_FLAG macro.

  Similarly for standard O_DSYNC, we can just use that directly if it exists (= all targeted systems except DragonFlyBSD), and get rid of our OPEN_DATASYNC_FLAG macro.

  We still avoid choosing open_datasync as a default value for wal_sync_method if O_DSYNC has the same value as O_SYNC (= only OpenBSD), so there is no change in default behavior.

  Discussion: https://postgr.es/m/CA%2BhUKGJE7y92NY7FG2ftUbZUaqohBU65_Ys_7xF5mUHo4wirTQ%40mail.gmail.com

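A minimal sketch of using the standard flags directly, as the commit describes (file name invented, error handling minimal; the real wal_sync_method selection logic is more involved):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int
    main(void)
    {
        /* Prefer O_DSYNC (data-only sync) where the platform provides it;
         * otherwise fall back to O_SYNC. The commit notes that when the two
         * flags share a value (OpenBSD), "open_datasync" is not treated as a
         * meaningfully different default. */
    #ifdef O_DSYNC
        int sync_flag = O_DSYNC;
    #else
        int sync_flag = O_SYNC;
    #endif

        int fd = open("/tmp/odsync_demo.bin", O_WRONLY | O_CREAT | sync_flag, 0600);

        if (fd < 0)
        {
            perror("open");
            return 1;
        }
        (void) write(fd, "flushed synchronously\n", 22);
        close(fd);
        printf("wrote with sync flag %#x\n", sync_flag);
        return 0;
    }
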
* Convert macros to static inline functions (itup.h) (Peter Eisentraut, 2022-07-19)

  Reviewed-by: Amul Sul <sulamul@gmail.com>
  Discussion: https://www.postgresql.org/message-id/flat/5b558da8-99fb-0a99-83dd-f72f05388517%40enterprisedb.com
