path: root/src/backend/access
...
* Make lazy_vacuum_heap_rel match lazy_scan_heap.  (Peter Geoghegan, 2023-01-11)
  Make lazy_vacuum_heap_rel variable names match those from lazy_scan_heap where that makes sense. Extracted from a larger patch to deal with issues with how vacuumlazy.c sets pages all-frozen.
  Author: Peter Geoghegan <pg@bowt.ie>
  Discussion: https://postgr.es/m/CAH2-WznuNGSzF8v6OsgjaC5aYsb3cZ6HW6MLm30X0d65cmSH6A@mail.gmail.com
* vacuumlazy.c: Tweak local variable name.  (Peter Geoghegan, 2023-01-11)
  Make a local variable name consistent with the name from its WAL record. Extracted from a larger patch to deal with issues with how vacuumlazy.c sets pages all-frozen.
  Author: Peter Geoghegan <pg@bowt.ie>
  Discussion: https://postgr.es/m/CAH2-WznuNGSzF8v6OsgjaC5aYsb3cZ6HW6MLm30X0d65cmSH6A@mail.gmail.com
* Rename and relocate freeze plan dedup routines.  (Peter Geoghegan, 2023-01-11)
  Rename the heapam.c freeze plan deduplication routines added by commit 9e540599 to names that follow conventions for functions in heapam.c. Also relocate the functions so that they're next to their caller, which runs during original execution, when FREEZE_PAGE WAL records are built.

  The routines were initially placed next to (and followed the naming conventions of) conceptually related REDO routine code, but that scheme turned out to be kind of jarring when considered in a wider context.

  Author: Peter Geoghegan <pg@bowt.ie>
  Reported-By: Andres Freund <andres@anarazel.de>
  Discussion: https://postgr.es/m/20230109214308.icz26oqvt3k2274c@awork3.anarazel.de
* Improve TransactionIdDidAbort() documentation.  (Peter Geoghegan, 2023-01-11)
  Document that TransactionIdDidAbort() won't indicate that transactions that were in-progress during a crash have aborted. Tie this to existing discussion of the TransactionIdDidCommit() and TransactionIdDidAbort() protocol that code in heapam_visibility.c (and a few other places) must observe.

  Follow-up to bugfix commit eb5ad4ff.

  Author: Peter Geoghegan <pg@bowt.ie>
  Reviewed-By: Andres Freund <andres@anarazel.de>
  Discussion: https://postgr.es/m/CAH2-Wzn4bEEqgmaUQL3aJ73yM9gAeK-wE4ngi7kjRjLztb+P0w@mail.gmail.com
* Common function for percent placeholder replacement  (Peter Eisentraut, 2023-01-11)
  There are a number of places where a shell command is constructed with percent-placeholders (like %x). It's cumbersome to have to open-code this several times. This factors out this logic into a separate function. This also allows us to ensure consistency for and document some subtle behaviors, such as what to do with unrecognized placeholders.

  The unified handling is now that incorrect and unknown placeholders are an error, where previously in most cases they were skipped or ignored. This affects the following settings:

  - archive_cleanup_command
  - archive_command
  - recovery_end_command
  - restore_command
  - ssl_passphrase_command

  The following settings are part of this refactoring but already had stricter error handling and should be unchanged in their behavior:

  - basebackup_to_shell.command

  Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
  Discussion: https://www.postgresql.org/message-id/flat/5238bbed-0b01-83a6-d4b2-7eb0562a054e%40enterprisedb.com
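
  For illustration, a minimal standalone sketch of the unified placeholder behavior described above (unknown placeholders are an error, %% yields a literal %). The function and names here are hypothetical and simplified, not the actual src/common code:

      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>

      /*
       * Expand %-placeholders in a command template.  "letters" lists the
       * recognized placeholder characters and "values" the corresponding
       * replacement strings.  "%%" produces a literal '%'; any unknown "%x"
       * is an error, matching the stricter unified behavior described above.
       */
      static void
      expand_placeholders(char *dst, size_t dstsize, const char *template,
                          const char *letters, const char **values)
      {
          size_t  len = 0;

          dst[0] = '\0';
          for (const char *p = template; *p; p++)
          {
              char        one[2] = {*p, '\0'};
              const char *repl;

              if (*p != '%')
                  repl = one;
              else if (*++p == '%')
                  repl = "%";
              else
              {
                  const char *hit = (*p != '\0') ? strchr(letters, *p) : NULL;

                  if (hit == NULL)
                  {
                      fprintf(stderr, "unrecognized placeholder \"%%%c\" in command\n", *p);
                      exit(1);
                  }
                  repl = values[hit - letters];
              }

              if (len + strlen(repl) + 1 > dstsize)
              {
                  fprintf(stderr, "expanded command too long\n");
                  exit(1);
              }
              strcpy(dst + len, repl);
              len += strlen(repl);
          }
      }

      int
      main(void)
      {
          /* 'f' = file name, 'p' = path, in archive_command style */
          const char *values[] = {"00000001000000000000002A", "pg_wal/00000001000000000000002A"};
          char        cmd[1024];

          expand_placeholders(cmd, sizeof(cmd), "cp \"%p\" \"/mnt/archive/%f\"", "fp", values);
          puts(cmd);   /* cp "pg_wal/00000001000000000000002A" "/mnt/archive/00000001000000000000002A" */
          return 0;
      }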
* Fix typos in code and comments  (Michael Paquier, 2023-01-11)
  Author: Justin Pryzby
  Discussion: https://postgr.es/m/20230110045722.GD9837@telsasoft.com
* New header varatt.h split off from postgres.h  (Peter Eisentraut, 2023-01-10)
  This new header contains all the variable-length data types support (TOAST support) from postgres.h, which isn't needed by large parts of the backend code.
  Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
  Discussion: https://www.postgresql.org/message-id/flat/ddcce239-0f29-6e62-4b47-1f8ca742addf%40enterprisedb.com
* Perform apply of large transactions by parallel workers.  (Amit Kapila, 2023-01-09)
  Currently, for large transactions, the publisher sends the data in multiple streams (changes divided into chunks depending upon logical_decoding_work_mem), and then on the subscriber-side, the apply worker writes the changes into temporary files and once it receives the commit, it reads from those files and applies the entire transaction. To improve the performance of such transactions, we can instead allow them to be applied via parallel workers.

  In this approach, we assign a new parallel apply worker (if available) as soon as the xact's first stream is received and the leader apply worker will send changes to this new worker via shared memory. The parallel apply worker will directly apply the change instead of writing it to temporary files. However, if the leader apply worker times out while attempting to send a message to the parallel apply worker, it will switch to "partial serialize" mode - in this mode, the leader serializes all remaining changes to a file and notifies the parallel apply workers to read and apply them at the end of the transaction. We use a non-blocking way to send the messages from the leader apply worker to the parallel apply worker to avoid deadlocks. We keep this parallel apply worker assigned till the transaction commit is received and also wait for the worker to finish at commit. This preserves commit ordering and avoids writing to and reading from files in most cases. We still need to spill if there is no worker available.

  This patch also extends the SUBSCRIPTION 'streaming' parameter so that the user can control whether to apply the streaming transaction in a parallel apply worker or spill the change to disk. The user can set the streaming parameter to 'on/off', or 'parallel'. The parameter value 'parallel' means the streaming will be applied via a parallel apply worker, if available. The parameter value 'on' means the streaming transaction will be spilled to disk. The default value is 'off' (same as current behaviour).

  In addition, the patch extends the logical replication STREAM_ABORT message so that abort_lsn and abort_time can also be sent which can be used to update the replication origin in parallel apply worker when the streaming transaction is aborted. Because this message extension is needed to support parallel streaming, parallel streaming is not supported for publications on servers < PG16.

  Author: Hou Zhijie, Wang wei, Amit Kapila with design inputs from Sawada Masahiko
  Reviewed-by: Sawada Masahiko, Peter Smith, Dilip Kumar, Shi yu, Kuroda Hayato, Shveta Mallik
  Discussion: https://postgr.es/m/CAA4eK1+wyN6zpaHUkCLorEWNx75MG0xhMwcFhvjqm2KURZEAGw@mail.gmail.com
* Wake up a subscription's replication worker processes after DDL.  (Tom Lane, 2023-01-06)
  Waken related worker processes immediately at commit of a transaction that has performed ALTER SUBSCRIPTION (including the RENAME and OWNER variants). This reduces the response time for such operations. In the real world that might not be worth much, but it shaves several seconds off the runtime for the subscription test suite.

  In the case of PREPARE, we just throw away this notification state; it doesn't seem worth the work to preserve it. The workers will still react after the eventual COMMIT PREPARED, but not as quickly.

  Nathan Bossart
  Discussion: https://postgr.es/m/20221122004119.GA132961@nathanxps13
* Check that xmax didn't commit in freeze check.  (Peter Geoghegan, 2023-01-03)
  We cannot rely on TransactionIdDidAbort here, since in general it may report transactions that were in-progress at the time of an earlier hard crash as not aborted, effectively behaving as if they were still in progress even after crash recovery completes. Go back to defensively verifying that xmax didn't commit instead.

  Oversight in commit 79d4bf4e.

  Author: Peter Geoghegan <pg@bowt.ie>
  Reported-By: Andres Freund <andres@anarazel.de>
  Discussion: https://postgr.es/m/20230104035636.hy5djyr2as4gbc4q@awork3.anarazel.de
* Update obsolete multixact.c comments.  (Peter Geoghegan, 2023-01-03)
  Commit 4f627f89 switched SLRU truncation for multixacts back to being a task performed during VACUUM, but missed some comments that continued to reference truncation happening as part of checkpointing. Update those comments now.

  Also update comments that became obsolete when commit c3ffa731 changed the way that vacuum_multixact_freeze_min_age is applied by VACUUM as it computes its MultiXactCutoff cutoff (which is used by VACUUM to decide what to freeze). Explain the same issues by referencing how OldestMxact is the latest valid value that relminmxid can ever be advanced to at the end of a VACUUM (following the work in commit 0b018fab).
* vacuumlazy.c: Save get_database_name() in vacrel.  (Peter Geoghegan, 2023-01-03)
  This brings dbname strings in line with namespace and relation name strings.
  Author: Peter Geoghegan <pg@bowt.ie>
  Discussion: https://postgr.es/m/CAH2-WzkQ1TKU-DdNvnGeL870di3+CU1UTo-7nw7xFDpVE-XGjA@mail.gmail.com
* Delay commit status checks until freezing executes.  (Peter Geoghegan, 2023-01-03)
  pg_xact lookups are relatively expensive. Move the xmin/xmax commit status checks from the point that freeze plans are prepared to the point that they're actually executed. Otherwise we'll repeat many commit status checks whenever multiple successive VACUUM operations scan the same pages and decide against freezing each time, which is a waste of cycles.

  Oversight in commit 1de58df4, which added page-level freezing.

  Author: Peter Geoghegan <pg@bowt.ie>
  Discussion: https://postgr.es/m/CAH2-WzkZpe4K6qMfEt8H4qYJCKc2R7TPvKsBva7jc9w7iGXQSw@mail.gmail.com
* Refine the definition of page-level freezing.  (Peter Geoghegan, 2023-01-03)
  Improve comments added by commit 1de58df4 which describe the lazy_scan_prune "freeze the page" path. These newly revised comments are based on suggestions from Jeff Davis.

  In passing, remove nearby visibility_cutoff_xid comments left over from commit 6daeeb1f.

  Author: Peter Geoghegan <pg@bowt.ie>
  Reviewed-By: Jeff Davis <pgsql@j-davis.com>
  Discussion: https://postgr.es/m/ebc857107fe3edd422ef8a65191ca4a8da568b9b.camel@j-davis.com
* Fix typos in comments, code and documentation  (Michael Paquier, 2023-01-03)
  While on it, newlines are removed from the end of two elog() strings. The others are simple grammar mistakes. One comment in pg_upgrade referred incorrectly to sequences since a7e5457.

  Author: Justin Pryzby
  Discussion: https://postgr.es/m/20221230231257.GI1153@telsasoft.com
  Backpatch-through: 11
* Update copyright for 2023  (Bruce Momjian, 2023-01-02)
  Backpatch-through: 11
* Adjust VACUUM hastup LP_REDIRECT comments.  (Peter Geoghegan, 2023-01-02)
  The term "truncation" has been ambiguous since commit 10a8d13823 added line pointer array truncation during heap pruning. Clear things up by specifying that we're talking about rel truncation here, to match nearby comments that apply to tuples with storage.
* Avoid special XID snapshotConflictHorizon values.  (Peter Geoghegan, 2023-01-02)
  Don't allow VACUUM to WAL-log the value FrozenTransactionId as the snapshotConflictHorizon of freezing or visibility map related WAL records. The only special XID value that's an allowable snapshotConflictHorizon is InvalidTransactionId, which is interpreted as "record definitely doesn't require a recovery conflict".

  Author: Peter Geoghegan <pg@bowt.ie>
  Discussion: https://postgr.es/m/CAH2-WznuNGSzF8v6OsgjaC5aYsb3cZ6HW6MLm30X0d65cmSH6A@mail.gmail.com
* Push lpp variable closer to usage in heapgetpage()  (Peter Eisentraut, 2023-01-02)
  Author: Melanie Plageman <melanieplageman@gmail.com>
  Discussion: https://www.postgresql.org/message-id/flat/CAAKRu_YSOnhKsDyFcqJsKtBSrd32DP-jjXmv7hL0BPD-z0TGXQ@mail.gmail.com
* Fix assert in BRIN build_distances  (Tomas Vondra, 2022-12-30)
  When brin_minmax_multi_union merges summaries, we may end up with just a single range after merge_overlapping_ranges. The summaries may contain just one range each, and they may overlap (or be exactly the same). With a single range there's no distance to calculate, but we happen to call build_distances anyway - which is fine, we don't calculate the distance in this case, except that with asserts enabled this failed due to a check that there are at least two ranges.

  The assert is unnecessarily strict, so relax it a bit and bail out if there's just a single range. The relaxed assert would be enough, but this way we don't allocate unnecessary memory for distance.

  Backpatch to 14, where minmax-multi opclasses were introduced.

  Reported-by: Jaime Casanova
  Backpatch-through: 14
  Discussion: https://postgr.es/m/YzVA55qS0hgz8P3r@ahch-to
* Add const to BufFileWrite  (Peter Eisentraut, 2022-12-30)
  Make data buffer argument to BufFileWrite a const pointer and bubble this up to various callers and related APIs. This makes the APIs clearer and more consistent.
  Discussion: https://www.postgresql.org/message-id/flat/11dda853-bb5b-59ba-a746-e168b1ce4bdb%40enterprisedb.com
* Add page-level freezing to VACUUM.  (Peter Geoghegan, 2022-12-28)
  Teach VACUUM to decide on whether or not to trigger freezing at the level of whole heap pages. Individual XIDs and MXIDs fields from tuple headers now trigger freezing of whole pages, rather than independently triggering freezing of each individual tuple header field.

  Managing the cost of freezing over time now significantly influences when and how VACUUM freezes. The overall amount of WAL written is the single most important freezing related cost, in general. Freezing each page's tuples together in batch allows VACUUM to take full advantage of the freeze plan WAL deduplication optimization added by commit 9e540599.

  Also teach VACUUM to trigger page-level freezing whenever it detects that heap pruning generated an FPI. We'll have already written a large amount of WAL just to do that much, so it's very likely a good idea to get freezing out of the way for the page early. This only happens in cases where it will directly lead to marking the page all-frozen in the visibility map.

  In most cases "freezing a page" removes all XIDs < OldestXmin, and all MXIDs < OldestMxact. It doesn't quite work that way in certain rare cases involving MultiXacts, though. It is convenient to define "freeze the page" in a way that gives FreezeMultiXactId the leeway to put off the work of processing an individual tuple's xmax whenever it happens to be a MultiXactId that would require an expensive second pass to process aggressively (allocating a new multi is especially worth avoiding here). FreezeMultiXactId is eager when processing is cheap (as it usually is), and lazy in the event of an individual multi that happens to require expensive second pass processing. This avoids regressions related to processing of multis that page-level freezing might otherwise cause.

  Author: Peter Geoghegan <pg@bowt.ie>
  Reviewed-By: Jeff Davis <pgsql@j-davis.com>
  Reviewed-By: Andres Freund <andres@anarazel.de>
  Discussion: https://postgr.es/m/CAH2-WzkFok_6EAHuK39GaW4FjEFQsY=3J0AAd6FXk93u-Xq3Fg@mail.gmail.com
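
  As a rough sketch of the page-level decision described above (all names hypothetical, not the actual vacuumlazy.c code):

      #include <stdbool.h>

      /*
       * Sketch of the page-level freezing decision: tuple-level criteria no
       * longer trigger freezing one field at a time; instead they make the
       * whole page a freeze candidate, and the page's tuples are frozen in
       * one batch when that is judged worthwhile.
       */
      typedef struct PageFreezeState
      {
          bool    freeze_required;    /* some XID/MXID is before FreezeLimit/MultiXactCutoff */
          bool    all_frozen_after;   /* freezing would leave the page all-frozen */
          bool    prune_emitted_fpi;  /* heap pruning already wrote a full-page image */
          int     nfreeze_plans;      /* number of tuples with prepared freeze plans */
      } PageFreezeState;

      static bool
      should_freeze_page(const PageFreezeState *ps)
      {
          /* Freezing is mandatory when a cutoff has been crossed. */
          if (ps->freeze_required)
              return true;

          /* Nothing to do if no tuple has a prepared freeze plan. */
          if (ps->nfreeze_plans == 0)
              return false;

          /*
           * Opportunistic case: pruning already paid for an FPI, and freezing
           * now will let the page be marked all-frozen in the visibility map,
           * so the extra WAL is likely worth it.
           */
          return ps->prune_emitted_fpi && ps->all_frozen_after;
      }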
* Remove overzealous MultiXact freeze assertion.  (Peter Geoghegan, 2022-12-26)
  When VACUUM determines that an existing MultiXact should use a freeze plan that sets xmax to InvalidTransactionId, the original Multi may or may not be before OldestMxact. Remove an incorrect assertion that expected it to always be from before OldestMxact.

  Oversight in commit 4ce3af.

  Author: Peter Geoghegan <pg@bowt.ie>
  Reported-By: Hayato Kuroda <kuroda.hayato@fujitsu.com>
  Discussion: https://postgr.es/m/TYAPR01MB5866B24104FD80B5D7E65C3EF5ED9@TYAPR01MB5866.jpnprd01.prod.outlook.com
* Rename pg_dissect_walfile_name() to pg_split_walfile_name()  (Michael Paquier, 2022-12-23)
  The former name was discussed as being confusing, so use "split", as per a suggestion from Magnus Hagander.

  While on it, one of the output arguments is renamed from "segno" to "segment_number", as per a suggestion from Kyotaro Horiguchi. The documentation is updated to reflect all these changes.

  Bump catalog version.

  Author: Bharath Rupireddy, Michael Paquier
  Discussion: https://postgr.es/m/CABUevEytQVaOOhGdoh0D7hGwe3fuKcRF6NthsSW7ww04EmtFgQ@mail.gmail.com
* Use scanned_pages to decide when to failsafe check.  (Peter Geoghegan, 2022-12-22)
  Perform a failsafe check every time VACUUM's first heap scan scans a further FAILSAFE_EVERY_PAGES pages, rather than using an approach based on the number of physical blocks that our current blkno is from the blkno at the time of the previous failsafe check. That way VACUUM will perform a failsafe check every time it has scanned a uniform number of pages, without it mattering when or how VACUUM skipped pages using the visibility map.

  Sami Imseih, with changes to FAILSAFE_EVERY_PAGES comments added by me.

  Author: Sami Imseih <simseih@amazon.com>
  Reviewed-By: Peter Geoghegan <pg@bowt.ie>
  Discussion: https://postgr.es/m/401CE010-4049-4B94-9961-0B610A5D254D%40amazon.com
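
  A small sketch of the new trigger rule, with illustrative names and constant (not the actual vacuumlazy.c code):

      #include <stdbool.h>
      #include <stdint.h>

      typedef uint32_t BlockNumber;

      /* Illustrative value; the real FAILSAFE_EVERY_PAGES covers 4GB of heap at 8kB pages. */
      #define FAILSAFE_EVERY_PAGES ((BlockNumber) 524288)

      /*
       * Count pages actually scanned by the first heap pass, rather than
       * measuring how far blkno has advanced since the last check.  Pages
       * skipped via the visibility map then have no effect on how often the
       * failsafe check runs.
       */
      static bool
      failsafe_due(BlockNumber scanned_pages, BlockNumber *next_check_at)
      {
          if (scanned_pages < *next_check_at)
              return false;

          *next_check_at = scanned_pages + FAILSAFE_EVERY_PAGES;
          return true;            /* caller runs the wraparound failsafe check now */
      }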
* Refactor how VACUUM passes around its XID cutoffs.  (Peter Geoghegan, 2022-12-22)
  Use a dedicated struct for the XID/MXID cutoffs used by VACUUM, such as FreezeLimit and OldestXmin. This state is initialized in vacuum.c, and then passed around by code from vacuumlazy.c to heapam.c freezing related routines. The new convention is that everybody works off of the same cutoff state, which is passed around via pointers to const.

  Also simplify some of the logic for dealing with frozen xmin in heap_prepare_freeze_tuple: add dedicated "xmin_already_frozen" state to clearly distinguish xmin XIDs that we're going to freeze from those that were already frozen from before. That way the routine's xmin handling code is symmetrical with the existing xmax handling code. This is preparation for an upcoming commit that will add page level freezing.

  Also refactor the control flow within FreezeMultiXactId(), while adding stricter sanity checks. We now test OldestXmin directly, instead of using FreezeLimit as an inexact proxy for OldestXmin. This is further preparation for the page level freezing work, which will make the function's caller cede control of page level freezing to the function where appropriate (where heap_prepare_freeze_tuple sees a tuple that happens to contain a MultiXactId in its xmax).

  Author: Peter Geoghegan <pg@bowt.ie>
  Reviewed-By: Jeff Davis <pgsql@j-davis.com>
  Discussion: https://postgr.es/m/CAH2-WznS9TxXmz2_=SY+SyJyDFbiOftKofM9=aDo68BbXNBUMA@mail.gmail.com
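
  A sketch of the kind of dedicated cutoffs struct this describes (field names illustrative; the real definition lives in the vacuum headers):

      #include <stdint.h>

      typedef uint32_t TransactionId;
      typedef uint32_t MultiXactId;

      /*
       * One struct filled in once by vacuum.c and passed by pointer-to-const
       * through vacuumlazy.c into the heapam.c freezing routines, so all of
       * them work from the same cutoff state.
       */
      struct VacCutoffs
      {
          /* existing pg_class values for the target relation */
          TransactionId relfrozenxid;
          MultiXactId   relminmxid;

          /* visibility cutoffs: values older than these are safely decided */
          TransactionId OldestXmin;
          MultiXactId   OldestMxact;

          /* freezing cutoffs: values older than these should be frozen */
          TransactionId FreezeLimit;
          MultiXactId   MultiXactCutoff;
      };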
* Switch some system functions to use get_call_result_type()  (Michael Paquier, 2022-12-21)
  This shaves some code by replacing the combinations of CreateTemplateTupleDesc()/TupleDescInitEntry() hardcoding a mapping of the attributes listed in pg_proc.dat by get_call_result_type() to build the TupleDesc needed for the rows generated.

  get_call_result_type() is more expensive than the former style, but this removes some duplication with the lists of OUT parameters (pg_proc.dat and the attributes hardcoded in these code paths). This is applied to functions that are not considered as critical (aka that could be called repeatedly for monitoring purposes).

  Author: Bharath Rupireddy
  Reviewed-by: Robert Haas, Álvaro Herrera, Tom Lane, Michael Paquier
  Discussion: https://postgr.es/m/CALj2ACV23HW5HP5hFjd89FNS-z5X8r2jNXdMXcpN2BgTtKd87w@mail.gmail.com
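
  The pattern being adopted looks roughly like this in a hypothetical extension function (assuming the usual backend headers; the declared OUT parameters of the SQL-level function supply the TupleDesc, instead of hardcoded TupleDescInitEntry() calls):

      #include "postgres.h"
      #include "fmgr.h"
      #include "funcapi.h"
      #include "access/htup_details.h"
      #include "utils/builtins.h"

      PG_MODULE_MAGIC;

      PG_FUNCTION_INFO_V1(demo_status);

      /* demo_status is a made-up function returning (counter bigint, state text) */
      Datum
      demo_status(PG_FUNCTION_ARGS)
      {
          TupleDesc   tupdesc;
          Datum       values[2];
          bool        nulls[2] = {false, false};
          HeapTuple   tuple;

          /* Resolve the row type from the function's declared OUT parameters. */
          if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
              elog(ERROR, "return type must be a row type");

          values[0] = Int64GetDatum(42);
          values[1] = CStringGetTextDatum("ok");

          tuple = heap_form_tuple(tupdesc, values, nulls);
          PG_RETURN_DATUM(HeapTupleGetDatum(tuple));
      }

  The matching CREATE FUNCTION would declare the two OUT parameters (bigint, text), which is what get_call_result_type() resolves at call time.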
* Add copyright notices to meson files  (Andrew Dunstan, 2022-12-20)
  Discussion: https://postgr.es/m/222b43a5-2fb3-2c1b-9cd0-375d376c8246@dunslane.net
* Add pg_dissect_walfile_name()  (Michael Paquier, 2022-12-20)
  This function takes in input a WAL segment name and returns a tuple made of the segment sequence number (dependent on the WAL segment size of the cluster) and its timeline, as of a thin SQL wrapper around the existing XLogFromFileName().

  This function has multiple usages, like being able to compile a LSN from a file name and an offset, or finding the timeline of a segment without having to do some maths based on the first eight characters of the segment.

  Bump catalog version.

  Author: Bharath Rupireddy
  Reviewed-by: Nathan Bossart, Kyotaro Horiguchi, Maxim Orlov, Michael Paquier
  Discussion: https://postgr.es/m/CALj2ACWV=FCddsxcGbVOA=cvPyMr75YCFbSQT6g4KDj=gcJK4g@mail.gmail.com
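
  A standalone approximation of what XLogFromFileName() computes from the 24-hex-digit segment file name (simplified, no validation; for illustration only):

      #include <stdio.h>
      #include <stdint.h>
      #include <inttypes.h>

      /*
       * A WAL segment file name is 24 hex digits, TTTTTTTTXXXXXXXXYYYYYYYY:
       * T is the timeline, X the high 32 bits of the WAL position, and Y the
       * segment within that 4GB "xlogid".
       */
      static void
      split_walfile_name(const char *fname, uint64_t wal_segment_size,
                         uint32_t *tli, uint64_t *segno)
      {
          unsigned int t, log, seg;
          uint64_t     segs_per_xlogid = UINT64_C(0x100000000) / wal_segment_size;

          sscanf(fname, "%8x%8x%8x", &t, &log, &seg);
          *tli = t;
          *segno = (uint64_t) log * segs_per_xlogid + seg;
      }

      int
      main(void)
      {
          uint32_t    tli;
          uint64_t    segno;

          split_walfile_name("000000010000000100000020", 16 * 1024 * 1024, &tli, &segno);
          printf("timeline %u, segment_number %" PRIu64 "\n", tli, segno);   /* timeline 1, segment_number 288 */
          return 0;
      }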
* Static assertions cleanup  (Peter Eisentraut, 2022-12-15)
  Because we added StaticAssertStmt() first before StaticAssertDecl(), some uses as well as the instructions in c.h are now a bit backwards from the "native" way static assertions are meant to be used in C. This updates the guidance and moves some static assertions to better places.

  Specifically, since the addition of StaticAssertDecl(), we can put static assertions at the file level. This moves a number of static assertions out of function bodies, where they might have been stuck out of necessity, to perhaps better places at the file level or in header files.

  Also, when the static assertion appears in a position where a declaration is allowed, then using StaticAssertDecl() is more native than StaticAssertStmt().

  Reviewed-by: John Naylor <john.naylor@enterprisedb.com>
  Discussion: https://www.postgresql.org/message-id/flat/941a04e7-dd6f-c0e4-8cdf-a33b3338cbda%40enterprisedb.com
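
  A short example of the two forms, assuming c.h's macros (the DEMO_* names are made up for illustration):

      #include "postgres.h"

      #define DEMO_FLAG_A     0x01
      #define DEMO_FLAG_B     0x02
      #define DEMO_ALL_FLAGS  (DEMO_FLAG_A | DEMO_FLAG_B)

      /* File level: possible now that StaticAssertDecl() exists, and preferred
       * where a declaration is allowed, next to the definitions it checks. */
      StaticAssertDecl(DEMO_ALL_FLAGS < 0x100, "demo flags must fit in one byte");

      static inline void
      demo_check(void)
      {
          /* Statement form, for checks that live naturally inside function bodies. */
          StaticAssertStmt(sizeof(int) >= 4, "int must be at least 32 bits");
      }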
* Rethink handling of [Prevent|Is]InTransactionBlock in pipeline mode.  (Tom Lane, 2022-12-13)
  Commits f92944137 et al. made IsInTransactionBlock() set the XACT_FLAGS_NEEDIMMEDIATECOMMIT flag before returning "false", on the grounds that that kept its API promises equivalent to those of PreventInTransactionBlock(). This turns out to be a bad idea though, because it allows an ANALYZE in a pipelined series of commands to cause an immediate commit, which is unexpected. Furthermore, if we return "false" then we have another issue, which is that ANALYZE will decide it's allowed to do internal commit-and-start-transaction sequences, thus possibly unexpectedly committing the effects of previous commands in the pipeline.

  To fix the latter situation, invent another transaction state flag XACT_FLAGS_PIPELINING, which explicitly records the fact that we have executed some extended-protocol command and not yet seen a commit for it. Then, require that flag to not be set before allowing IsInTransactionBlock() to return "false". Having done that, we can remove its setting of NEEDIMMEDIATECOMMIT without fear of causing problems. This means that the API guarantees of IsInTransactionBlock now diverge from PreventInTransactionBlock, which is mildly annoying, but it seems OK given the very limited usage of IsInTransactionBlock. (In any case, a caller preferring the old behavior could always set NEEDIMMEDIATECOMMIT for itself.)

  For consistency also require XACT_FLAGS_PIPELINING to not be set in PreventInTransactionBlock. This too is meant to prevent commands such as CREATE DATABASE from silently committing previous commands in a pipeline.

  Per report from Peter Eisentraut. As before, back-patch to all supported branches (which sadly no longer includes v10).

  Discussion: https://postgr.es/m/65a899dd-aebc-f667-1d0a-abb89ff3abf8@enterprisedb.com
* Allow DateTimeParseError to handle bad-timezone error messages.  (Tom Lane, 2022-12-09)
  Pay down some ancient technical debt (dating to commit 022fd9966): fix a couple of places in datetime parsing that were throwing ereport's immediately instead of returning a DTERR code that could be interpreted by DateTimeParseError. The reason for that was that there was no mechanism for passing any auxiliary data (such as a zone name) to DateTimeParseError, and these errors seemed to really need it. Up to now it didn't matter that much just where the error got thrown, but now we'd like to have a hard policy that datetime parse errors get thrown from just the one place.

  Hence, invent a "DateTimeErrorExtra" struct that can be used to carry any extra values needed for specific DTERR codes. Perhaps in the future somebody will be motivated to use this to improve the specificity of other DateTimeParseError messages, but for now just deal with the timezone-error cases.

  This is on the way to making the datetime input functions report parse errors softly; but it's really an independent change, so commit separately.

  Discussion: https://postgr.es/m/3bbbb0df-7382-bf87-9737-340ba096e034@postgrespro.ru
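
  A loose sketch of the mechanism (names, fields and code values are illustrative, not the exact backend definitions): the parse routine returns a DTERR code and records the offending detail, and only DateTimeParseError() turns the pair into an error report.

      #include <string.h>

      /* illustrative stand-in for one of the DTERR_* codes discussed above */
      #define DTERR_BAD_TIMEZONE  (-7)

      typedef struct DateTimeErrorExtra
      {
          const char *dtee_timezone;  /* bad timezone name, for DTERR_BAD_TIMEZONE */
      } DateTimeErrorExtra;

      static int
      decode_timezone_name(const char *tzname, DateTimeErrorExtra *extra)
      {
          /* stand-in for a real timezone catalog lookup */
          if (strcmp(tzname, "UTC") != 0 && strcmp(tzname, "Europe/Paris") != 0)
          {
              extra->dtee_timezone = tzname;
              return DTERR_BAD_TIMEZONE;  /* caller hands code + extra to DateTimeParseError() */
          }
          return 0;
      }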
* Generate pg_stat_get*() functions for tables using macros  (Michael Paquier, 2022-12-06)
  The same code pattern is repeated 17 times for int64 counters (0 for missing entry) and 5 times for timestamps (NULL for missing entry) on table entries. This code is switched to use a macro for the basic code instead, shaving a few hundred lines of originally-duplicated code.

  The function names remain the same, but some fields of PgStat_StatTabEntry have to be renamed to cope with the new style.

  Author: Bertrand Drouvot
  Reviewed-by: Nathan Bossart
  Discussion: https://postgr.es/m/20221204173207.GA2669116@nathanxps13
* Check the snapshot argument of index_beginscan and family  (Alexander Korotkov, 2022-12-06)
  Passing a NULL snapshot (InvalidSnapshot) is going to work but only as long as the index can't find any matching rows. This can be confusing for the extension authors, so add an explicit check for this argument. The check is implemented with Assert() in order to avoid overhead in release builds.

  Reported-by: Sven Klemm
  Discussion: https://postgr.es/m/CAJ7c6TPxitD4vbKyP-mpmC1XwyHdPPqvjLzm%2BVpB88h8LGgneQ%40mail.gmail.com
  Author: Aleksander Alekseev
  Reviewed-by: Pavel Borisov
* Add LSN location in some error messages related to WAL pages  (Michael Paquier, 2022-12-05)
  The error messages reported during any failures while reading or validating the header of a WAL page currently include only the offset of the page but not the compiled LSN referring to the page, requiring an extra step to compile it if looking at the surroundings with pg_waldump or similar. Adding this information costs a bit in translation, but also eases debugging.

  Author: Bharath Rupireddy
  Reviewed-by: Álvaro Herrera, Kyotaro Horiguchi, Maxim Orlov, Michael Paquier
  Discussion: https://postgr.es/m/CALj2ACWV=FCddsxcGbVOA=cvPyMr75YCFbSQT6g4KDj=gcJK4g@mail.gmail.com
* Generalize ri_RootToPartitionMap to use for non-partition children  (Alvaro Herrera, 2022-12-02)
  ri_RootToPartitionMap is currently only initialized for tuple routing target partitions, though a future commit will need the ability to use it even for the non-partition child tables, so make adjustments to decouple it from the partitioning code.

  Also, make it lazily initialized via ExecGetRootToChildMap(), making that function its preferred access path. Existing third-party code accessing it directly should no longer do so; consequently, it's been renamed to ri_RootToChildMap, which also makes it consistent with ri_ChildToRootMap.

  ExecGetRootToChildMap() houses the logic of setting the map appropriately depending on whether a given child relation is a partition or not.

  To support this, also add a separate entry point for TupleConversionMap creation that receives an AttrMap. No new code here, just split an existing function in two.

  Author: Amit Langote <amitlangote09@gmail.com>
  Discussion: https://postgr.es/m/CA+HiwqEYUhDXSK5BTvG_xk=eaAEJCD4GS3C6uH7ybBvv+Z_Tmg@mail.gmail.com
* Fix memory leak for hashing with nondeterministic collations.  (Jeff Davis, 2022-12-01)
  Backpatch through 12, where nondeterministic collations were introduced (5e1963fb76).
  Backpatch-through: 12
* Improve heuristics for compressing the KnownAssignedXids array.  (Tom Lane, 2022-11-29)
  Previously, we'd compress only when the active range of array entries reached Max(4 * PROCARRAY_MAXPROCS, 2 * pArray->numKnownAssignedXids). If max_connections is large, the first term could result in not compressing for a long time, resulting in much wastage of cycles in hot-standby backends scanning the array to take snapshots. Get rid of that term, and just bound it to 2 * pArray->numKnownAssignedXids.

  That however creates the opposite risk, that we might spend too much effort compressing. Hence, consider compressing only once every 128 commit records. (This frequency was chosen by benchmarking. While we only tried one benchmark scenario, the results seem stable over a fairly wide range of frequencies.)

  Also, force compression when processing RecoveryInfo WAL records (which should be infrequent); the old code could perform compression then, but would do so only after the same array-range check as for the transaction-commit path.

  Also, opportunistically run compression if the startup process is about to wait for WAL, though not oftener than once a second. This should prevent cases where we waste lots of time by leaving the array not-compressed for long intervals due to low WAL traffic.

  Lastly, add a simple check to keep us from uselessly compressing when the array storage is already compact.

  Back-patch, as the performance problem is worse in pre-v14 branches than in HEAD.

  Simon Riggs and Michail Nikolaev, with help from Tom Lane and Andres Freund.
  Discussion: https://postgr.es/m/CALdSSPgahNUD_=pB_j=1zSnDBaiOtqVfzo8Ejt5J_k7qZiU1Tw@mail.gmail.com
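
  The decision logic, paraphrased as a standalone sketch (enum values, thresholds and structure mirror the description above but are illustrative, not the actual procarray.c code):

      #include <stdbool.h>
      #include <stdint.h>

      typedef enum KAXCompressReason
      {
          KAX_NO_SPACE,               /* need room for new entries: always compress */
          KAX_PRUNE,                  /* removing XIDs while replaying a commit record */
          KAX_TRANSACTION_END,        /* processing a running-xacts style record */
          KAX_STARTUP_PROCESS_IDLE    /* startup process about to wait for WAL */
      } KAXCompressReason;

      static bool
      should_compress(KAXCompressReason reason, int head, int tail, int nxids,
                      int64_t commits_since_last, int64_t now_us, int64_t last_idle_us)
      {
          int     nelements = head - tail;    /* active range of the array */

          /* Never bother when the array storage is already compact. */
          if (nelements == nxids)
              return false;

          switch (reason)
          {
              case KAX_NO_SPACE:
                  return true;
              case KAX_PRUNE:
                  /* wasted-space bound, without the 4 * PROCARRAY_MAXPROCS term... */
                  if (nelements < 2 * nxids)
                      return false;
                  /* ...and no more than once every 128 commit records */
                  return commits_since_last >= 128;
              case KAX_TRANSACTION_END:
                  return true;        /* forced, regardless of the array-range check */
              case KAX_STARTUP_PROCESS_IDLE:
                  /* opportunistic, at most once per second */
                  return now_us - last_idle_us >= 1000000;
          }
          return false;
      }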
* Add 'missing_ok' argument to build_attrmap_by_name  (Alvaro Herrera, 2022-11-29)
  When it's given as true, return a 0 in the position of the missing column rather than raising an error. This is currently unused, but it allows us to reimplement column permission checking in a subsequent commit. It seems worth breaking into a separate commit because it affects unrelated code.

  Author: Amit Langote <amitlangote09@gmail.com>
  Discussion: https://postgr.es/m/CA+HiwqFfiai=qBxPDTjaio_ZcaqUKh+FC=prESrB8ogZgFNNNQ@mail.gmail.com
* Remove promote_trigger_file.  (Thomas Munro, 2022-11-29)
  Previously, an idle startup (recovery) process would wake up every 5 seconds to have a chance to poll for promote_trigger_file, even if that GUC was not configured. That promotion triggering mechanism was effectively superseded by pg_ctl promote and pg_promote() a long time ago. There probably aren't many users left and it's very easy to change to the modern mechanisms, so we agreed to remove the feature.

  This is part of a campaign to reduce wakeups on idle systems.

  Author: Simon Riggs <simon.riggs@enterprisedb.com>
  Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
  Reviewed-by: Robert Haas <robertmhaas@gmail.com>
  Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
  Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
  Reviewed-by: Ian Lawrence Barwick <barwick@gmail.com>
  Discussion: https://postgr.es/m/CANbhV-FsjnzVOQGBpQ589%3DnWuL1Ex0Ykn74Nh1hEjp2usZSR5g%40mail.gmail.com
* Improve indenting in _hash_pgaddtup  (David Rowley, 2022-11-25)
  The Assert added in d09dbeb9b came out rather ugly after having run pgindent on that code. Here we adjust things to use some local variables so that the Assert remains within the 80-character margin.

  Author: Ted Yu
  Discussion: https://postgr.es/m/CALte62wLSir1=x93Jf0xZvHaO009FEJfhVMFwnaR8q=csPP8kQ@mail.gmail.com
* Make multixact error message more explicit  (Alvaro Herrera, 2022-11-24)
  There are recent reports involving a very old error message that we have no history of hitting -- perhaps a recently introduced bug. Improve the error message in an attempt to improve our chances of investigating the bug.

  Per reports from Dimos Stamatakis and Bob Krier. Backpatch to 11.

  Discussion: https://postgr.es/m/CO2PR0801MB2310579F65529380A4E5EDC0E20A9@CO2PR0801MB2310.namprd08.prod.outlook.com
  Discussion: https://postgr.es/m/17518-04e368df5ad7f2ee@postgresql.org
* Speedup hash index builds by skipping needless binary searches  (David Rowley, 2022-11-24)
  When building hash indexes using the spool method, tuples are added to the index page in hashkey order. Because of this, we can safely skip performing the binary search on the existing tuples on the page to find the location to insert the tuple based on its hashkey value. For this case, we can just always put the tuple at the end of the item array as the tuples will always arrive in hashkey order.

  Testing has shown that this can improve hash index build speeds by 5-15% with a unique set of integer values.

  Author: Simon Riggs
  Reviewed-by: David Rowley
  Tested-by: David Zhang, Tomas Vondra
  Discussion: https://postgr.es/m/CANbhV-GBc5JoG0AneUGPZZW3o4OK5LjBGeKe_icpC3R1McrZWQ@mail.gmail.com
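
  A standalone sketch of the idea (plain C over an array of hash keys, not the actual hash index code):

      #include <stdbool.h>
      #include <stdint.h>

      /*
       * During a sorted (spool-based) build, each new hash key is >= every key
       * already on the page, so the insert position is simply one past the last
       * item; the binary search is only needed for ordinary inserts that can
       * land anywhere.
       */
      static int
      find_insert_offset(const uint32_t *hashkeys, int nitems, uint32_t newkey,
                         bool sorted_build)
      {
          int     lo = 0,
                  hi = nitems;

          if (sorted_build)
              return nitems;      /* append: keys arrive in ascending hashkey order */

          /* binary search for the first item with a key greater than newkey */
          while (lo < hi)
          {
              int     mid = (lo + hi) / 2;

              if (hashkeys[mid] <= newkey)
                  lo = mid + 1;
              else
                  hi = mid;
          }
          return lo;
      }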
* Simplify vacuum_set_xid_limits() signature.  (Peter Geoghegan, 2022-11-23)
  Pass VACUUM parameters (VacuumParams state) to vacuum_set_xid_limits() directly, rather than passing most individual VacuumParams fields as separate arguments.

  Also make vacuum_set_xid_limits() output parameter symbol names match those used by its vacuumlazy.c caller.

  Author: Peter Geoghegan <pg@bowt.ie>
  Discussion: https://postgr.es/m/CAH2-Wz=TE7gW5DgSahDkf0UEZigFGAoHNNN6EvSrdzC=Kn+hrA@mail.gmail.com
* Don't test HEAP_XMAX_INVALID when freezing xmax.  (Peter Geoghegan, 2022-11-23)
  We shouldn't ever need to rely on whether HEAP_XMAX_INVALID is set in t_infomask when considering whether or not an xmax should be deemed already frozen, since that status flag is just a hint. The only acceptable representation for an "xmax_already_frozen" raw xmax field is the transaction ID value zero (also known as InvalidTransactionId).

  Adjust code that superficially appeared to rely on HEAP_XMAX_INVALID to make the rule about xmax_already_frozen clear. Also avoid needlessly rereading the tuple's raw xmax.

  Oversight in bugfix commit d2599ecf. There is no evidence that this ever led to incorrect behavior, so no backpatch. The worst consequence of this bug was that VACUUM could hypothetically fail to notice and report on certain kinds of corruption, which seems fairly benign.

  Author: Peter Geoghegan <pg@bowt.ie>
  Discussion: https://postgr.es/m/CAH2-Wzkh3DMCDRPfhZxj9xCq9v3WmzvmbiCpf1dNKUBPadhCbQ@mail.gmail.com
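
  The rule, restated as a tiny standalone sketch with simplified types:

      #include <stdbool.h>
      #include <stdint.h>

      typedef uint32_t TransactionId;
      #define InvalidTransactionId ((TransactionId) 0)

      /*
       * Whether an xmax is "already frozen" depends only on the raw xmax field
       * being InvalidTransactionId.  The HEAP_XMAX_INVALID infomask bit is just
       * a hint and is deliberately not consulted for this purpose.
       */
      static bool
      xmax_already_frozen(TransactionId raw_xmax, uint16_t infomask)
      {
          (void) infomask;        /* hint bits ignored on purpose */
          return raw_xmax == InvalidTransactionId;
      }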
* lwlock: Fix quadratic behavior with very long wait lists  (Andres Freund, 2022-11-20)
  Until now LWLockDequeueSelf() sequentially searched the list of waiters to see if the current proc is still on the list of waiters, or has already been removed. In extreme workloads, where the wait lists are very long, this leads to a quadratic behavior: #backends iterating over a list #backends long. Additionally, the likelihood of needing to call LWLockDequeueSelf() in the first place also increases with the increased length of the wait queue, as it becomes more likely that a lock is released while waiting for the wait list lock, which is held for longer during lock release.

  Due to the exponential back-off in perform_spin_delay() this is surprisingly hard to detect. We should make that easier, e.g. by adding a wait event around the pg_usleep() - but that's a separate patch.

  The fix is simple - track whether a proc is currently waiting in the wait list or already removed but waiting to be woken up in PGPROC->lwWaiting.

  In some workloads with a lot of clients contending for a small number of lwlocks (e.g. WALWriteLock), the fix can substantially increase throughput.

  As the quadratic behavior arguably is a bug, we might want to decide to backpatch this fix in the future.

  Author: Andres Freund <andres@anarazel.de>
  Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
  Discussion: https://postgr.es/m/20221027165914.2hofzp4cvutj6gin@awork3.anarazel.de
  Discussion: https://postgr.es/m/CALj2ACXktNbG=K8Xi7PSqbofTZozavhaxjatVc14iYaLu4Maag@mail.gmail.com
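
  A sketch of the new per-proc state (state names illustrative, not necessarily the exact lwlock.h definitions):

      #include <stdbool.h>

      /*
       * Instead of a boolean, each PGPROC records which of three states its
       * lwlock wait is in, so LWLockDequeueSelf() can decide what to do from
       * the proc's own field instead of scanning a possibly very long wait list.
       */
      typedef enum LWLockWaitState
      {
          LW_WS_NOT_WAITING,          /* not on any wait list */
          LW_WS_WAITING,              /* still enqueued on the lock's wait list */
          LW_WS_PENDING_WAKEUP        /* already removed by a releaser; wakeup coming */
      } LWLockWaitState;

      static bool
      dequeue_self_needs_list_removal(LWLockWaitState lwWaiting)
      {
          /*
           * O(1) check replacing the old O(waiters) list search: only a proc
           * still in LW_WS_WAITING has to unlink itself from the wait list.
           */
          return lwWaiting == LW_WS_WAITING;
      }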
* Standardize rmgrdesc recovery conflict XID output.  (Peter Geoghegan, 2022-11-17)
  Standardize on the name snapshotConflictHorizon for all XID fields from WAL records that generate recovery conflicts when in hot standby mode. This supersedes the previous latestRemovedXid naming convention.

  The new naming convention places emphasis on how the values are actually used by REDO routines. How the values are generated during original execution (details of which vary by record type) is deemphasized. Users of tools like pg_waldump can now grep for snapshotConflictHorizon to see all potential sources of recovery conflicts in a standardized way, without necessarily having to consider which specific record types might be involved.

  Also bring a couple of WAL record types that didn't follow any kind of naming convention into line. These are heapam's VISIBLE record type and SP-GiST's VACUUM_REDIRECT record type. Now every WAL record whose REDO routine calls ResolveRecoveryConflictWithSnapshot() passes through the snapshotConflictHorizon field from its WAL record. This is follow-up work to the refactoring from commit 9e540599 that made FREEZE_PAGE WAL records use a standard snapshotConflictHorizon style XID cutoff.

  No bump in XLOG_PAGE_MAGIC, since the underlying format of affected WAL records doesn't change.

  Author: Peter Geoghegan <pg@bowt.ie>
  Reviewed-By: Andres Freund <andres@anarazel.de>
  Discussion: https://postgr.es/m/CAH2-Wzm2CQUmViUq7Opgk=McVREHSOorYaAjR1ZpLYkRN7_dPw@mail.gmail.com
* Use correct type name in comments about freezing.  (Peter Geoghegan, 2022-11-17)
  Oversight in commit 9e540599, which added freeze plan deduplication.
* Variable renaming in preparation for refactoring  (Peter Eisentraut, 2022-11-16)
  Rename page -> block and dp -> page where appropriate. The old naming mixed up block and page in confusing ways.

  Author: Melanie Plageman <melanieplageman@gmail.com>
  Discussion: https://www.postgresql.org/message-id/flat/CAAKRu_YSOnhKsDyFcqJsKtBSrd32DP-jjXmv7hL0BPD-z0TGXQ@mail.gmail.com
* Remove useless casts  (Peter Eisentraut, 2022-11-16)
  Maybe these are left from when PageGetItem() was a macro, but now they are clearly useless.