path: root/src/backend
Commit message | Author | Date
* With GB18030, prevent SIGSEGV from reading past end of allocation. (Noah Misch, 2025-05-05)

  With GB18030 as source encoding, applications could crash the server via
  SQL functions convert() or convert_from(). Applications themselves could
  crash after passing unterminated GB18030 input to libpq functions
  PQescapeLiteral(), PQescapeIdentifier(), PQescapeStringConn(), or
  PQescapeString(). Extension code could crash by passing unterminated
  GB18030 input to jsonapi.h functions. All those functions have been
  intended to handle untrusted, unterminated input safely.

  A crash required allocating the input such that the last byte of the
  allocation was the last byte of a virtual memory page. Some malloc()
  implementations take measures against that, making the SIGSEGV hard to
  reach. Back-patch to v13 (all supported versions).

  Author: Noah Misch <noah@leadboat.com>
  Author: Andres Freund <andres@anarazel.de>
  Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
  Backpatch-through: 13
  Security: CVE-2025-4207
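  The hazard class is easy to show outside the server. A minimal standalone
  C sketch (illustrative names and structure, not PostgreSQL's actual
  code): a GB18030 validator must confirm the remaining input length before
  reading the second byte of a multibyte sequence, or it can walk off the
  end of the allocation.

      #include <stdbool.h>
      #include <stddef.h>

      /* Return true if the 'len' bytes at 's' end on a character boundary. */
      static bool
      gb18030_prefix_ok(const unsigned char *s, size_t len)
      {
          const unsigned char *end = s + len;

          while (s < end)
          {
              if (*s <= 0x7F)
                  s += 1;                 /* ASCII: one byte */
              else
              {
                  if (end - s < 2)
                      return false;       /* truncated; must NOT read s[1] */
                  if (s[1] >= 0x30 && s[1] <= 0x39)
                  {
                      if (end - s < 4)
                          return false;   /* truncated four-byte sequence */
                      s += 4;             /* four-byte form */
                  }
                  else
                      s += 2;             /* two-byte form */
              }
          }
          return true;
      }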
* Translation updates (Peter Eisentraut, 2025-05-05)

  Source-Git-URL: https://git.postgresql.org/git/pgtranslation/messages.git
  Source-Git-Hash: 5385d44a7a0e6f93326380101d0605cf1ca78890
* Handle self-referencing FKs correctly in partitioned tables (Álvaro Herrera, 2025-05-02)

  For self-referencing foreign keys in partitioned tables, we weren't
  handling creation of pg_constraint rows during CREATE TABLE PARTITION OF
  as well as ALTER TABLE ATTACH PARTITION. This is an old bug -- mostly, we
  broke this in 614a406b4ff1 while trying to fix it (so 12.13, 13.9, 14.6
  and 15.0 and up all behave incorrectly).

  This commit reverts part of that with additional fixes for full
  correctness, and installs more tests to verify the parts we broke, not
  just the catalog contents but also the user-visible behavior. Backpatch
  to all live branches.

  In branches 13 and 14, commit 46a8c27a7226 changed the behavior during
  DETACH to drop a FK constraint rather than trying to repair it, because
  the complete fix of repairing catalog constraints was problematic due to
  lack of previous fixes. For this reason, the test behavior in those
  branches is a bit different. However, as best as I can tell, the fix
  works correctly there.

  In release notes we have to recommend that all self-referencing foreign
  keys on partitioned tables be recreated if partitions have been created
  or attached after the FK was created, keeping in mind that violating
  rows might already be present on the referencing side.

  Reported-by: Guillaume Lelarge <guillaume@lelarge.info>
  Reported-by: Matthew Gabeler-Lee <fastcat@gmail.com>
  Reported-by: Luca Vallisa <luca.vallisa@gmail.com>
  Discussion: https://postgr.es/m/CAECtzeWHCA+6tTcm2Oh2+g7fURUJpLZb-=pRXgeWJ-Pi+VU=_w@mail.gmail.com
  Discussion: https://postgr.es/m/18156-a44bc7096f0683e6@postgresql.org
  Discussion: https://postgr.es/m/CAAT=myvsiF-Attja5DcWoUWh21R12R-sfXECY2-3ynt8kaOqjw@mail.gmail.com
* Fix xmin advancement during fast_forward decoding. (Amit Kapila, 2025-04-28)

  During logical decoding, we advance the catalog_xmin of the logical slot
  too early in fast_forward mode, resulting in required catalog data being
  removed by vacuum. This mode is normally used to advance the slot
  without processing the changes, but we still can't let the slot's xmin
  advance to an incorrect value.

  Commit f49a80c481 fixed a similar issue where the logical slot's
  catalog_xmin was getting advanced prematurely during non-fast-forward
  mode. During xl_running_xacts processing, instead of directly advancing
  the slot's xmin to the oldest running xid in the record, it allowed the
  xmin to be held back for snapshots that can be used for not-yet-replayed
  transactions, as those might consider older txns as running too.

  However, it missed the fact that the same problem can happen during
  fast_forward mode decoding, as we won't build a base snapshot in that
  mode, and the future call to get_changes from the same slot can miss
  seeing the required catalog changes, leading to incorrect results.

  This commit allows building the base snapshot even in fast_forward mode
  to prevent the early advancement of xmin.

  Reported-by: Amit Kapila <amit.kapila16@gmail.com>
  Author: Zhijie Hou <houzj.fnst@fujitsu.com>
  Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
  Reviewed-by: shveta malik <shveta.malik@gmail.com>
  Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
  Backpatch-through: 13
  Discussion: https://postgr.es/m/CAA4eK1LqWncUOqKijiafe+Ypt1gQAQRjctKLMY953J79xDBgAg@mail.gmail.com
  Discussion: https://postgr.es/m/OS0PR01MB57163087F86621D44D9A72BF94BB2@OS0PR01MB5716.jpnprd01.prod.outlook.com
* Fix data loss in logical replication. (Amit Kapila, 2025-04-24)

  This commit is a backpatch of commit 4909b38af0 for 13.

  Data loss can happen when DDLs like ALTER PUBLICATION ... ADD TABLE ...
  or ALTER TYPE ... that don't take a strong lock on the table happen
  concurrently to DMLs on the tables involved in the DDL. This happens
  because logical decoding doesn't distribute invalidations to concurrent
  transactions and those transactions use stale cache data to decode the
  changes. The problem becomes bigger because we keep using the stale
  cache even after those in-progress transactions are finished and skip
  the changes required to be sent to the client.

  This commit fixes the issue by distributing invalidation messages from
  catalog-modifying transactions to all concurrent in-progress
  transactions. This allows the necessary rebuild of the catalog cache
  when decoding new changes after concurrent DDL.

  The fix for 13 is different from what we did in branches 14 and above,
  such that for 13, the concurrent DDL changes (from the DDL types
  mentioned earlier) will be visible for any newly started transactions.
  To make them visible in concurrent transactions, we need to introduce a
  new change type REORDER_BUFFER_CHANGE_INVALIDATION, already present in
  branches 14 and greater. We decided not to take the risk of a bigger
  change and fix the issue partially in 13.

  Reported-by: hubert depesz lubaczewski <depesz@depesz.com>
  Reported-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
  Author: Shlok Kyal <shlok.kyal.oss@gmail.com>
  Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
  Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com>
  Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
  Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
  Tested-by: Benoit Lobréau <benoit.lobreau@dalibo.com>
  Discussion: https://postgr.es/m/de52b282-1166-1180-45a2-8d8917ca74c6@enterprisedb.com
  Discussion: https://postgr.es/m/CAD21AoAenVqiMjpN-PvGHL1N9DWnHSq673bfgr6phmBUzx=kLQ@mail.gmail.com
  Discussion: https://postgr.es/m/CAD21AoAhU3kp8shYqP=ExiFDZ9sZxpFb5WzLa0p+vEe5j+7CWQ@mail.gmail.com
* Avoid ERROR at ON COMMIT DELETE ROWS after relhassubclass=f. (Noah Misch, 2025-04-20)

  Commit 7102070329d8147246d2791321f9915c3b5abf31 fixed a similar bug, but
  it missed the case of database-wide ANALYZE ("use_own_xacts" mode).
  Commit a07e03fd8fa7daf4d1356f7cb501ffe784ea6257 changed consequences
  from silent discard of a pg_class stats (relpages et al.) update to
  ERROR "tuple to be updated was already modified". Losing a relpages
  update of an ON COMMIT DELETE ROWS table was negligible, but a
  COMMIT-time error isn't negligible. Back-patch to v13 (all supported
  versions).

  Reported-by: Richard Guo <guofenglinux@gmail.com>
  Reported-by: Robins Tharakan <tharakan@gmail.com>
  Discussion: https://postgr.es/m/CAMbWs4-XwMKMKJ_GT=p3_-_=j9rQSEs1FbDFUnW9zHuKPsPNEQ@mail.gmail.com
  Backpatch-through: 13
* Fix GIN's shimTriConsistentFn to not corrupt its input. (Tom Lane, 2025-04-12)

  Commit 0f21db36d made an assumption that GIN triConsistentFns would not
  modify their input entryRes[] arrays. But in fact, the "shim"
  triConsistentFn that we use for opclasses that don't supply their own
  did exactly that, potentially leading to wrong answers from a GIN index
  search. Through bad luck, none of the test cases that we have for such
  opclasses exposed the bug.

  One response to this could be that the assumption of consistency check
  functions not modifying entryRes[] arrays is a bad one, but it still
  seems reasonable to me. Notably, shimTriConsistentFn is itself assuming
  that with respect to the underlying boolean consistentFn, so it's sure
  being self-centered in supposing that it gets to do so. Fortunately,
  it's quite simple to fix shimTriConsistentFn to restore the entry-time
  state of entryRes[], so let's do that instead.

  This issue doesn't affect any core GIN opclasses, since they all supply
  their own triConsistentFns. It does affect contrib modules btree_gin,
  hstore, and intarray.

  Along the way, I (tgl) noticed that shimTriConsistentFn failed to pick
  up on a "recheck" flag returned by its first call to the boolean
  consistentFn. This may be only a latent problem, since it would be
  unlikely for a consistentFn to set recheck for the all-false case and
  not any other cases. (Indeed, none of our contrib modules do that.)
  Nonetheless, it's formally wrong.

  Reported-by: Vinod Sridharan <vsridh90@gmail.com>
  Author: Vinod Sridharan <vsridh90@gmail.com>
  Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
  Discussion: https://postgr.es/m/CAFMdLD7XzsXfi1+DpTqTgrD8XU0i2C99KuF=5VHLWjx4C1pkcg@mail.gmail.com
  Backpatch-through: 13
* Fix race with synchronous_standby_names at startup (Michael Paquier, 2025-04-11)

  synchronous_standby_names cannot be reloaded safely by backends, and the
  checkpointer is in charge of updating a state in shared memory if the
  GUC is enabled in WalSndCtl, to let the backends know if they should
  wait or not for a given LSN. This provides a strict control on the
  timing of the waiting queues if the GUC is enabled or disabled, then
  reloaded. The checkpointer is also in charge of waking up the backends
  that could be waiting for an LSN when the GUC is disabled.

  This logic had a race condition at startup, where it would be possible
  for backends to not wait for an LSN even if synchronous_standby_names is
  enabled. This would cause visibility issues with transactions that we
  should be waiting for but were not. The problem lasts until the
  checkpointer does its initial update of the shared memory state when it
  loads synchronous_standby_names.

  In order to take care of this problem, the shared memory state in
  WalSndCtl is extended to detect if it has been initialized by the
  checkpointer, and not only check if synchronous_standby_names is
  defined. In WalSndCtlData, sync_standbys_defined is renamed to
  sync_standbys_status, a bits8 able to know about two states:
  - If the shared memory state has been initialized. This flag is set by
    the checkpointer at startup once, and never removed.
  - If synchronous_standby_names is known as defined in the shared memory
    state. This is the same as the previous sync_standbys_defined in
    WalSndCtl.

  This method gives a way for backends to decide what they should do until
  the shared memory area is initialized, and they now ultimately fall back
  to a check on the GUC value in this case, which is the best thing that
  can be done. Fortunately, SyncRepUpdateSyncStandbysDefined() is called
  immediately by the checkpointer when this process starts, so the window
  is very narrow.

  It is possible to enlarge the problematic window by making the
  checkpointer wait at the beginning of
  SyncRepUpdateSyncStandbysDefined() with a hardcoded sleep, for example,
  and doing so has shown that a 2PC visibility test is indeed failing. On
  machines slow enough, this bug would cause spurious failures.

  In 17~, we have looked at the possibility of adding an injection point
  to have a reproducible test, but as the problematic window happens at
  early startup, we would need to invent a way to make an injection point
  optionally persistent across restarts when attached, something that
  would be fine for this case as it would involve the checkpointer.

  This issue is quite old, and can be reproduced on all the stable
  branches.

  Author: Melnikov Maksim <m.melnikov@postgrespro.ru>
  Co-authored-by: Michael Paquier <michael@paquier.xyz>
  Discussion: https://postgr.es/m/163fcbec-900b-4b07-beaa-d2ead8634bec@postgrespro.ru
  Backpatch-through: 13
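  A hedged sketch of the two-bit state described above (names are
  illustrative, not the actual WalSndCtl definitions): until the
  checkpointer has set the "initialized" bit, a backend cannot trust the
  shared flag and falls back to the GUC value.

      #include <stdbool.h>
      #include <stdint.h>

      typedef uint8_t bits8;

      #define SYNC_STANDBY_INIT    0x01  /* shared state initialized */
      #define SYNC_STANDBY_DEFINED 0x02  /* synchronous_standby_names set */

      static bool guc_sync_standbys_defined;  /* stand-in for the GUC */

      static bool
      should_wait_for_lsn(bits8 sync_standbys_status)
      {
          if ((sync_standbys_status & SYNC_STANDBY_INIT) == 0)
              return guc_sync_standbys_defined; /* checkpointer not yet run */
          return (sync_standbys_status & SYNC_STANDBY_DEFINED) != 0;
      }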
* Reset InstallXLogFileSegmentActive after walreceiver self-initiated exit. (Noah Misch, 2025-04-06)

  After commit cc2c7d65fc27e877c9f407587b0b92d46cd6dd16 added this flag,
  failure to reset it caused assertion failures. In non-assert builds, it
  made the system fail to achieve the objectives listed in that commit;
  chiefly, we might emit a spurious log message. Back-patch to v15, where
  that commit first appeared.

  Bharath Rupireddy and Kyotaro Horiguchi. Reviewed by Dilip Kumar, Nathan
  Bossart and Michael Paquier. Reported by Dilip Kumar.

  This commit has been applied as of b4f584f9d2a1 in v15 and newer
  versions. This is required on stable branches of v13 and v14 to fix a
  regression reported by Noah Misch, introduced by 1f95181b44c8, causing
  spurious failures in archive recovery (neither streaming nor archive
  recovery) with concurrent restartpoints. The backpatched versions of the
  patches have been aligned on these branches by me. Tests have been
  conducted by both of us.

  Discussion: https://postgr.es/m/CAFiTN-sE3ry=ycMPVtC+Djw4Fd7gbUGVv_qqw6qfzp=JLvqT3g@mail.gmail.com
  Discussion: https://postgr.es/m/20250306193013.36.nmisch@google.com
  Backpatch-through: 13
* Skip WAL recycling and preallocation during archive recovery. (Noah Misch, 2025-04-06)

  The previous commit addressed the chief consequences of a race condition
  between InstallXLogFileSegment() and KeepFileRestoredFromArchive(). Fix
  three lesser consequences. A spurious durable_rename_excl() LOG message
  remained possible. KeepFileRestoredFromArchive() wasted the proceeds of
  WAL recycling and preallocation. Finally, XLogFileInitInternal() could
  return a descriptor for a file that KeepFileRestoredFromArchive() had
  already unlinked. That felt like a recipe for future bugs.

  This commit has been applied as of cc2c7d65fc27 in v15 and newer
  versions. This is required on stable branches of v13 and v14 to fix a
  regression reported by Noah Misch, introduced by 1f95181b44c8, causing
  spurious failures in archive recovery (neither streaming nor archive
  recovery) with concurrent restartpoints. The backpatched versions of the
  patches have been aligned on these branches by me, Noah Misch is the
  author. Tests have been conducted by both of us.

  Note that this commit is known to have introduced a regression of its
  own. This is fixed by the commit following this one, and not grouped in
  a single commit to keep the commit history consistent across all
  branches.

  Reported-by: Arun Thirupathi
  Author: Noah Misch <noah@leadboat.com>
  Discussion: https://postgr.es/m/20210202151416.GB3304930@rfd.leadboat.com
  Discussion: https://postgr.es/m/20250306193013.36.nmisch@google.com
  Backpatch-through: 13
* Don't ERROR on PreallocXlogFiles() race condition. (Noah Misch, 2025-04-06)

  Before a restartpoint finishes PreallocXlogFiles(), a startup process
  KeepFileRestoredFromArchive() call can unlink the preallocated segment.
  If a CHECKPOINT SQL command had elicited the restartpoint experiencing
  the race condition, that SQL command failed. Moreover, the restartpoint
  omitted its log_checkpoints message and some inessential resource
  reclamation. Prevent the ERROR by skipping open() of the segment. Since
  these consequences are so minor, no back-patch.

  This commit has been applied as of 2b3e4672f760 in v15 and newer
  versions. This is required on stable branches of v13 and v14 to fix a
  regression reported by Noah Misch, introduced by 1f95181b44c8, causing
  spurious failures in archive recovery (neither streaming nor archive
  recovery) with concurrent restartpoints. The backpatched versions of the
  patches have been aligned on these branches by me, Noah Misch is the
  author. Tests have been conducted by both of us.

  Reported-by: Arun Thirupathi
  Author: Noah Misch <noah@leadboat.com>
  Discussion: https://postgr.es/m/20210202151416.GB3304930@rfd.leadboat.com
  Discussion: https://postgr.es/m/20250306193013.36.nmisch@google.com
  Backpatch-through: 13
* Revert "Add HINT for restartpoint race with KeepFileRestoredFromArchive()."Michael Paquier2025-04-06
| | | | | | | | | | | | | | | | This reverts commit 8ad6c5dbbe5a, which was a commit specific to v14 and older branches as the race condition between restartpoints and KeepFileRestoredFromArchive() still existed. 1f95181b44c8 has worsened the situation on these two branches, causing spurious failures in archive recovery (neither streaming nor archive recovery) with concurrent restartpoints. The same logic as v15 and newer versions will be applied in some follow-up commits to close this problem, making this HINT not necessary anymore. Reported-by: Arun Thirupathi Discussion: https://postgr.es/m/20250306193013.36.nmisch@google.com Backpatch-through: 13
* Remove XLogFileInit() ability to unlink a pre-existing file. (Noah Misch, 2025-04-06)

  Only initdb used it. initdb refuses to operate on a non-empty directory
  and generally does not cope with pre-existing files of other kinds.
  Hence, use the opportunity to simplify.

  This commit has been applied as of 421484f79c0b in v15 and newer
  versions. This is required on stable branches of v13 and v14 to fix a
  regression reported by Noah Misch, introduced by 1f95181b44c8, causing
  spurious failures in archive recovery (neither streaming nor archive
  recovery) with concurrent restartpoints. The backpatched versions of the
  patches have been aligned on these branches by me, Noah Misch is the
  author. Tests have been conducted by both of us.

  Reported-by: Arun Thirupathi
  Author: Noah Misch <noah@leadboat.com>
  Discussion: https://postgr.es/m/20210202151416.GB3304930@rfd.leadboat.com
  Discussion: https://postgr.es/m/20250306193013.36.nmisch@google.com
  Backpatch-through: 13
* In XLogFileInit(), fix *use_existent postcondition to suit callers. (Noah Misch, 2025-04-06)

  Infrequently, the mismatch caused log_checkpoints messages and
  TRACE_POSTGRESQL_CHECKPOINT_DONE() to witness an "added" count too high
  by one. Since that consequence is so minor, no back-patch.

  This commit has been applied as of 85656bc3050f in v15 and newer
  versions. This is required on stable branches of v13 and v14 to fix a
  regression reported by Noah Misch, introduced by 1f95181b44c8, causing
  spurious failures in archive recovery (neither streaming nor archive
  recovery) with concurrent restartpoints. The backpatched versions of the
  patches have been aligned on these branches by me, Noah Misch is the
  author. Tests have been conducted by both of us.

  Reported-by: Arun Thirupathi
  Author: Noah Misch <noah@leadboat.com>
  Discussion: https://postgr.es/m/20210202151416.GB3304930@rfd.leadboat.com
  Discussion: https://postgr.es/m/20250306193013.36.nmisch@google.com
  Backpatch-through: 13
* Remove XLogFileInit() ability to skip ControlFileLock. (Noah Misch, 2025-04-06)

  Cold paths, initdb and end-of-recovery, used it. Don't optimize them.

  This commit has been applied as of c53c6b98d38a in v15 and newer
  versions. This is required on stable branches of v13 and v14 to fix a
  regression reported by Noah Misch, introduced by 1f95181b44c8, causing
  spurious failures in archive recovery (neither streaming nor archive
  recovery) with concurrent restartpoints. The backpatched versions of the
  patches have been aligned on these branches by me, Noah Misch is the
  author. Tests have been conducted by both of us.

  Reported-by: Arun Thirupathi
  Author: Noah Misch <noah@leadboat.com>
  Discussion: https://postgr.es/m/20210202151416.GB3304930@rfd.leadboat.com
  Discussion: https://postgr.es/m/20250306193013.36.nmisch@google.com
  Backpatch-through: 13
* Fix parse_cte.c's failure to examine sub-WITHs in DML statements. (Tom Lane, 2025-04-05)

  makeDependencyGraphWalker thought that only SelectStmt nodes could
  contain a WithClause. Which was true in our original implementation of
  WITH, but astonishingly we missed updating this code when we added the
  ability to attach WITH to INSERT/UPDATE/DELETE (and later MERGE).
  Moreover, since it was coded to deliberately block recursion to a
  WithClause, even updating raw_expression_tree_walker didn't save it.

  The upshot of this was that we didn't see references to outer CTE names
  appearing within an inner WITH, and would neither complain about
  disallowed recursion nor account for such references when sorting CTEs
  into a usable order. The lack of complaints about this is perhaps not
  so surprising, because typical usage of WITH wouldn't hit either case.
  Still, it's pretty broken; failing to detect recursion here leads to
  assert failures or worse later on.

  Fix by factoring out the processing of sub-WITHs into a new function
  WalkInnerWith, and invoking that for all the statement types that can
  have WITH.

  Bug: #18878
  Reported-by: Yu Liang <luy70@psu.edu>
  Author: Tom Lane <tgl@sss.pgh.pa.us>
  Discussion: https://postgr.es/m/18878-a26fa5ab6be2f2cf@postgresql.org
  Backpatch-through: 13
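  The shape of the fix, as a hedged sketch with invented node types (the
  real code deals in SelectStmt, InsertStmt, and so on, and the helper is
  named WalkInnerWith): hoist the sub-WITH processing into one function
  and call it for every statement kind that can carry a WITH clause, so
  none is silently skipped.

      typedef struct WithClause { int dummy; } WithClause;

      typedef enum { STMT_SELECT, STMT_INSERT, STMT_UPDATE,
                     STMT_DELETE, STMT_MERGE } StmtKind;

      typedef struct Stmt
      {
          StmtKind    kind;
          WithClause *with;       /* any kind above may carry one */
      } Stmt;

      static void
      walk_inner_with(WithClause *with)
      {
          (void) with;            /* examine the CTE list, recurse, etc. */
      }

      static void
      walk_statement(Stmt *stmt)
      {
          /* Pre-fix shape: only STMT_SELECT consulted stmt->with.
           * Post-fix shape: every WITH-bearing kind uses the helper. */
          switch (stmt->kind)
          {
              case STMT_SELECT:
              case STMT_INSERT:
              case STMT_UPDATE:
              case STMT_DELETE:
              case STMT_MERGE:
                  if (stmt->with)
                      walk_inner_with(stmt->with);
                  break;
          }
          /* ... then recurse into the statement's other substructure ... */
      }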
* Relax assertion in finding correct GiST parent (Heikki Linnakangas, 2025-04-04)

  Commit 28d3c2ddcf introduced an assertion that if the memorized downlink
  location in the insertion stack isn't valid, the parent's LSN should've
  changed too. Turns out that was too strict. In gistFindCorrectParent(),
  if we walk right, we update the parent's block number and clear its
  memorized 'downlinkoffnum'. That triggered the assertion on next call to
  gistFindCorrectParent(), if the parent needed to be split too. Relax the
  assertion, so that it's OK if downlinkOffnum is InvalidOffsetNumber.

  Backpatch to v13-, all supported versions. The assertion was added in
  commit 28d3c2ddcf in v12.

  Reported-by: Alexander Lakhin <exclusion@gmail.com>
  Reviewed-by: Tender Wang <tndrwang@gmail.com>
  Discussion: https://www.postgresql.org/message-id/18396-03cac9beb2f7aac3@postgresql.org
* Make dblink interruptible, via new libpqsrv APIs. (Noah Misch, 2025-04-03)

  This replaces dblink's blocking libpq calls, allowing cancellation and
  allowing DROP DATABASE (of a database not involved in the query). Apart
  from explicit dblink_cancel_query() calls, dblink still doesn't cancel
  the remote side. The replacement for the blocking calls consists of new,
  general-purpose query execution wrappers in the libpqsrv facility.
  Out-of-tree extensions should adopt these.

  The original commit d3c5f37dd543498cc7c678815d3921823beec9e9 did not
  back-patch. Back-patch now to v16-v13, bringing coverage to all
  supported versions. This back-patch omits the original's refactoring in
  postgres_fdw.

  Discussion: https://postgr.es/m/20231122012945.74@rfd.leadboat.com
* Remove unnecessary type violation in tsvectorrecv(). (Tom Lane, 2025-04-02)

  compareentry() is declared to work on WordEntryIN structs, but
  tsvectorrecv() is using it in two places to work on WordEntry structs.
  This is almost okay, since WordEntry is the first field of WordEntryIN.
  But on machines with 8-byte pointers, WordEntryIN will have a larger
  alignment spec than WordEntry, and it's at least theoretically possible
  that the compiler could generate code that depends on the larger
  alignment.

  Given the lack of field reports, this may be just a hypothetical bug
  that upsets nothing except sanitizer tools. Or it may be real on certain
  hardware but nobody's tried to use tsvectorrecv() on such hardware. In
  any case we should fix it, and the fix is trivial: just change
  compareentry() so that it works on WordEntry without any mention of
  WordEntryIN. We can also get rid of the quite-useless intermediate
  function WordEntryCMP.

  Bug: #18875
  Reported-by: Alexander Lakhin <exclusion@gmail.com>
  Author: Tom Lane <tgl@sss.pgh.pa.us>
  Discussion: https://postgr.es/m/18875-07a29c49c825a608@postgresql.org
  Backpatch-through: 13
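  The alignment point can be seen in isolation. An illustrative C sketch
  (invented struct names, not the tsvector code): a struct whose first
  member is another struct is layout-compatible with it, but a pointer
  member raises the outer struct's alignment requirement, which a bare
  array of the inner struct does not satisfy.

      #include <stdalign.h>
      #include <stdio.h>

      typedef struct { unsigned int bits; } Entry;          /* align 4 */
      typedef struct { Entry entry; char *extra; } EntryIn; /* align 8 on LP64 */

      int
      main(void)
      {
          printf("alignof(Entry)   = %zu\n", alignof(Entry));
          printf("alignof(EntryIn) = %zu\n", alignof(EntryIn));
          /* A comparator declared for EntryIn may assume 8-byte alignment,
           * which a bare Entry array does not guarantee; the fix is to
           * declare the comparator in terms of Entry alone. */
          return 0;
      }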
* Remove HeapBitmapScan's skip_fetch optimization (Andres Freund, 2025-04-02)

  The optimization does not take the removal of TIDs by a concurrent
  vacuum into account. The concurrent vacuum can remove dead TIDs and make
  pages ALL_VISIBLE while those dead TIDs are referenced in the bitmap.
  This can lead to a skip_fetch scan returning too many tuples.

  It likely would be possible to implement this optimization safely, but
  we don't have the necessary infrastructure in place. Nor is it clear
  that it's worth building that infrastructure, given how limited the
  skip_fetch optimization is.

  In the backbranches we just disable the optimization by always passing
  need_tuples=true to table_beginscan_bm(). We can't perform API/ABI
  changes in the backbranches and we want to make the change as minimal as
  possible.

  Author: Matthias van de Meent <boekewurm+postgres@gmail.com>
  Reported-By: Konstantin Knizhnik <knizhnik@garret.ru>
  Discussion: https://postgr.es/m/CAEze2Wg3gXXZTr6_rwC+s4-o2ZVFB5F985uUSgJTsECx6AmGcQ@mail.gmail.com
  Backpatch-through: 13
* Need to do CommandCounterIncrement after StoreAttrMissingVal. (Tom Lane, 2025-04-02)

  Without this, an additional change to the same pg_attribute row within
  the same command will fail. This is possible at least with ALTER TABLE
  ADD COLUMN on a multiple-inheritance-pathway structure. (Another
  potential hazard is that immediately-following operations might not see
  the missingval.)

  Introduced by 95f650674, which split the former coding that used a
  single pg_attribute update to change both atthasdef and
  atthasmissing/attmissingval into two updates, but missed that this
  should entail two CommandCounterIncrements as well. Like that fix,
  back-patch through v13.

  Reported-by: Alexander Lakhin <exclusion@gmail.com>
  Author: Tender Wang <tndrwang@gmail.com>
  Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
  Discussion: https://postgr.es/m/025a3ffa-5eff-4a88-97fb-8f583b015965@gmail.com
  Backpatch-through: 13
* doc: Correct description of values used in FSM for indexes (Michael Paquier, 2025-03-27)

  The implementation of FSM for indexes is simpler than heap, where 0 is
  used to track if a page is in-use and (BLCKSZ - 1) if a page is free.
  One comment in indexfsm.c and one description in the documentation of
  pg_freespacemap were incorrect about that.

  Author: Alex Friedman <alexf01@gmail.com>
  Discussion: https://postgr.es/m/71eef655-c192-453f-ac45-2772fec2cb04@gmail.com
  Backpatch-through: 13
* Fix ARRAY_SUBLINK and ARRAY[] for int2vector and oidvector input. (Tom Lane, 2025-03-13)

  If the given input_type yields valid results from both get_element_type
  and get_array_type, initArrayResultAny believed the former and treated
  the input as an array type. However this is inconsistent with what
  get_promoted_array_type does, leading to situations where the output of
  an ARRAY() subquery is labeled with the wrong type: it's labeled as
  oidvector[] but is really a 2-D array of OID. That at least results in
  strange output, and can result in crashes if further processing such as
  unnest() is applied. AFAIK this is only possible with the int2vector
  and oidvector types, which are special-cased to be treated mostly as
  true arrays even though they aren't quite.

  Fix by switching the logic to match get_promoted_array_type by testing
  get_array_type not get_element_type, and remove an Assert thereby made
  pointless. (We need not introduce a symmetrical check for
  get_element_type in the other if-branch, because initArrayResultArr
  will check it.) This restores the behavior that existed before
  bac27394a introduced initArrayResultAny: the output really is
  int2vector[] or oidvector[].

  Comparable confusion exists when an input of an ARRAY[] construct is
  int2vector or oidvector: transformArrayExpr decides it's dealing with a
  multidimensional array constructor, and we end up with something that's
  a multidimensional OID array but is alleged to be of type oidvector. I
  have not found a crashing case here, but it's easy to demonstrate
  totally-wrong results. Adjust that code so that what you get is an
  oidvector[] instead, for consistency with ARRAY() subqueries. (This
  change also makes these types work like domains-over-arrays in this
  context, which seems correct.)

  Bug: #18840
  Reported-by: yang lei <ylshiyu@126.com>
  Author: Tom Lane <tgl@sss.pgh.pa.us>
  Discussion: https://postgr.es/m/18840-fbc9505f066e50d6@postgresql.org
  Backpatch-through: 13
* Repair commits 317aba70e et al for -DWRITE_READ_PARSE_PLAN_TREES. (Tom Lane, 2025-03-13)

  Letting the rewriter keep RangeTblEntry.relid when expanding a view RTE,
  without making the outfuncs/readfuncs changes that went along with that
  originally, is more problematic than I realized. It causes
  WRITE_READ_PARSE_PLAN_TREES testing to fail because outfuncs/readfuncs
  don't think relid need be saved in an RTE_SUBQUERY RTE.

  There doesn't seem to be any other good route to fixing the whole-row
  Var problem solved at f4e7756ef, so we just have to deal with the
  consequences. We can make the eventually-produced plan tree safe for
  WRITE_READ_PARSE_PLAN_TREES by clearing the relid field at the end of
  planning, as was already being done for the functions field. (The
  functions field is not problematic here because our abuse of it is
  strictly local to the planner.)

  However, there is no nice fix for the post-rewrite
  WRITE_READ_PARSE_PLAN_TREES test. The solution adopted here is to remove
  the post-rewrite test in the affected branches. That's surely less than
  ideal, but a couple of arguments can be made why it's not unacceptable.
  First, the behavior of outfuncs/readfuncs for parsetrees in these
  branches is frozen no matter what, because of catalog stability
  requirements. So we're not testing anything that is going to change.
  Second, testing WRITE_READ_PARSE_PLAN_TREES at this particular time
  doesn't correspond to any direct system functionality requirement,
  neither rule storage nor plan transmission.

  Reported-by: Andres Freund <andres@anarazel.de>
  Author: Tom Lane <tgl@sss.pgh.pa.us>
  Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
  Discussion: https://postgr.es/m/3518c50a-ab18-482f-b916-a37263622501@deepbluecap.com
  Backpatch-through: 13-15
* Build whole-row Vars the same way during parsing and planning. (Tom Lane, 2025-03-12)

  makeWholeRowVar() has different rules for constructing a whole-row Var
  depending on the kind of RTE it's representing. This turns out to be
  problematic because the rewriter and planner can convert view RTEs and
  set-returning-function RTEs into subquery RTEs; so a whole-row Var made
  during planning might look different from one made by the parser. In
  isolation this doesn't cause any problem, but if a query contains Vars
  made both ways for the same varno, there are cross-checks in the
  executor that will complain. This manifests for UPDATE, DELETE, and
  MERGE queries that use whole-row table references.

  To fix, we need makeWholeRowVar() to produce the same result from an
  inlined RTE as it would have for the original. For an inlined view, we
  can use RangeTblEntry.relid to detect that this had been a view RTE. For
  inlined SRFs, make a data structure definition change akin to commit
  47bb9db75, and say that we won't clear RangeTblEntry.functions until the
  end of planning. That allows makeWholeRowVar() to repeat what it would
  have done with the unmodified RTE.

  Reported-by: Duncan Sands <duncan.sands@deepbluecap.com>
  Reported-by: Dean Rasheed <dean.a.rasheed@gmail.com>
  Diagnosed-by: Tender Wang <tndrwang@gmail.com>
  Author: Tom Lane <tgl@sss.pgh.pa.us>
  Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
  Discussion: https://postgr.es/m/3518c50a-ab18-482f-b916-a37263622501@deepbluecap.com
  Backpatch-through: 13
* Preserve RangeTblEntry.relid when expanding a view RTE. (Tom Lane, 2025-03-12)

  When the rewriter converts an RTE_RELATION RTE for a view into an
  RTE_SUBQUERY RTE containing the view's defining query, leave the relid
  field alone instead of zeroing it. This allows the planner to tell that
  the subquery came from a view rather than having been written in-line,
  which is needed to support an upcoming planner bug fix.

  We cannot change the behavior of the outfuncs/readfuncs code in released
  branches, so the relid field will not survive in plans passed to
  parallel workers; therefore this info should not be relied on in the
  executor.

  This back-patches a portion of the data structure definitional changes
  made in v16 and up by commit 47bb9db75. It's being committed separately
  for visibility in the commit log, but with luck it will not actually
  matter to anyone. While it's not inconceivable that this change will
  break code expecting relid to be zero in a subquery RTE, we can hope
  that such code has already been adjusted to cope with v16 and up.

  Reported-by: Duncan Sands <duncan.sands@deepbluecap.com>
  Diagnosed-by: Tender Wang <tndrwang@gmail.com>
  Author: Tom Lane <tgl@sss.pgh.pa.us>
  Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
  Discussion: https://postgr.es/m/3518c50a-ab18-482f-b916-a37263622501@deepbluecap.com
  Backpatch-through: 13-15
* BRIN: be more strict about required support procs (Álvaro Herrera, 2025-03-11)

  With improperly defined operator classes, it's possible to get a
  Postgres crash because we'd try to invoke a procedure that doesn't
  exist. This is because the code is being a bit too trusting that the
  opclass is correctly defined. Add some ereport(ERROR)s for cases where
  mandatory support procedures are not defined, transforming the crashes
  into errors.

  The particular case that was reported is an incomplete opclass in
  PostGIS.

  Backpatch all the way down to 13.

  Reported-by: Tobias Wendorff <tobias.wendorff@tu-dortmund.de>
  Diagnosed-by: David Rowley <dgrowleyml@gmail.com>
  Reviewed-by: Tomas Vondra <tomas@vondra.me>
  Discussion: https://postgr.es/m/fb6d9a35-6c8e-4869-af80-0a4944a793a4@tu-dortmund.de
* Fix a few more redundant calls of GetLatestSnapshot() (Heikki Linnakangas, 2025-03-10)

  Commit 2367503177 fixed this in RelationFindReplTupleByIndex(), but I
  missed two other similar cases.

  Per report from Ranier Vilela.

  Discussion: https://www.postgresql.org/message-id/CAEudQArUT1dE45WN87F-Gb7XMy_hW6x1DFd3sqdhhxP-RMDa0Q@mail.gmail.com
  Backpatch-through: 13
* Fix snapshot used in logical replication index lookup (Heikki Linnakangas, 2025-03-10)

  The function calls GetLatestSnapshot() to acquire a fresh snapshot,
  makes it active, and was meant to pass it to table_tuple_lock(), but
  instead called GetLatestSnapshot() again to acquire yet another
  snapshot. It was harmless because the heap AM and all other known table
  AMs ignore the 'snapshot' argument anyway, but let's be tidy.

  In the long run, this perhaps should be redesigned so that snapshot was
  not needed in the first place. The table AM API uses TID + snapshot as
  the unique identifier for the row version, which is questionable when
  the row came from an index scan with a Dirty snapshot. You might lock a
  different row version when you use a different snapshot in the
  table_tuple_lock() call (a fresh MVCC snapshot) than in the index scan
  (DirtySnapshot). However, in the heap AM and other AMs where the TID
  alone identifies the row version, it doesn't matter.

  So for now, just fix the obvious albeit harmless bug. This has been
  wrong ever since the table AM API was introduced in commit 5db6df0c01,
  so backpatch to all supported versions.

  Discussion: https://www.postgresql.org/message-id/83d243d6-ad8d-4307-8b51-2ee5844f6230@iki.fi
  Backpatch-through: 13
* Clear errno before calling strtol() in spell.c. (Tom Lane, 2025-03-08)

  Per POSIX, a caller of strtol() that wishes to check for errors must set
  errno to 0 beforehand. Several places in spell.c neglected that, so that
  they risked delivering a false overflow error in case errno had been
  ERANGE already. Given the lack of field reports, this case may be
  unreachable at present --- but it's surely trouble waiting to happen, so
  fix it.

  Author: Jacob Brazeal <jacob.brazeal@gmail.com>
  Discussion: https://postgr.es/m/CA+COZaBhsq6EromFm+knMJfzK6nTpG23zJ+K2=nfUQQXcj_xcQ@mail.gmail.com
  Backpatch-through: 13
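  The required calling pattern, shown generically rather than as the
  spell.c code itself: clear errno first, or a stale ERANGE from an
  earlier call reads as a false overflow.

      #include <errno.h>
      #include <stdio.h>
      #include <stdlib.h>

      int
      main(void)
      {
          const char *input = "12345678901234567890123";
          char       *endptr;
          long        val;

          errno = 0;                      /* required before strtol() */
          val = strtol(input, &endptr, 10);
          if (errno == ERANGE)
              printf("overflow: clamped to %ld\n", val);
          else if (endptr == input)
              printf("no digits parsed\n");
          else
              printf("parsed %ld\n", val);
          return 0;
      }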
* Fix some performance issues in GIN query startup. (Tom Lane, 2025-03-06)

  If a GIN index search had a lot of search keys (for example, "jsonbcol
  ?| array[]" with tens of thousands of array elements), both
  ginFillScanKey() and startScanKey() took O(N^2) time. Worse, those
  loops were uncancelable for lack of CHECK_FOR_INTERRUPTS.

  The problem in ginFillScanKey() is the brute-force search key
  de-duplication done in ginFillScanEntry(). The most expedient solution
  seems to be to just stop trying to de-duplicate once there are "too
  many" search keys. We could imagine working harder, say by using a
  sort-and-unique algorithm instead of brute force compare-all-the-keys.
  But it seems unlikely to be worth the trouble. There is no correctness
  issue here, since the code already allowed duplicate keys if any
  extra_data is present.

  The problem in startScanKey() is the loop that attempts to identify the
  first non-required search key. In the submitted test case, that vainly
  tests all the key positions, and each iteration takes O(N) time. One
  part of that is that it's reinitializing the entryRes[] array from
  scratch each time, which is entirely unnecessary given that the
  triConsistentFn isn't supposed to scribble on its input. We can easily
  adjust the array contents incrementally instead. The other part of it
  is that the triConsistentFn may itself take O(N) time (and does in this
  test case). This is all extremely brute force: in simple cases with AND
  or OR semantics, we could know without any looping whatever that all or
  none of the keys are required. But GIN opclasses don't have any API for
  exposing that knowledge, so at least in the short run there is little
  to be done about that. Put in a CHECK_FOR_INTERRUPTS so that at least
  the loop is cancelable.

  These two changes together resolve the primary complaint that the test
  query doesn't respond promptly to cancel interrupts. Also, while they
  don't completely eliminate the O(N^2) behavior, they do provide quite a
  nice speedup for mid-sized examples.

  Bug: #18831
  Reported-by: Niek <niek.brasa@hitachienergy.com>
  Author: Tom Lane <tgl@sss.pgh.pa.us>
  Discussion: https://postgr.es/m/18831-e845ac44ebc5dd36@postgresql.org
  Backpatch-through: 13
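  A hedged sketch of the de-duplication cap (the threshold name, value,
  and key representation are invented): brute-force matching is O(N^2),
  so past some limit we simply stop looking for an existing duplicate.
  Correctness is unaffected, since duplicate keys were already allowed.

      #define GIN_MAX_DEDUP_KEYS 1000   /* illustrative threshold */

      /* Add 'key' to keys[], reusing a duplicate only while cheap. */
      static int
      add_search_key(int *keys, int *nkeys, int key)
      {
          if (*nkeys < GIN_MAX_DEDUP_KEYS)
          {
              for (int i = 0; i < *nkeys; i++)
                  if (keys[i] == key)
                      return i;         /* reuse the duplicate */
          }
          keys[*nkeys] = key;           /* past the cap: just append */
          return (*nkeys)++;
      }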
* Fix ALTER TABLE error message (Álvaro Herrera, 2025-03-04)

  This bogus error message was introduced in 2013 by commit f177cbfe676d,
  because of misunderstanding the processCASbits() API; at the time, no
  test cases were added that would be affected by this change. Only in
  ca87c415e2fc was one added (along with a couple of typos), with an XXX
  note that the error message was bogus. Fix the whole, add some test
  cases.

  Backpatch all the way back.

  Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
  Discussion: https://postgr.es/m/202503041822.aobpqke3igvb@alvherre.pgsql
* Fix broken handling of domains in atthasmissing logic. (Tom Lane, 2025-03-03)

  If a domain type has a default, adding a column of that type (without
  any explicit DEFAULT clause) failed to install the domain's default
  value in existing rows, instead leaving the new column null. This is
  unexpected, and it used to work correctly before v11. The cause is
  confusion in the atthasmissing mechanism about which default value to
  install: we'd only consider installing an explicitly-specified default,
  and then we'd decide that no table rewrite is needed.

  To fix, take the responsibility for filling attmissingval out of
  StoreAttrDefault, and instead put it into ATExecAddColumn's existing
  logic that derives the correct value to fill the new column with. Also,
  centralize the logic that determines the need for default-related table
  rewriting there, instead of spreading it over four or five places.

  In the back branches, we'll leave the attmissingval-filling code in
  StoreAttrDefault even though it's now dead, for fear that some extension
  may be depending on that functionality to exist there. A separate
  HEAD-only patch will clean up the now-useless code.

  Reported-by: jian he <jian.universality@gmail.com>
  Author: jian he <jian.universality@gmail.com>
  Author: Tom Lane <tgl@sss.pgh.pa.us>
  Discussion: https://postgr.es/m/CACJufxHFssPvkP1we7WMhPD_1kwgbG52o=kQgL+TnVoX5LOyCQ@mail.gmail.com
  Backpatch-through: 13
* Avoid null pointer dereference crash after OOM in Snowball stemmers. (Tom Lane, 2025-02-18)

  Absorb upstream bug fix (their commit
  e322673a841d9abd69994ae8cd20e191090b6ef4), which prevents a null pointer
  dereference crash if SN_create_env() gets a malloc failure at just the
  wrong point.

  Thanks to Maksim Korotkov for discovering the null-pointer bug and
  submitting the fix to upstream snowball.

  Reported-by: Maksim Korotkov <m.korotkov@postgrespro.ru>
  Author: Maksim Korotkov <m.korotkov@postgrespro.ru>
  Discussion: https://postgr.es/m/1d1a46-67ab1000-21-80c451@83151435
  Backpatch-through: 13
* Fix unsafe access to BufferDescriptors (Richard Guo, 2025-02-19)

  When considering a local buffer, the GetBufferDescriptor() call in
  BufferGetLSNAtomic() would be retrieving a shared buffer with a bad
  buffer ID. Since the code checks whether the buffer is shared before
  using the retrieved BufferDesc, this issue did not lead to any
  malfunction. Nonetheless this seems like trouble waiting to happen, so
  fix it by ensuring that GetBufferDescriptor() is only called when we
  know the buffer is shared.

  Author: Tender Wang <tndrwang@gmail.com>
  Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
  Reviewed-by: Richard Guo <guofenglinux@gmail.com>
  Discussion: https://postgr.es/m/CAHewXNku-o46-9cmUgyv6LkSZ25doDrWq32p=oz9kfD8ovVJMg@mail.gmail.com
  Backpatch-through: 13
* Translation updates (Álvaro Herrera, 2025-02-17)

  Source-Git-URL: ssh://git@git.postgresql.org/pgtranslation/messages.git
  Source-Git-Hash: 2bcd19355a18178dfe82bde9e98b9486fcd3143f
* Fix assertion on dereferenced object (Daniel Gustafsson, 2025-02-14)

  Commit 27cc7cd2bc8a accidentally placed the assertion ensuring that the
  pointer isn't NULL after it had already been accessed. Fix by moving the
  pointer dereferencing to after the assertion. Backpatch to all supported
  branches.

  Author: Dmitry Koval <d.koval@postgrespro.ru>
  Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
  Reviewed-by: Michael Paquier <michael@paquier.xyz>
  Discussion: https://postgr.es/m/1618848d-cdc7-414b-9c03-08cf4bef4408@postgrespro.ru
  Backpatch-through: 13
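  The pattern, shown generically rather than as the commit's code: an
  assertion guarding a pointer must run before the first dereference,
  because a NULL pointer has already crashed by the time a late check
  executes.

      #include <assert.h>

      typedef struct { int value; } Thing;

      static int
      get_value(const Thing *thing)
      {
          /* Wrong order: int v = thing->value; assert(thing != NULL); */
          assert(thing != NULL);   /* check first... */
          return thing->value;     /* ...then dereference */
      }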
* Fix MakeTransitionCaptureState() to return a consistent result (Michael Paquier, 2025-02-13)

  When an UPDATE trigger referencing a new table and a DELETE trigger
  referencing an old table are both present, MakeTransitionCaptureState()
  returns an inconsistent result for UPDATE commands in its set of flags
  and tuplestores holding the TransitionCaptureState for transition
  tables.

  As proved by the test added here, this issue causes a crash in v14 and
  earlier versions (down to 11, actually, older versions do not support
  triggers on partitioned tables) during cross-partition updates on a
  partitioned table. v15 and newer versions are safe thanks to
  7103ebb7aae8.

  This commit fixes the function so that it returns a consistent state by
  using portions of the changes made in commit 7103ebb7aae8 for v13 and
  v14. v15 and newer versions are slightly tweaked to match with the
  older versions, mainly for consistency across branches.

  Author: Kyotaro Horiguchi
  Discussion: https://postgr.es/m/20250207.150238.968446820828052276.horikyota.ntt@gmail.com
  Backpatch-through: 13
* Translation updates (Peter Eisentraut, 2025-02-10)

  Source-Git-URL: https://git.postgresql.org/git/pgtranslation/messages.git
  Source-Git-Hash: bca9943c7f737a04c8d42999330ba0602a133523
* Mention jsonlog in description of logging_collector in GUC table (Michael Paquier, 2025-02-02)

  logging_collector was only mentioning stderr and csvlog, and forgot
  about jsonlog. Oversight in dc686681e079, which added support for
  jsonlog in log_destination.

  While on it, the description in the GUC table is tweaked to be more
  consistent with the documentation and postgresql.conf.sample.

  Author: Umar Hayat
  Reviewed-by: Ashutosh Bapat, Tom Lane
  Discussion: https://postgr.es/m/CAD68Dp1K_vBYqBEukHw=1jF7e76t8aszGZTFL2ugi=H7r=a7MA@mail.gmail.com
  Backpatch-through: 13
* Fix comment of StrategySyncStart() (Michael Paquier, 2025-01-31)

  The top comment of StrategySyncStart() mentions BufferSync(), but this
  function calls BgBufferSync(), not BufferSync(). Oversight in
  9cd00c457e6a.

  Author: Ashutosh Bapat
  Discussion: https://postgr.es/m/CAExHW5tgkjag8i-s=RFrCn5KAWDrC4zEPPkfUKczfccPOxBRQQ@mail.gmail.com
  Backpatch-through: 13
* Avoid integer overflow while testing wal_skip_threshold condition. (Tom Lane, 2025-01-30)

  smgrDoPendingSyncs had two distinct risks of integer overflow while
  deciding which way to ensure durability of a newly-created relation.
  First, it accumulated the total size of all forks in a variable of type
  BlockNumber (uint32). While we restrict an individual fork's size to
  fit in that, I don't believe there's such a restriction on all of them
  added together. Second, it proceeded to multiply the sum by BLCKSZ,
  which most certainly could overflow a uint32. (The exact expression is
  total_blocks * BLCKSZ / 1024. The compiler might choose to optimize
  that to total_blocks * 8, which is not at quite as much risk of
  overflow as a literal reading would be, but it's still wrong.)

  If an overflow did occur it could lead to a poor choice to shove a very
  large relation into WAL instead of fsync'ing it. This wouldn't be
  fatal, but it could be inefficient. Change total_blocks to uint64 which
  should be plenty, and rearrange the comparison calculation to be
  overflow-safe.

  I noticed this while looking for ramifications of the proposed change
  in MAX_KILOBYTES. It's not entirely clear to me why wal_skip_threshold
  is limited to MAX_KILOBYTES in the first place, but in any case this
  code is unsafe regardless of the range of wal_skip_threshold.

  Oversight in c6b92041d which introduced wal_skip_threshold, so
  back-patch to v13.

  Discussion: https://postgr.es/m/1a01f0-66ec2d80-3b-68487680@27595217
  Backpatch-through: 13
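  The overflow is easy to demonstrate in isolation (illustrative values,
  not the smgrDoPendingSyncs code):

      #include <stdint.h>
      #include <stdio.h>

      #define BLCKSZ 8192

      int
      main(void)
      {
          uint32_t total_blocks = 600000;   /* ~4.6 GB worth of blocks */

          /* Broken: 600000 * 8192 = 4915200000 wraps around in uint32. */
          uint32_t bad_kb = total_blocks * BLCKSZ / 1024;

          /* Fixed: widen before multiplying. Alternatively, compare in
           * block units: total_blocks >= threshold_kb / (BLCKSZ / 1024). */
          uint64_t good_kb = (uint64_t) total_blocks * BLCKSZ / 1024;

          printf("bad: %u kB, good: %llu kB\n",
                 (unsigned) bad_kb, (unsigned long long) good_kb);
          return 0;
      }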
* At update of non-LP_NORMAL TID, fail instead of corrupting page header. (Noah Misch, 2025-01-25)

  The right mix of DDL and VACUUM could corrupt a catalog page header
  such that PageIsVerified() durably fails, requiring a restore from
  backup. This affects only catalogs that both have a syscache and have
  DDL code that uses syscache tuples to construct updates. One of the
  test permutations shows a variant not yet fixed.

  This makes !TransactionIdIsValid(TM_FailureData.xmax) possible with
  TM_Deleted. I think core and PGXN are indifferent to that.

  Per bug #17821 from Alexander Lakhin. Back-patch to v13 (all supported
  versions). The test case is v17+, since it uses INJECTION_POINT.

  Discussion: https://postgr.es/m/17821-dd8c334263399284@postgresql.org
* Use the correct sizeof() in BufFileLoadBuffer (Tomas Vondra, 2025-01-25)

  The sizeof() call should reference buffer.data, because that's the
  buffer we're reading data into, not the whole PGAlignedBuffer union.
  This was introduced by 44cac93464, which replaced the simple buffer
  with a PGAlignedBuffer field.

  It's benign, because the buffer is the largest field of the union, so
  the sizes are the same. But it's easy to trip over this in a patch, so
  fix and backpatch. Commit 44cac93464 went into 12, but that's EOL.

  Backpatch-through: 13
  Discussion: https://postgr.es/m/928bdab1-6567-449f-98c4-339cd2203b87@vondra.me
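  The sizeof() pitfall in miniature, with an invented union standing in
  for PGAlignedBuffer: the union and its largest member happen to be the
  same size, so the wrong operand is benign today and wrong the moment
  the layout changes.

      #include <stdio.h>

      typedef union
      {
          char   data[8192];
          double force_align_d;   /* forces stricter alignment */
          long   force_align_l;
      } AlignedBuffer;

      int
      main(void)
      {
          AlignedBuffer buffer;

          /* Same value only because data is the largest member: */
          printf("sizeof(buffer)      = %zu\n", sizeof(buffer));
          /* What a read into buffer.data should actually name: */
          printf("sizeof(buffer.data) = %zu\n", sizeof(buffer.data));
          return 0;
      }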
* Don't ask for bug reports about pthread_is_threaded_np() != 0. (Tom Lane, 2025-01-23)

  We thought that this condition was unreachable in ExitPostmaster, but
  actually it's possible if you have both a misconfigured locale setting
  and some other mistake that causes PostmasterMain to bail out before
  reaching its own check of pthread_is_threaded_np().

  Given the lack of other reports, let's not ask for bug reports if this
  occurs; instead just give the same hint as in PostmasterMain.

  Bug: #18783
  Reported-by: anani191181515@gmail.com
  Author: Tom Lane <tgl@sss.pgh.pa.us>
  Reviewed-by: Noah Misch <noah@leadboat.com>
  Discussion: https://postgr.es/m/18783-d1873b95a59b9103@postgresql.org
  Discussion: https://postgr.es/m/206317.1737656533@sss.pgh.pa.us
  Backpatch-through: 13
* Repair incorrect handling of AfterTriggerSharedData.ats_modifiedcols. (Tom Lane, 2025-01-22)

  This patch fixes two distinct errors that both ultimately trace to
  commit 71d60e2aa, which added the ats_modifiedcols field.

  The more severe error is that ats_modifiedcols wasn't accounted for in
  afterTriggerAddEvent's scanning loop that looks for a pre-existing
  duplicate AfterTriggerSharedData. Thus, a new event could be
  incorrectly matched to an AfterTriggerSharedData that has a different
  value of ats_modifiedcols, resulting in the wrong tg_updatedcols bitmap
  getting passed to the trigger whenever it finally gets fired. We'd not
  noticed because (a) few triggers consult tg_updatedcols, and (b) we had
  no tests exercising a case where such a trigger was called as an AFTER
  trigger. In the test case added by this commit, contrib/lo's trigger
  fails to remove a large object when expected because (without this fix)
  it thinks the LO OID column hasn't changed.

  The other problem was introduced by commit ce5aaea8c, which copied the
  modified-columns bitmap into trigger-related storage. It made a copy
  for every trigger event, whereas what we really want is to make a new
  copy only when we make a new AfterTriggerSharedData entry. (We could
  imagine adding extra logic to reduce the number of bitmap copies still
  more, but it doesn't look worthwhile at the moment.) In a simple test
  of an UPDATE of 10000000 rows with a single AFTER trigger, this thinko
  roughly tripled the amount of memory consumed by the pending-triggers
  data structures, from 160446744 to 480443440 bytes.

  Fixing the first problem requires introducing a bms_equal() call into
  afterTriggerAddEvent's scanning loop, which is slightly annoying from a
  speed perspective. However, getting rid of the excessive bms_copy()
  calls from the second problem balances that out; overall speed of
  trigger operations is the same or slightly better, in my tests.

  Discussion: https://postgr.es/m/3496294.1737501591@sss.pgh.pa.us
  Backpatch-through: 13
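  The first bug reduced to a hedged sketch (a stand-in struct and a flat
  bitmask instead of a real Bitmapset): a duplicate-matching loop must
  compare every distinguishing field, including the modified-columns set,
  which is what the added bms_equal() call provides in the real code.

      #include <stdbool.h>
      #include <stdint.h>

      typedef struct
      {
          uint32_t trig_oid;        /* which trigger */
          uint64_t modified_cols;   /* stand-in for a Bitmapset */
      } SharedRec;

      static bool
      shared_rec_matches(const SharedRec *a, const SharedRec *b)
      {
          return a->trig_oid == b->trig_oid &&
                 a->modified_cols == b->modified_cols; /* the missing test */
      }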
* Fix header check for continuation records where standbys could be stuck (Michael Paquier, 2025-01-20)

  XLogPageRead() checks immediately for an invalid WAL record header on a
  standby, to be able to handle the case of continuation records that
  need to be read across two different sources. As written, the check was
  too generic, applying to any target LSN. Based on an analysis by
  Kyotaro Horiguchi, what really matters is to make sure that the page
  header is checked when attempting to read an LSN at the boundary of a
  segment, to handle the case of a continuation record that spans
  multiple pages when dealing with multiple segments, as WAL receivers,
  once spawned, request WAL from the beginning of a segment. This fix has
  been proposed by Kyotaro Horiguchi.

  This could cause standbys to loop infinitely when dealing with a
  continuation record during a timeline jump, in the case where the
  contents of the record in the follow-up page are invalid.

  Some regression tests are added to check such scenarios, able to
  reproduce the original problem. In the test, the contents of a
  continuation record are overwritten with junk zeros on its follow-up
  page, and replayed on standbys. This is inspired by 039_end_of_wal.pl,
  and is enough to show how standbys should react on promotion by not
  being stuck. Without the fix, the test would fail with a timeout. The
  test to reproduce the problem has been written by Alexander Kukushkin.

  The original check has been introduced in 066871980183, for a similar
  problem.

  Author: Kyotaro Horiguchi, Alexander Kukushkin
  Reviewed-by: Michael Paquier
  Discussion: https://postgr.es/m/CAFh8B=mozC+e1wGJq0H=0O65goZju+6ab5AU7DEWCSUA2OtwDg@mail.gmail.com
  Backpatch-through: 13
* Revert recent changes related to handling of 2PC files at recovery (Michael Paquier, 2025-01-17)

  This commit reverts 8f67f994e8ea (down to v13) and c3de0f9eed38 (down
  to v17), as these are proving to not be completely correct regarding
  two aspects:
  - In v17 and newer branches, c3de0f9eed38's check for epoch handling is
    incorrect, and does not correctly handle frozen epochs. A logic
    closer to widen_snapshot_xid() should be used. The 2PC code should
    try to integrate deeper with FullTransactionIds, 5a1dfde8334b being
    not enough.
  - In v13 and newer branches, 8f67f994e8ea is a workaround for the real
    issue, which is that we should not attempt CLOG lookups without
    reaching consistency. This exists since 728bd991c3c4, and this is
    reachable with ProcessTwoPhaseBuffer() called by
    restoreTwoPhaseData() at the beginning of recovery.

  Per discussion with Noah Misch.

  Discussion: https://postgr.es/m/20250116010051.f3.nmisch@google.com
  Backpatch-through: 13
* Fix setrefs.c's failure to do expression processing on prune steps. (Tom Lane, 2025-01-16)

  We should run the expression subtrees of PartitionedRelPruneInfo
  structs through fix_scan_expr. Failure to do so means that
  AlternativeSubPlans within those expressions won't be cleaned up
  properly, resulting in "unrecognized node type" errors since v14.

  It seems fairly likely that at least some of the other steps done by
  fix_scan_expr are important here as well, resulting in
  as-yet-undetected bugs. Therefore, I've chosen to back-patch this to
  all supported branches including v13, even though the known symptom
  doesn't manifest in v13.

  Per bug #18778 from Alexander Lakhin.

  Discussion: https://postgr.es/m/18778-24cd399df6c806af@postgresql.org
* Fix catcache invalidation of a list entry that's being built (Heikki Linnakangas, 2025-01-14)

  If a new catalog tuple is inserted that belongs to a catcache list
  entry, and cache invalidation happens while the list entry is being
  built, the list entry might miss the newly inserted tuple.

  To fix, change the way we detect concurrent invalidations while a
  catcache entry is being built. Keep a stack of entries that are being
  built, and apply cache invalidation to those entries in addition to the
  real catcache entries. This is similar to the in-progress list in
  relcache.c.

  Back-patch to all supported versions. (This commit arrives in v13 a few
  hours later than the other branches, because I somehow missed v13 in
  the first batch.)

  Reviewed-by: Noah Misch
  Discussion: https://www.postgresql.org/message-id/2234dc98-06fe-42ed-b5db-ac17384dc880@iki.fi