aboutsummaryrefslogtreecommitdiff
path: root/src/backend
Commit message (Collapse)AuthorAge
...
* Avoid scribbling of VACUUM optionsMichael Paquier2025-06-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes two issues with the handling of VacuumParams in vacuum_rel(). This code path has the idea to change the passed-in pointer of VacuumParams for the "truncate" and "index_cleanup" options for the relation worked on, impacting the two following scenarios where incorrect options may be used because a VacuumParams pointer is shared across multiple relations: - Multiple relations in a single VACUUM command. - TOAST relations vacuumed with their main relation. The problem is avoided by providing to the two callers of vacuum_rel() copies of VacuumParams, before the pointer is updated for the "truncate" and "index_cleanup" options. The refactoring of the VACUUM option and parameters done in 0d831389749a did not introduce an issue, but it has encouraged the problem we are dealing with in this commit, with b84dbc8eb80b for "truncate" and a96c41feec6b for "index_cleanup" that have been added a couple of years after the initial refactoring. HEAD will be improved with a different patch that hardens the uses of VacuumParams across the tree. This cannot be backpatched as it introduces an ABI breakage. The backend portion of the patch has been authored by Nathan, while I have implemented the tests. The tests rely on injection points to check the option values, making them faster, more reliable than the tests originally proposed by Shihao, and they also provide more coverage. This part can only be backpatched down to v17. Reported-by: Shihao Zhong <zhong950419@gmail.com> Author: Nathan Bossart <nathandbossart@gmail.com> Co-authored-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CAGRkXqTo+aK=GTy5pSc-9cy8H2F2TJvcrZ-zXEiNJj93np1UUw@mail.gmail.com Backpatch-through: 13
* Prevent excessive delays before launching new logrep workers.Tom Lane2025-06-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The logical replication launcher process would sometimes sleep for as much as 3 minutes before noticing that it is supposed to launch a new worker. This could happen if (1) WaitForReplicationWorkerAttach absorbed a process latch wakeup that was meant to cause ApplyLauncherMain to do work, or (2) logicalrep_worker_launch reported failure, either because of resource limits or because the new worker terminated immediately. In case (2), the expected behavior is that we retry the launch after wal_retrieve_retry_interval, but that didn't reliably happen. It's not clear how often such conditions would occur in the field, but in our subscription test suite they are somewhat common, especially in tests that exercise cases that cause quick worker failure. That causes the tests to take substantially longer than they ought to do on typical setups. To fix (1), make WaitForReplicationWorkerAttach re-set the latch before returning if it cleared it while looping. To fix (2), ensure that we reduce wait_time to no more than wal_retrieve_retry_interval when logicalrep_worker_launch reports failure. In passing, fix a couple of perhaps-hypothetical race conditions, e.g. examining worker->in_use without a lock. Backpatch to v16. Problem (2) didn't exist before commit 5a3a95385 because the previous code always set wait_time to wal_retrieve_retry_interval when launching a worker, regardless of success or failure of the launch. That behavior also greatly mitigated problem (1), so I'm not excited about adapting the remainder of the patch to the substantially-different code in older branches. Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Discussion: https://postgr.es/m/817604.1750723007@sss.pgh.pa.us Backpatch-through: 16
* Make query jumbling also squash PARAM_EXTERN paramsÁlvaro Herrera2025-06-24
| | | | | | | | | | | | | | | | | | | | | | Commit 62d712ecfd94 made query jumbling squash lists of Consts as a single element, but there's no reason not to treat PARAM_EXTERN parameters the same. For these purposes, these values are indeed constants for any particular execution of a query. In particular, this should make list squashing more useful for applications using extended query protocol, which would use parameters extensively. A complication arises: if a query has both external parameters and squashable lists, then the parameter number used as placeholder for the squashed list might be inconsistent with regards to the parameter numbers used by the query literal. To reduce the surprise factor, all parameters are renumbered starting from 1 in that case. Author: Sami Imseih <samimseih@gmail.com> Author: Dmitry Dolgov <9erthalion6@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CAA5RZ0tRXoPG2y6bMgBCWNDt0Tn=unRerbzYM=oW0syi1=C1OA@mail.gmail.com
* Improve jumble squashing through CoerceViaIO and RelabelTypeÁlvaro Herrera2025-06-24
| | | | | | There's no principled reason for query jumbling to only remove the first layer of RelabelType and CoerceViaIO. Change it to see through as many layers as there are.
* Fix virtual generated column type checking for ALTER TABLEPeter Eisentraut2025-06-24
| | | | | | | | | | Virtual generated columns have some special checks in CheckAttributeType(), mainly to check that domains are not used. But this check was only applied during CREATE TABLE, not during ALTER TABLE. This fixes that. Reported-by: jian he <jian.universality@gmail.com> Discussion: https://www.postgresql.org/message-id/CACJufxE0KHR__-h=zHXbhSNZXMMs4LYo4-dbj8H3YoStYBok1Q@mail.gmail.com
* Fix missing comment update in 1462aad2e4.Amit Kapila2025-06-24
| | | | | | | | | Remove the part of comment that says we don't allow toggling two_phase option as that is supported in commit 1462aad2e4. Author: Hayato Kuroda <kuroda.hayato@fujitsu.com> Author: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/OSCPR01MB1496656725F3951AEE8749EBDF579A@OSCPR01MB14966.jpnprd01.prod.outlook.com
* Remove excess assert from InvalidatePossiblyObsoleteSlot()Alexander Korotkov2025-06-23
| | | | | | | | | | | | | | | | | | | | | | | | | ca307d5cec90 introduced keeping WAL segments by slot's last saved restart LSN. It also added an assertion that the slot's restart LSN never goes backward. However, situations when the restart LSN goes backward have been spotted by buildfarm animals and investigated in the thread. When pg_receivewal starts the replication, it sets the last replayed LSN to the beginning of the segment, which is older than what ReplicationSlotReserveWal() set for the slot. A similar situation can happen to pg_basebackup. When standby reconnects to the primary, it sends the last replayed LSN, which might be older than the last confirmed flush LSN. In both these situations, a concurrent checkpoint may trigger an assert trap. Based on ideas from Vitaly Davydov <v.davydov@postgrespro.ru>, Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com>, Vignesh C <vignesh21@gmail.com>, Amit Kapila <amit.kapila16@gmail.com>. Reported-by: Vignesh C <vignesh21@gmail.com> Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CALDaNm3s-jpQTe1MshsvQ8GO%3DTLj233JCdkQ7uZ6pwqRVpxAdw%40mail.gmail.com Reviewed-by: Vignesh C <vignesh21@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
* Doc: improve documentation about width_bucket().Tom Lane2025-06-21
| | | | | | | | | | | | | | | | Specify whether the bucket bounds are inclusive or exclusive, and improve some other vague language. Explain the behavior that occurs when the "low" bound is greater than the "high" bound. Make width_bucket_numeric's comment more like that for width_bucket_float8, in particular noting that infinite bounds are rejected (since they became possible in v14). Reported-by: Ben Peachey Higdon <bpeacheyhigdon@gmail.com> Author: Robert Treat <rob@xzilla.net> Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com> Discussion: https://postgr.es/m/2BD74F86-5B89-4AC1-8F13-23CED3546AC1@gmail.com Backpatch-through: 13
* Remove planner's have_dangerous_phv() join-order restriction.Tom Lane2025-06-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 85e5e222b, which added (a forerunner of) this logic, argued that Adding the necessary complexity to make this work doesn't seem like it would be repaid in significantly better plans, because in cases where such a PHV exists, there is probably a corresponding join order constraint that would allow a good plan to be found without using the star-schema exception. The flaw in this claim is that there may be other join-order restrictions that prevent us from finding a join order that doesn't involve a "dangerous" PHV. In particular we now recognize that small join_collapse_limit or from_collapse_limit could prevent it. Therefore, let's bite the bullet and make the case work. We don't have to extend the executor's support for nestloop parameters as I thought at the time, because we can instead push the evaluation of the placeholder's expression into the left-hand input of the NestLoop node. So there's not really a lot of downside to this solution, and giving the planner more join-order flexibility should have value beyond just avoiding failure. Having said that, there surely is a nonzero risk of introducing new bugs. Since this failure mode escaped detection for ten years, such cases don't seem common enough to justify a lot of risk. Therefore, let's put this fix into master but leave the back branches alone (for now anyway). Bug: #18953 Reported-by: Alexander Lakhin <exclusion@gmail.com> Diagnosed-by: Richard Guo <guofenglinux@gmail.com> Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/18953-1c9883a9d4afeb30@postgresql.org
* Use SnapshotDirty when checking for conflicting index names.Tom Lane2025-06-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While choosing an autogenerated name for an index, look for pre-existing relations using a SnapshotDirty snapshot, instead of the previous behavior that considered only committed-good pg_class rows. This allows us to detect and avoid conflicts against indexes that are still being built. It's still possible to fail due to a race condition, but the window is now just the amount of time that it takes DefineIndex to validate all its parameters, call smgrcreate(), and enter the index's pg_class row. Formerly the race window covered the entire time needed to create and fill an index, which could be very long if the table is large. Worse, if the conflicting index creation is part of a larger transaction, it wouldn't be visible till COMMIT. So this isn't a complete solution, but it should greatly ameliorate the problem, and the patch is simple enough to be back-patchable. It might at some point be useful to do the same for pg_constraint entries (cf. ChooseConstraintName, ConstraintNameExists, and related functions). However, in the absence of field complaints, I'll leave that alone for now. The relation-name test should be good enough for index-based constraints, while foreign-key constraints seem to be okay since they require exclusive locks to create. Bug: #18959 Reported-by: Maximilian Chrzan <maximilian.chrzan@here.com> Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com> Discussion: https://postgr.es/m/18959-f63b53b864bb1417@postgresql.org Backpatch-through: 13
* Improve log messages and docs for slot synchronization.Amit Kapila2025-06-19
| | | | | | | | | | | | | | | | | | | | | | | Improve the clarity of LOG messages when a failover logical slot synchronization fails, making the reasons more explicit for easier debugging. Update the documentation to outline scenarios where slot synchronization can fail, especially during the initial sync, and emphasize that pg_sync_replication_slot() is primarily intended for testing and debugging purposes. We also discussed improving the functionality of pg_sync_replication_slot() so that it can be used reliably, but we would take up that work for next version after some more discussion and review. Reported-by: Suraj Kharage <suraj.kharage@enterprisedb.com> Author: shveta malik <shveta.malik@gmail.com> Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com> Reviewed-by: Peter Smith <smithpb2250@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Backpatch-through: 17, where it was introduced Discussion: https://postgr.es/m/CAF1DzPWTcg+m+x+oVVB=y4q9=PYYsL_mujVp7uJr-_oUtWNGbA@mail.gmail.com
* doc: Mention GIN indexes support parallel builds.Fujii Masao2025-06-19
| | | | | | | | | | | | | Commit 8492feb98f6 added support for parallel CREATE INDEX on GIN indexes. However, previously two places in the documentation and two in the source code comments still stated that only B-tree and BRIN indexes support parallel builds. This commit updates those references to correctly include GIN indexes. Author: Fujii Masao <masao.fujii@gmail.com> Reviewed-by: Robert Treat <rob@xzilla.net> Discussion: https://postgr.es/m/7d27d068-90e2-4022-9bd7-09b0fd3d4f47@oss.nttdata.com
* Document "relrewrite" at the top of heap_create_with_catalog()Michael Paquier2025-06-18
| | | | | | | | | | This parameter has been introduced in 325f2ec5557f, and it was not documented contrary to all the other arguments of heap_create_with_catalog(). Reviewed-by: Yugo Nagata <nagata@sraoss.co.jp> Reviewed-by: Steven Niu <niushiji@gmail.com> Discussion: https://postgr.es/m/aE--bmEv-gJUTH5v@paquier.xyz
* Fix re-distributing previously distributed invalidation messages during ↵Masahiko Sawada2025-06-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | logical decoding. Commit 4909b38af0 introduced logic to distribute invalidation messages from catalog-modifying transactions to all concurrent in-progress transactions. However, since each transaction distributes not only its original invalidation messages but also previously distributed messages to other transactions, this leads to an exponential increase in allocation request size for invalidation messages, ultimately causing memory allocation failure. This commit fixes this issue by tracking distributed invalidation messages separately per decoded transaction and not redistributing these messages to other in-progress transactions. The maximum size of distributed invalidation messages that one transaction can store is limited to MAX_DISTR_INVAL_MSG_PER_TXN (8MB). Once the size of the distributed invalidation messages exceeds this threshold, we invalidate all caches in locations where distributed invalidation messages need to be executed. Back-patch to all supported versions where we introduced the fix by commit 4909b38af0. Note that this commit adds two new fields to ReorderBufferTXN to store the distributed transactions. This change breaks ABI compatibility in back branches, affecting third-party extensions that depend on the size of the ReorderBufferTXN struct, though this scenario seems unlikely. Additionally, it adds a new flag to the txn_flags field of ReorderBufferTXN to indicate distributed invalidation message overflow. This should not affect existing implementations, as it is unlikely that third-party extensions use unused bits in the txn_flags field. Bug: #18938 #18942 Author: vignesh C <vignesh21@gmail.com> Reported-by: Duncan Sands <duncan.sands@deepbluecap.com> Reported-by: John Hutchins <john.hutchins@wicourts.gov> Reported-by: Laurence Parry <greenreaper@hotmail.com> Reported-by: Max Madden <maxmmadden@gmail.com> Reported-by: Braulio Fdo Gonzalez <brauliofg@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Discussion: https://postgr.es/m/680bdaf6-f7d1-4536-b580-05c2760c67c6@deepbluecap.com Discussion: https://postgr.es/m/18942-0ab1e5ae156613ad@postgresql.org Discussion: https://postgr.es/m/18938-57c9a1c463b68ce0@postgresql.org Discussion: https://postgr.es/m/CAD1FGCT2sYrP_70RTuo56QTizyc+J3wJdtn2gtO3VttQFpdMZg@mail.gmail.com Discussion: https://postgr.es/m/CANO2=B=2BT1hSYCE=nuuTnVTnjidMg0+-FfnRnqM6kd23qoygg@mail.gmail.com Backpatch-through: 13
* Fix possible Assert failure in verify_compact_attribute()David Rowley2025-06-17
| | | | | | | | | | | | | | | | | Sometimes the TupleDesc used in verify_compact_attribute() is shared among backends, and since CompactAttribute.attcacheoff gets updated during tuple deformation, it was possible that another backend would set attcacheoff on a given CompactAttribute in the small window of time from when the attcacheoff from the live CompactAttribute was being set in the 'tmp' CompactAttribute and before the Assert verifying that the live and tmp CompactAttributes matched. Here we adjust the code to make a copy of the live CompactAttribute so that we're not trying to Assert against a shared copy of it. Author: David Rowley <dgrowleyml@gmail.com> Reported-by: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/7195e408-758c-4031-8e61-4f842c716ac0@gmail.com
* aio: Add missing memory barrier when waiting for IO handleAndres Freund2025-06-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously there was no memory barrier enforcing correct memory ordering when waiting for a free IO handle. However, in the much more common case of waiting for IO to complete, memory barriers already were present. On strongly ordered architectures like x86 this had no negative consequences, but on some armv8 hardware (observed on Apple hardware), it was possible for the update, in the IO worker, to PgAioHandle->state to become visible before ->distilled_result becoming visible, leading to rather confusing assertion failures. The failures were rare enough that the bug sometimes took days to reproduce when running 027_stream_regress in a loop. Once finally debugged, it was easy enough to come up with a much quicker repro: Trigger a lot of very fast IO by limiting io_combine_limit to 1 and ensure that we always have to wait for a free handle by setting io_max_concurrency to 1. Triggering lots of concurrent seqscans in that setup triggers the issue within seconds. One reason this was hard to debug was that the assertion failure most commonly happened in WaitReadBuffers(), rather than in the AIO subsystem itself. The assertions added in this commit make problems like this easier to understand. Also add a comment to the IO worker explaining that we rely on the lwlock acquisition for correct memory ordering. I think it'd be good to add a tap test that stress tests buffer IO, but that's material for a separate patch. Thanks a lot to Alexander and Konstantin for all the debugging help. Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Reported-by: Alexander Lakhin <exclusion@gmail.com> Investigated-by: Andres Freund <andres@anarazel.de> Investigated-by: Alexander Lakhin <exclusion@gmail.com> Investigated-by: Konstantin Knizhnik <knizhnik@garret.ru> Discussion: https://postgr.es/m/2dkz7azclpeiqcmouamdixyn5xhlzy4rvikxrbovyzvi6rnv5c@pz7o7osv2ahf
* Sync typedefs.list with the buildfarm.Tom Lane2025-06-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | Our maintenance of typedefs.list has been a little haphazard (and apparently we can't alphabetize worth a darn). Replace the file with the authoritative list from our buildfarm, and run pgindent using that. I also updated the additions/exclusions lists in pgindent where necessary to keep pgindent from messing things up significantly. Notably, now that regex_t and some related names are macros not real typedefs, we have to whitelist them explicitly. The exclusions list has also drifted noticeably, presumably due to changes of system headers on the buildfarm animals that contribute to the list. Unlike in prior years, I've not manually added typedef names that are missing from the buildfarm's list because they are not used to declare any variables or fields. So there are a few places where the typedef declaration itself is formatted worse than before, e.g. typedef enum IoMethod. I could preserve the names that were manually added to the list previously, but I'd really prefer to find a less manual way of dealing with these cases. A quick grep finds about 75 such symbols, most of which have never gotten any special treatment. Per discussion among pgsql-release, doing this now seems appropriate even though we're still a week or two away from making the v18 branch.
* Improve comments for TidRangeEvalDavid Rowley2025-06-14
| | | | | | | | | | Here we provide a bit more detail on why TidRangeEval() does return false when trss_mintid is greater than trss_maxtid. Reported-by: Junwang Zhao <zhjwpku@gmail.com> Author: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Junwang Zhao <zhjwpku@gmail.com> Discussion: https://postgr.es/m/CAEG8a3KUbUUqQgfK5X8Sj-%2BppPtGNTU%2BZiep0Rxr7SLjoR%2BB6w%40mail.gmail.com
* Add TAP tests to check replication slot advance during the checkpointAlexander Korotkov2025-06-14
| | | | | | | | | | | | | | | | | | | | | The new tests verify that logical and physical replication slots are still valid after an immediate restart on checkpoint completion when the slot was advanced during the checkpoint. This commit introduces two new injection points to make these tests possible: * checkpoint-before-old-wal-removal - triggered in the checkpointer process just before old WAL segments cleanup; * logical-replication-slot-advance-segment - triggered in LogicalConfirmReceivedLocation() when restart_lsn was changed enough to point to the next WAL segment. Discussion: https://postgr.es/m/flat/1d12d2-67235980-35-19a406a0%4063439497 Author: Vitaly Davydov <v.davydov@postgrespro.ru> Author: Tomas Vondra <tomas@vondra.me> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Backpatch-through: 17
* Keep WAL segments by slot's last saved restart LSNAlexander Korotkov2025-06-14
| | | | | | | | | | | | | | | | | | | | | The patch fixes the issue with the unexpected removal of old WAL segments after checkpoint, followed by an immediate restart. The issue occurs when a slot is advanced after the start of the checkpoint and before old WAL segments are removed at the end of the checkpoint. The patch introduces a new in-memory state for slots: last_saved_restart_lsn, which is used to calculate the oldest LSN for removing WAL segments. This state is updated every time with the current restart_lsn at the moment when the slot is saved to disk. This fix changes the shared memory layout. It's applied to HEAD only because we don't have to preserve ABI compatibility during the beta stage. Another fix that doesn't affect the ABI is committed to back branches. Discussion: https://postgr.es/m/1d12d2-67235980-35-19a406a0%4063439497 Author: Vitaly Davydov <v.davydov@postgrespro.ru> Author: Alexander Korotkov <aekorotkov@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
* nbtree: _bt_readnextpage doesn't affect markPos.Peter Geoghegan2025-06-13
| | | | | | | | | | | _bt_readnextpage expects so->currPos.buf to be InvalidBuffer (and for the position's page to be unlocked) when called. However, it does not expect there to be no pins held on any page. In particular, so->markPos might hold a separate pin, both before and after the call. Fix some comments that seemed to suggest otherwise. Follow-up commit to commit 7c319f54, which made _bt_killitems drop pins it acquired itself.
* Comment fixups from 626df47ad9.Jeff Davis2025-06-13
| | | | | Reported-by: Peter Smith <smithpb2250@gmail.com> Discussion: https://postgr.es/m/CAHut+PspbHQmRCBL1c-opoJeTUKUaFFfUQJd2rhDZqwUrWCi7w@mail.gmail.com
* Replace %llu by PRIu64 in AIO io_uring codeMichael Paquier2025-06-13
| | | | | | | | | | | | | This is a continuation of 15a79c73111f, cleaning up the AIO io_uring code that has been committed after that while still using %llu. The code changed here is new in v18, so cleaning things now means less conflicts if this area of the code changes on backpatch once the 18 stable branch is created. Reviewed-by: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/aEZcGCnYFq642q8k@paquier.xyz
* Fix squashing algorithm for query textsÁlvaro Herrera2025-06-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | The algorithm to squash lists of constants added by commit 62d712ecfd94 was a bit too simplistic; we wanted to avoid adding unnecessary complexity, but cases like direct function calls of typecasting functions (and others) were missed, and bogus SQL syntax was being shown in pg_stat_statements normalized query text field. To fix normalization for those cases, we need the parser to transmit information about were each list of constant values starts and ends, so add that to a couple of nodes. Also add a few more test cases to make sure we're doing the right thing. The patch initially submitted by Sami added a new private struct in gram.y to carry the start/end information for A_Expr, but I (Álvaro) decided that a better fix was to remove the parser indirection via the in_expr production, and instead create separate components in the a_expr rule. I'm surprised that this works and doesn't require more changes, but I assume (without checking) that the grammar used to be more complex and got simplified at some point. Bump catversion. Author: Sami Imseih <samimseih@gmail.com> Author: Dmitry Dolgov <9erthalion6@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CAA5RZ0tRXoPG2y6bMgBCWNDt0Tn=unRerbzYM=oW0syi1=C1OA@mail.gmail.com
* Revert support for improved tracking of nested queriesMichael Paquier2025-06-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit reverts the two following commits: - 499edb09741b, track more precisely query locations for nested statements. - 06450c7b8c70, a follow-up fix of 499edb09741b with query locations. The test introduced in this commit is not reverted. This is proving useful to track a problem that only pgaudit was able to detect. These prove to have issues with the tracking of SELECT statements, when these use multiple parenthesis which is something supported by the grammar. Incorrect location and lengths are causing pg_stat_statements to become confused, failing its job in query normalization with potential out-of-bound writes because the location and the length may not match with what can be handled. A lot of the query patterns discussed when this issue was reported have no test coverage in the main regression test suite, or the recovery test 027_stream_regress.pl would have caught the problems as pg_stat_statements is loaded by the node running the regression tests. A first step would be to improve the test coverage to stress more the query normalization logic. A different portion of this work was done in 45e0ba30fc40, with the addition of tests for nested queries. These can be left in the tree. They are useful to track the way inner queries are currently tracked by PGSS with non-top-level entries, and will be useful when reconsidering in the future the work reverted here. Reported-by: Alexander Kozhemyakin <a.kozhemyakin@postgrespro.ru> Discussion: https://postgr.es/m/18947-cdd2668beffe02bf@postgresql.org
* Revert "nbtree: Remove useless row compare arg."Peter Geoghegan2025-06-11
| | | | | | | | | | This reverts commit 54c6ea8c81db718508eeea50991d3c1c5dff54a5. Further analysis has shown that the forcenonrequired row compare behavior is in fact necessary, despite the new restrictions on RowCompares imposed by _bt_set_startikey following commit 5f4d98d4. Discussion: https://postgr.es/m/CAH2-Wzm3bKcz3TbHGem3_+SinEyG=VZVPbApQghp7YiZj+MM3g@mail.gmail.com
* Revert a few small patches that were intended for version 19.Jeff Davis2025-06-11
| | | | | | | | | | | - 4c787a24e7e220a60022e47c1776f22f72902899 - 78bd364ee39ca70a8f9cb8719282389866a08e14 - 7a6880fadc177873d5663961ec3a02d67e34dcbe - 8898082a5d3e94eef073f0e08124137e096e78ef Suggested-by: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CA+TgmoZ=J=PVNZUNKaxULu+KUVSt3Y-aJ1DZ9Y3Co6mu0z62jA@mail.gmail.com Discussion: https://postgr.es/m/60e8c6d0a6c08e67f15dbbe9e53df0119c710065.camel@j-davis.com
* Make _bt_killitems drop pins it acquired itself.Peter Geoghegan2025-06-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Teach nbtree's _bt_killitems to leave the so->currPos page that it sets LP_DEAD items on in whatever state it was in when _bt_killitems was called. In particular, make sure that so->dropPin scans don't acquire a pin whose reference is saved in so->currPos.buf. Allowing _bt_killitems to change so->currPos.buf like this is wrong. The immediate consequence of allowing it is that code in _bt_steppage (that copies so->currPos into so->markPos) will behave as if the scan is a !so->dropPin scan. so->markPos will therefore retain the buffer pin indefinitely, even though _bt_killitems only needs to acquire a pin (along with a lock) for long enough to mark known-dead items LP_DEAD. This issue came to light following a report of a failure of an assertion from recent commit e6eed40e. The test case in question involves the use of mark and restore. An initial call to _bt_killitems takes place that leaves so->currPos.buf in a state that is inconsistent with the scan being so->dropPin. A subsequent call to _bt_killitems for the same position (following so->currPos being saved in so->markPos, and then restored as so->currPos) resulted in the failure of an assertion that tests that so->currPos.buf is InvalidBuffer when the scan is so->dropPin (non-assert builds got a "resource was not closed" WARNING instead). The same problem exists on earlier releases, though the issue is far more subtle there. Recent commit e6eed40e introduced the so->dropPin field as a partial replacement for testing so->currPos.buf directly. Earlier releases won't get an assertion failure (or buffer pin leak), but they will allow the second _bt_killitems call from the test case to behave as if a buffer pin was consistently held since the original call to _bt_readpage. This is wrong; there will have been an initial window during which no pin was held on the so->currPos page, and yet the second _bt_killitems call will neglect to check if so->currPos.lsn continues to match the page's now-current LSN. As a result of all this, it's just about possible that _bt_killitems will set the wrong items LP_DEAD (on release branches). This could only happen with merge joins (the sole user of nbtree mark/restore support), when a concurrently inserted index tuple used a recently-recycled TID (and only when the new tuple was inserted onto the same page as a distinct concurrently-removed tuple with the same TID). This is exactly the scenario that _bt_killitems' check of the page's now-current LSN against the LSN stashed in currPos was supposed to prevent. A follow-up commit will make nbtree completely stop conditioning whether or not a position's pin needs to be dropped on whether the 'buf' field is set. All call sites that might need to drop a still-held pin will be taught to rely on the scan-level so->dropPin field recently introduced by commit e6eed40e. That will make bugs of the same general nature as this one impossible (or make them much easier to detect, at least). Author: Peter Geoghegan <pg@bowt.ie> Reported-By: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/545be1e5-3786-439a-9257-a90d30f8b849@gmail.com Backpatch-through: 13
* Don't reduce output request size on non-Unix-socket connections.Tom Lane2025-06-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Traditionally, libpq's pqPutMsgEnd has rounded down the amount-to-send to be a multiple of 8K when it is eagerly writing some data. This still seems like a good idea when sending through a Unix socket, as pipes typically have a buffer size of 8K or some fraction/multiple of that. But there's not much argument for it on a TCP connection, since (a) standard MTU values are not commensurate with that, and (b) the kernel typically applies its own packet splitting/merging logic. Worse, our SSL and GSSAPI code paths both have API stipulations that if they fail to send all the data that was offered in the previous write attempt, we mustn't offer less data in the next attempt; else we may get "SSL error: bad length" or "GSSAPI caller failed to retransmit all data needing to be retried". The previous write attempt might've been pqFlush attempting to send everything in the buffer, so pqPutMsgEnd can't safely write less than the full buffer contents. (Well, we could add some more state to track exactly how much the previous write attempt was, but there's little value evident in such extra complication.) Hence, apply the round-down only on AF_UNIX sockets, where we never use SSL or GSSAPI. Interestingly, we had a very closely related bug report before, which I attempted to fix in commit d053a879b. But the test case we had then seemingly didn't trigger this pqFlush-then-pqPutMsgEnd scenario, or at least we failed to recognize this variant of the bug. Bug: #18907 Reported-by: Dorjpalam Batbaatar <htgn.dbat.95@gmail.com> Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/18907-d41b9bcf6f29edda@postgresql.org Backpatch-through: 13
* inet_net_pton.c: use pg_ascii_tolower() rather than tolower().Jeff Davis2025-06-10
| | | | | | | Avoid dependence on setlocale(). No behavior change. Discussion: https://postgr.es/m/9875f7f9-50f1-4b5d-86fc-ee8b03e8c162@eisentraut.org Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
* copyfromparse.c: use pg_ascii_tolower() rather than tolower().Jeff Davis2025-06-10
| | | | | | | Avoid dependence on setlocale(). No behavior change. Discussion: https://postgr.es/m/9875f7f9-50f1-4b5d-86fc-ee8b03e8c162@eisentraut.org Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
* Revert "postgres_fdw: Inherit the local transaction's access/deferrable modes."Etsuro Fujita2025-06-08
| | | | | | | | | | We concluded that commit e5a3c9d9b is a feature rather than a fix; since it was added after feature freeze, revert it. Reported-by: Fujii Masao <masao.fujii@oss.nttdata.com> Reported-by: Michael Paquier <michael@paquier.xyz> Reported-by: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/ed2296f1-1a6b-4932-b870-5bb18c2591ae%40oss.nttdata.com
* Improve CREATE DATABASE error message for invalid libc locale.Jeff Davis2025-06-06
| | | | | Discussion: https://postgr.es/m/73959a14-267b-49c1-8293-291b175682cb@manitou-mail.org Reviewed-by: Daniel Verite <daniel@manitou-mail.org>
* Use NULL instead of 0 for pointer arguments.Nathan Bossart2025-06-06
| | | | | | | | | Commit 5fe08c006c fixed this for calls to dshash_create(). This commit fixes calls to dshash_attach() and dsa_create_in_place(). Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/aECi_gSD9JnVWQ8T%40nathan
* Avoid BufferGetLSNAtomic() calls during nbtree scans.Peter Geoghegan2025-06-06
| | | | | | | | | | | | | | | | | | | | | | | | Delay calling BufferGetLSNAtomic() until we finish reading a page that actually contains items that btgettuple will return to the executor. This reduces the number of calls during plain index scans (we'll only call BufferGetLSNAtomic() when _bt_readpage returns true), and totally eliminates calls during index-only scans, bitmap index scans, and plain index scans of an unlogged relation. Currently, when checksums (or wal_log_hints) are enabled, acquiring a page's LSN in BufferGetLSNAtomic() involves locking the buffer header (which involves the use of spinlocks). Testing has shown that enabling page-level checksums causes large regressions with certain workloads, especially on larger multi-socket systems. The regression isn't tied to any Postgres 18 commit. However, Postgres 18 commit 04bec894 made initdb use checksums by default, so it seems prudent to address the problem now. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Tomas Vondra <tomas@vondra.me> Discussion: https://postgr.es/m/941f0190-e3c6-4622-9ac7-c04e936e5fdb@vondra.me Discussion: https://postgr.es/m/CAH2-Wzk-Dg5XWs_jDuiHt4_7ryrSY+n=vxmHY51EVqPDFsKXmg@mail.gmail.com
* nbtree: Remove useless row compare arg.Peter Geoghegan2025-06-05
| | | | | | | Use of a RowCompare key makes nbtree index scans ineligible to use pstate.forcenonrequired following recent bugfix commit 5f4d98d4. There's no longer any need for _bt_check_rowcompare to accept a forcenonrequired argument, so remove it.
* Avoid bogus scans of partitions when marking FKs enforcedÁlvaro Herrera2025-06-05
| | | | | | | | | | | | | | Similar to commit cc733ed164c5: when an unenforced foreign key that references a partitioned table is altered to be enforced, we scan the constrained table based on each partition on the referenced partitioned table. This is bogus and likely to cause the ALTER TABLE to fail: we must only scan the constrained table as pointing to the top-level partitioned table. Oversight in commit eec0040c4bcd. Fix by eliding those scans. Author: Amul Sul <sulamul@gmail.com> Reported-by: jian he <jian.universality@gmail.com> Discussion: https://postgr.es/m/CACJufxF1e_gPOLtsDoaE4VCgQPC8KZW_kPAjPR5Rvv4Ew=fb2A@mail.gmail.com
* Avoid bogus scans of partitions when validating FKs to partitioned tablesÁlvaro Herrera2025-06-05
| | | | | | | | | | | | | | | | | | | | | Validating an unvalidated foreign key that references a partitioned table would try to queue validations for each individual partition of the referenced table, but this is wrong: each individual partition would not necessarily have all the referenced rows, so errors would be raised. Avoid doing that. The pg_constraint rows that cause this to happen are only there to support the action triggers that implement the DELETE/ UPDATE actions of the FK, so no validating scan is necessary. This was an oversight in commit b663b9436e75. An equivalent oversight exists for NOT ENFORCED constraints, which is not fixed in this commit. Author: Amul Sul <sulamul@gmail.com> Reported-by: Antonin Houska <ah@cybertec.at> Reviewed-by: jian he <jian.universality@gmail.com> Reviewed-by: Tender Wang <tndrwang@gmail.com> Discussion: https://postgr.es/m/26983.1748418675@localhost
* Fix copy-pasto with process count calculation in method_io_uring.cMichael Paquier2025-06-05
| | | | | | | | | | | | | This commit replaces the formula used for "TotalProcs" with a call to pgaio_uring_procs() in pgaio_uring_shmem_init() for the shared memory initialization, which is exactly the same, removing a duplication. pgaio_uring_procs() is used for shared memory sizing and a sanity check, and it has some documentation explaining some reasoning behind the formula. Author: Japin Li <japinli@hotmail.com> Discussion: https://postgr.es/m/ME0P300MB044521067A1EDDA9EDEC3793B66DA@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM
* Don't strip $libdir from LOAD commandPeter Eisentraut2025-06-04
| | | | | | | | | | | | | | | | | | | | | | | | Commit 4f7f7b03758 implemented the extension_control_path GUC, and to make it work it was decided that we should strip the $libdir/ on module_pathname from .control files, so that extensions don't need to worry about this change. This strip logic was implemented on expand_dynamic_library_name() which works fine when executing the SQL functions from extensions, but this function is also called when the LOAD command is executed, and since the user may explicitly pass the $libdir prefix on LOAD parameter, we should not strip in this case. This commit fixes this issue by moving the strip logic from expand_dynamic_library_name() to load_external_function() that is called when the running the SQL script from extensions. Reported-by: Evan Si <evsi@amazon.com> Author: Matheus Alcantara <matheusssilv97@gmail.com> Reviewed-by: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: Rahila Syed <rahilasyed90@gmail.com> Bug: #18920 Discussion: https://www.postgresql.org/message-id/flat/18920-b350b1c0a30af006%40postgresql.org
* Fix incorrect format placeholdersPeter Eisentraut2025-06-03
|
* Rename log_lock_failure GUC to log_lock_failures for consistency.Fujii Masao2025-06-03
| | | | | | | | | | | This commit renames the GUC log_lock_failure to log_lock_failures to align with the existing similar setting log_lock_waits, which uses the plural form. This improves naming consistency across related GUCs. Suggested-by: Peter Eisentraut <peter@eisentraut.org> Author: Fujii Masao <masao.fujii@gmail.com Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/7a8198b6-d5b8-4910-b41e-8d3efcbb015d@eisentraut.org
* Disallow "=" in names of reloptions and foreign-data options.Tom Lane2025-06-02
| | | | | | | | | | | | | | | | | | | | We store values for these options as array elements with the syntax "name=value", hence a name containing "=" confuses matters when it's time to read the array back in. Since validation of the options is often done (long) after this conversion to array format, that leads to confusing and off-point error messages. We can improve matters by rejecting names containing "=" up-front. (Probably a better design would have involved pairs of array elements, but it's too late now --- and anyway, there's no evident use-case for option names like this. We already reject such names in some other contexts such as GUCs.) Reported-by: Chapman Flack <jcflack@acm.org> Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Chapman Flack <jcflack@acm.org> Discussion: https://postgr.es/m/6830EB30.8090904@acm.org Backpatch-through: 13
* Correct heap vacuum boundary state setup orderingMelanie Plageman2025-06-02
| | | | | | | | | | | | | | | | | | | | | | | 052026c9b9 mistakenly reordered setup steps in heap_vacuum_rel(), incorrectly moving RelationGetNumberOfBlocks() before vacuum_get_cutoffs(). OldestXmin must be determined before RelationGetNumberOfBlocks() calculates the number of blocks in the relation that will be vacuumed. Otherwise tuples older than OldestXmin may be inserted into the end of the relation into blocks that are not vacuumed. If additional tuples newer than those inserted into unscanned blocks but older than OldestXmin are inserted into free space earlier in the relation, the result could be advancing pg_class.relfrozenxid to a newer value than an unfrozen XID in one of the unscanned heap pages. Assigning an incorrect relfrozenxid can lead to data loss, so it is imperative that it correctly reflect the oldest unfrozen xid. Reported-by: Peter Geoghegan <pg@bowt.ie> Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-WzntqvVEdbbpqG5JqSZGuLWmy4PBfUO-OswfivKchr2gvw%40mail.gmail.com
* Fix incorrect format placeholdersPeter Eisentraut2025-06-02
| | | | Fixes for return type of dclist_count().
* Rename gist stratnum support functionPeter Eisentraut2025-06-02
| | | | | | | | | | | | | | | | | | | | | | | | Commit 7406ab623fe added a gist support function that we internally refer to by the symbol GIST_STRATNUM_PROC. This translated from "well-known" strategy numbers to opfamily-specific strategy numbers. However, we later (commit 630f9a43cec) changed this to fit into index-AM-level compare type mapping, so this function actually now maps from compare type to opfamily-specific strategy numbers. So this name is no longer fitting. Moreover, the index AM level also supports the opposite, a function to map from strategy number to compare type. This is currently not supported in gist, but one might wonder what this function is supposed to be called when it is added. This patch changes the naming of the gist-level functionality to be more in line with the index-AM-level functionality. This makes sense because these are essentially the same thing on different levels. This also changes the names of the externally visible functions that are provided for use as such a support function. Reviewed-by: Paul A Jungwirth <pj@illuminatedcomputing.com> Discussion: https://www.postgresql.org/message-id/37ebb1d9-9036-485f-a215-e55435689917%40eisentraut.org
* Use replay LSN as target for cascading logical WAL sendersMichael Paquier2025-06-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | A cascading WAL sender doing logical decoding (as known as doing its work on a standby) has been using as flush LSN the value returned by GetStandbyFlushRecPtr() (last position safely flushed to disk). This is incorrect as such processes are only able to decode changes up to the LSN that has been replayed by the startup process. This commit changes cascading logical WAL senders to use the replay LSN, as returned by GetXLogReplayRecPtr(). This distinction is important particularly during shutdown, when WAL senders need to send any remaining available data to their clients, switching WAL senders to a caught-up state. Using the latest flush LSN rather than the replay LSN could cause the WAL senders to be stuck in an infinite loop preventing them to shut down, as the startup process does not run when WAL senders attempt to catch up, so they could keep waiting for work that would never happen. Backpatch down to v16, where logical decoding on standbys has been introduced. Author: Alexey Makhmutov <a.makhmutov@postgrespro.ru> Reviewed-by: Ajin Cherian <itsajin@gmail.com> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/52138028-7246-421c-9161-4fa108b88070@postgrespro.ru Backpatch-through: 16
* postgres_fdw: Inherit the local transaction's access/deferrable modes.Etsuro Fujita2025-06-01
| | | | | | | | | | | | | | | | | | | | | | | | | Previously, postgres_fdw always 1) opened a remote transaction in READ WRITE mode even when the local transaction was READ ONLY, causing a READ ONLY transaction using it that references a foreign table mapped to a remote view executing a volatile function to write in the remote side, and 2) opened the remote transaction in NOT DEFERRABLE mode even when the local transaction was DEFERRABLE, causing a SERIALIZABLE READ ONLY DEFERRABLE transaction using it to abort due to a serialization failure in the remote side. To avoid these, modify postgres_fdw to open a remote transaction in the same access/deferrable modes as the local transaction. This commit also modifies it to open a remote subtransaction in the same access mode as the local subtransaction. Although these issues exist since the introduction of postgres_fdw, there have been no reports from the field. So it seems fine to just fix them in master only. Author: Etsuro Fujita <etsuro.fujita@gmail.com> Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CAPmGK16n_hcUUWuOdmeUS%2Bw4Q6dZvTEDHb%3DOP%3D5JBzo-M3QmpQ%40mail.gmail.com
* Fix MERGE into a plain inheritance parent table.Dean Rasheed2025-05-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a MERGE's target table is the parent of an inheritance tree, any INSERT actions insert into the parent table using ModifyTableState's rootResultRelInfo. However, there are two bugs in the way is initialized: 1. ExecInitMerge() incorrectly uses a different ResultRelInfo entry from ModifyTableState's resultRelInfo array to build the insert projection, which may not be compatible with rootResultRelInfo. 2. ExecInitModifyTable() does not fully initialize rootResultRelInfo. Specifically, ri_WithCheckOptions, ri_WithCheckOptionExprs, ri_returningList, and ri_projectReturning are not initialized. This can lead to crashes, or incorrect query results due to failing to check WCO's or process the RETURNING list for INSERT actions. Fix both these bugs in ExecInitMerge(), noting that it is only necessary to fully initialize rootResultRelInfo if the MERGE has INSERT actions and the target table is a plain inheritance parent. Backpatch to v15, where MERGE was introduced. Reported-by: Andres Freund <andres@anarazel.de> Author: Dean Rasheed <dean.a.rasheed@gmail.com> Reviewed-by: Jian He <jian.universality@gmail.com> Reviewed-by: Tender Wang <tndrwang@gmail.com> Discussion: https://postgr.es/m/4rlmjfniiyffp6b3kv4pfy4jw3pciy6mq72rdgnedsnbsx7qe5@j5hlpiwdguvc Backpatch-through: 15
* Change internal plan ID type from uint64 to int64Michael Paquier2025-05-31
| | | | | | | | | | | | | | | | uint64 was chosen to be consistent with the type used by the query ID, but the conclusion of a recent discussion for the query ID is that int64 is a better fit as the signed form is shown to the user, for PGSS or EXPLAIN outputs. This commit changes the plan ID to use int64, following c3eda50b0648 that has done the same for the query ID. The plan ID is new to v18, introduced in 2a0cd38da5cc. Author: Michael Paquier <michael@paquier.xyz> Reviewed-by: Sami Imseih <samimseih@gmail.com> Discussion: https://postgr.es/m/aCvzJNwetyEI3Sgo@paquier.xyz