aboutsummaryrefslogtreecommitdiff
path: root/src/backend/commands
Commit message (Collapse)AuthorAge
* Add support for runtime arguments in injection pointsMichael Paquier11 days
| | | | | | | | | | | | | | | | | | | The macros INJECTION_POINT() and INJECTION_POINT_CACHED() are extended with an optional argument that can be passed down to the callback attached when an injection point is run, giving to callbacks the possibility to manipulate a stack state given by the caller. The existing callbacks in modules injection_points and test_aio have their declarations adjusted based on that. da7226993fd4 (core AIO infrastructure) and 93bc3d75d8e1 (test_aio) and been relying on a set of workarounds where a static variable called pgaio_inj_cur_handle is used as runtime argument in the injection point callbacks used by the AIO tests, in combination with a TRY/CATCH block to reset the argument value. The infrastructure introduced in this commit will be reused for the AIO tests, simplifying them. Reviewed-by: Greg Burd <greg@burd.me> Discussion: https://postgr.es/m/Z_y9TtnXubvYAApS@paquier.xyz
* Handle self-referencing FKs correctly in partitioned tablesÁlvaro Herrera2025-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For self-referencing foreign keys in partitioned tables, we weren't handling creation of pg_constraint rows during CREATE TABLE PARTITION AS as well as ALTER TABLE ATTACH PARTITION. This is an old bug -- mostly, we broke this in 614a406b4ff1 while trying to fix it (so 12.13, 13.9, 14.6 and 15.0 and up all behave incorrectly). This commit reverts part of that with additional fixes for full correctness, and installs more tests to verify the parts we broke, not just the catalog contents but also the user-visible behavior. Backpatch to all live branches. In branches 13 and 14, commit 46a8c27a7226 changed the behavior during DETACH to drop a FK constraint rather than trying to repair it, because the complete fix of repairing catalog constraints was problematic due to lack of previous fixes. For this reason, the test behavior in those branches is a bit different. However, as best as I can tell, the fix works correctly there. In release notes we have to recommend that all self-referencing foreign keys on partitioned tables be recreated if partitions have been created or attached after the FK was created, keeping in mind that violating rows might already be present on the referencing side. Reported-by: Guillaume Lelarge <guillaume@lelarge.info> Reported-by: Matthew Gabeler-Lee <fastcat@gmail.com> Reported-by: Luca Vallisa <luca.vallisa@gmail.com> Discussion: https://postgr.es/m/CAECtzeWHCA+6tTcm2Oh2+g7fURUJpLZb-=pRXgeWJ-Pi+VU=_w@mail.gmail.com Discussion: https://postgr.es/m/18156-a44bc7096f0683e6@postgresql.org Discussion: https://postgr.es/m/CAAT=myvsiF-Attja5DcWoUWh21R12R-sfXECY2-3ynt8kaOqjw@mail.gmail.com
* Make "directory" setting work with extension_control_pathPeter Eisentraut2025-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The extension_control_path setting (commit 4f7f7b03758) did not support extensions that set a custom "directory" setting in their control file. Very few extensions use that and during the discussion on the previous commit it was suggested to maybe remove that functionality. But a fix was easier than initially thought, so this just adds that support. The fix is to use the control->control_dir as a share dir to return the path of the extension script files. To make this work more sensibly overall, the directory suffix "extension" is no longer to be included in the extension_control_path value. To quote the patch, it would be -extension_control_path = '/usr/local/share/postgresql/extension:/home/my_project/share/extension:$system' +extension_control_path = '/usr/local/share/postgresql:/home/my_project/share:$system' During the initial patch, there was some discussion on which of these two approaches would be better, and the committed patch was a 50/50 decision. But the support for the "directory" setting pushed it the other way, and also it seems like many people didn't like the previous behavior much. Author: Matheus Alcantara <mths.dev@pm.me> Reviewed-by: Christoph Berg <myon@debian.org> Reviewed-by: David E. Wheeler <david@justatheory.com> Discussion: https://www.postgresql.org/message-id/flat/aAi1VACxhjMhjFnb%40msg.df7cb.de#0cdf7b7d727cc593b029650daa3c4fbc
* Fix bug allowing io_combine_limit > io_max_combine_combine limitAndres Freund2025-04-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | 10f66468475 intended to limit the value of io_combine_limit to the minimum of io_combine_limit and io_max_combine_limit. To avoid issues with interdependent GUCs, it introduced io_combine_limit_guc and set io_combine_limit in assign hooks. That plan was thwarted by guc_tables.c accidentally still referencing io_combine_limit, instead of io_combine_limit_guc. That lead to the GUC machinery overriding the work done in the assign hooks, potentially leaving io_combine_limit with a too high value. The consequence of this bug was that when running with io_combine_limit > io_combine_limit_guc the AIO machinery would not have reserved large enough iovec and IO data arrays, with one IO's arrays overlapping with another IO's, leading to total confusion. To make such a problem easier to detect in the future, add assertions to pgaio_io_set_handle_data_* checking the length is smaller than io_max_combine_limit (not just PG_IOV_MAX). It'd be nice to have a few tests for this, but it's not entirely obvious how to do so portably. As remarked upon by Tom, the GUC assignment hooks really shouldn't set the underlying variable, that's the job of the GUC machinery. Change that as well. Discussion: https://postgr.es/m/c5jyqnuwrpigd35qe7xdypxsisdjrdba5iw63mhcse4mzjogxo@qdjpv22z763f
* Change the names generated for child foreign key constraints.Tom Lane2025-04-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a foreign key constraint is placed on a partitioned table, we actually make two pg_constraint entries associated with that table. (I have my doubts about the wisdom of that, but it's been like that since v12 and post-feature-freeze is no time to be messing with such entrenched decisions.) The second "child" entry always had a name generated according to the default rule, "table_column(s)_fkey[nnn]", even if the primary entry had an unrelated user-specified name. The trouble with doing that is that the default name could collide with the user-specified name of some other constraint on the same table. While we were willing to adjust the generated name to avoid collisions, that only helps if it's made second; if it's made first then creation of the other constraint would fail, potentially causing dump/reload or pg_upgrade failures. The core of the problem here is that we're infringing on user namespace, so I doubt that there's any 100% solution other than to find a way to not need the "child" entry. In the meantime, it seems like it'd be an improvement to make the child's name be the name of the parent constraint with an underscore and digit(s) appended as necessary to make it unique. This rule can in theory fail in the same way, but it seems much less probable; for one thing, this rule is guaranteed not to match primary entries having auto-generated names. (While an auto-generated primary name isn't user-specified to begin with, it acts like that during dump/reload, so collisions against such names are definitely possible.) An additional bonus, visible in some of the regression test cases that change here, arises from the fact that some error messages cite the child constraint's name not the parent's. In the previous approach the two names could be completely unrelated, leading to user confusion --- the more so since psql's \d command hides child constraints. With this approach it's hopefully much clearer which constraint-the-user-knows-about is failing. However, that does mean that there's user-visible behavior change occurring here, making it seem like not something to back-patch. I feel it's not too late for v18, though. Reported-by: Kirill Reshke <reshkekirill@gmail.com> Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Alvaro Herrera <alvherre@kurilemu.de> Discussion: https://postgr.es/m/CALdSSPhGitjpTfzEMJN-Y2x+Q-5QChSxAsmSJ1-E8mQJLkHOqQ@mail.gmail.com
* Avoid ERROR at ON COMMIT DELETE ROWS after relhassubclass=f.Noah Misch2025-04-20
| | | | | | | | | | | | | | | | Commit 7102070329d8147246d2791321f9915c3b5abf31 fixed a similar bug, but it missed the case of database-wide ANALYZE ("use_own_xacts" mode). Commit a07e03fd8fa7daf4d1356f7cb501ffe784ea6257 changed consequences from silent discard of a pg_class stats (relpages et al.) update to ERROR "tuple to be updated was already modified". Losing a relpages update of an ON COMMIT DELETE ROWS table was negligible, but a COMMIT-time error isn't negligible. Back-patch to v13 (all supported versions). Reported-by: Richard Guo <guofenglinux@gmail.com Reported-by: Robins Tharakan <tharakan@gmail.com> Discussion: https://postgr.es/m/CAMbWs4-XwMKMKJ_GT=p3_-_=j9rQSEs1FbDFUnW9zHuKPsPNEQ@mail.gmail.com Backpatch-through: 13
* Fix typos and grammar in the codeMichael Paquier2025-04-19
| | | | | | | | The large majority of these have been introduced by recent commits done in the v18 development cycle. Author: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/9a7763ab-5252-429d-a943-b28941e0e28b@gmail.com
* Suppress "may be used uninitialized" warnings from older compilers.Tom Lane2025-04-17
| | | | | | | | | | The "children" list won't be used until "got_children" has been set true, but older compilers don't get that; about half a dozen buildfarm animals are warning about this. Issue added by 11ff192b5. While here, improve slightly-shaky grammar in comment. Discussion: https://postgr.es/m/2057835.1744833309@sss.pgh.pa.us
* Sync declarations and definitions of two new tablecmds.c functions.Tom Lane2025-04-16
| | | | | | | | | | | | Buildfarm member drongo complained because the definitions of these functions used "const Oid foo" where the forward declarations just had "Oid foo". (I'm a bit surprised that drongo seems to be the only complainant.) I chose to fix this by removing the "consts" because (a) I'm generally not a fan of using const that way, and (b) it was a minority usage even within these two functions, let alone compared to the rest of our code base. Oversight in commit eec0040c4, so no need for back-patch.
* Elide not-null constraint checks on child tables during PK creationÁlvaro Herrera2025-04-16
| | | | | | | | | | | | | | | | | | | | | | | | | We were unnecessarily acquiring AccessExclusiveLock on all child tables when "ALTER TABLE ONLY sometab ADD PRIMARY KEY" was run on their parent table, an oversight in commit 14e87ffa5c54. This caused deadlocks during pg_restore of partitioned tables. The reason to acquire the AEL was that we need to verify that child tables have the involved columns already marked as not-null; but if the parent table has an inheritable not-null constraint, then all children must necessarily be in the correct state already, so we can skip the check, which avoids acquiring the lock. Reorder the code so that it works that way. This doesn't change things in the case where the constraint doesn't exist, but that case is of lesser importance because it doesn't occur during parallel pg_restore. While at it, reword some errmsg() and add errhint() to similar cases in related but not adjacent code. Diagnosed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Tender Wang <tndrwang@gmail.com> Discussion: https://postgr.es/m/67469c1c-38bc-7d94-918a-67033f5dd731@gmx.net Discussion: https://postgr.es/m/2045026.1743801143@sss.pgh.pa.us Discussion: https://postgr.es/m/1280408.1744650810@sss.pgh.pa.us
* Harmonize function parameter names for Postgres 18.Peter Geoghegan2025-04-12
| | | | | | | | | | Make sure that function declarations use names that exactly match the corresponding names from function definitions in a few places. These inconsistencies were all introduced during Postgres 18 development. This commit was written with help from clang-tidy, by mechanically applying the same rules as similar clean-up commits (the earliest such commit was commit 035ce1fe).
* Improve various new-to-v18 appendStringInfo callsDavid Rowley2025-04-11
| | | | | | | | | Similar to 8461424fd, here we adjust a few new locations which were not using the most suitable appendStringInfo* function for the intended purpose. Author: David Rowley <drowleyml@gmail.com Discussion: https://postgr.es/m/CAApHDvqJnNjueb=Eoj8K+8n0g7nj_AcPWSiCj5RNV4fDejAfqA@mail.gmail.com
* Fix erroneous construction of functions' dependencies on transforms.Tom Lane2025-04-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The list of transform objects that a function should use is specified in CREATE FUNCTION's TRANSFORM clause, and then represented indirectly in pg_proc.protrftypes. However, ProcedureCreate completely ignored that for purposes of constructing pg_depend entries, and instead made the function depend on any transforms that exist for its parameter or return data types. This is bad in both directions: the function could be made dependent on a transform it does not actually use, or it could try to use a transform that's since been dropped. (The latter scenario would require use of a transform that's not for any of the parameter or return types, but that seems legit for cases where the function performs SQL operations internally.) To fix, pass in the list of transform objects that CreateFunction identified, and build pg_depend entries from that not from the parameter/return types. This results in changes in the expected test outputs in contrib/bool_plperl, which I guess are due to different ordering of pg_depend entries -- that test case is surely not exercising either of the problem scenarios. This fix is not back-patchable as-is: changing the signature of ProcedureCreate seems too risky in stable branches. We could do something like making ProcedureCreate a wrapper around ProcedureCreateExt or so. However, I'm more inclined to do nothing in the back branches. We had no field complaints up to now, so the hazards don't seem to be a big issue in practice. And we couldn't do anything about existing pg_depend entries, so a back-patched fix would result in a mishmash of dependencies created according to different rules. That cure could be worse than the disease, perhaps. I bumped catversion just to lay down a marker that the expected contents of pg_depend are a bit different than before. Reported-by: Chapman Flack <jcflack@acm.org> Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/3112950.1743984111@sss.pgh.pa.us
* Allow NOT NULL constraints to be added as NOT VALIDÁlvaro Herrera2025-04-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This allows them to be added without scanning the table, and validating them afterwards without holding access exclusive lock on the table after any violating rows have been deleted or fixed. Doing ALTER TABLE ... SET NOT NULL for a column that has an invalid not-null constraint validates that constraint. ALTER TABLE .. VALIDATE CONSTRAINT is also supported. There are various checks on whether an invalid constraint is allowed in a child table when the parent table has a valid constraint; this should match what we do for enforced/not enforced constraints. pg_attribute.attnotnull is now only an indicator for whether a not-null constraint exists for the column; whether it's valid or invalid must be queried in pg_constraint. Applications can continue to query pg_attribute.attnotnull as before, but now it's possible that NULL rows are present in the column even when that's set to true. For backend internal purposes, we cache the nullability status in CompactAttribute->attnullability that each tuple descriptor carries (replacing CompactAttribute.attnotnull, which was a mirror of Form_pg_attribute.attnotnull). During the initial tuple descriptor creation, based on the pg_attribute scan, we set this to UNRESTRICTED if pg_attribute.attnotnull is false, or to UNKNOWN if it's true; then we update the latter to VALID or INVALID depending on the pg_constraint scan. This flag is also copied when tupledescs are copied. Comparing tuple descs for equality must also compare the CompactAttribute.attnullability flag and return false in case of a mismatch. pg_dump deals with these constraints by storing the OIDs of invalid not-null constraints in a separate array, and running a query to obtain their properties. The regular table creation SQL omits them entirely. They are then dealt with in the same way as "separate" CHECK constraints, and dumped after the data has been loaded. Because no additional pg_dump infrastructure was required, we don't bump its version number. I decided not to bump catversion either, because the old catalog state works perfectly in the new world. (Trying to run with new catalog state and the old server version would likely run into issues, however.) System catalogs do not support invalid not-null constraints (because commit 14e87ffa5c54 didn't allow them to have pg_constraint rows anyway.) Author: Rushabh Lathia <rushabh.lathia@gmail.com> Author: Jian He <jian.universality@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Tested-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Discussion: https://postgr.es/m/CAGPqQf0KitkNack4F5CFkFi-9Dqvp29Ro=EpcWt=4_hs-Rt+bQ@mail.gmail.com
* Repair misbehavior with duplicate entries in FK SET column lists.Tom Lane2025-04-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since v15 we've had an option to apply a foreign key constraint's ON DELETE SET DEFAULT or SET NULL action to just some of the referencing columns. There was not a check for duplicate entries in the list of columns-to-set, though. That caused a potential memory stomp in CreateConstraintEntry(), which incautiously assumed that the list of columns-to-set couldn't be longer than the number of key columns. Even after fixing that, the case doesn't work because you get an error like "multiple assignments to same column" from the SQL command that is generated to do the update. We could either raise an error for duplicate columns or silently suppress the dups, and after a bit of thought I chose to do the latter. This is motivated by the fact that duplicates in the FK column list are legal, so it's not real clear why duplicates in the columns-to-set list shouldn't be. Of course there's no need to actually set the column more than once. I left in the fix in CreateConstraintEntry() too, just because it didn't seem like such low-level code ought to be making assumptions about what it's handed. Bug: #18879 Reported-by: Yu Liang <luy70@psu.edu> Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/18879-259fc59d072bd4d7@postgresql.org Backpatch-through: 15
* Add nbtree skip scan optimization.Peter Geoghegan2025-04-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Teach nbtree multi-column index scans to opportunistically skip over irrelevant sections of the index given a query with no "=" conditions on one or more prefix index columns. When nbtree is passed input scan keys derived from a predicate "WHERE b = 5", new nbtree preprocessing steps output "WHERE a = ANY(<every possible 'a' value>) AND b = 5" scan keys. That is, preprocessing generates a "skip array" (and an output scan key) for the omitted prefix column "a", which makes it safe to mark the scan key on "b" as required to continue the scan. The scan is therefore able to repeatedly reposition itself by applying both the "a" and "b" keys. A skip array has "elements" that are generated procedurally and on demand, but otherwise works just like a regular ScalarArrayOp array. Preprocessing can freely add a skip array before or after any input ScalarArrayOp arrays. Index scans with a skip array decide when and where to reposition the scan using the same approach as any other scan with array keys. This design builds on the design for array advancement and primitive scan scheduling added to Postgres 17 by commit 5bf748b8. Testing has shown that skip scans of an index with a low cardinality skipped prefix column can be multiple orders of magnitude faster than an equivalent full index scan (or sequential scan). In general, the cardinality of the scan's skipped column(s) limits the number of leaf pages that can be skipped over. The core B-Tree operator classes on most discrete types generate their array elements with the help of their own custom skip support routine. This infrastructure gives nbtree a way to generate the next required array element by incrementing (or decrementing) the current array value. It can reduce the number of index descents in cases where the next possible indexable value frequently turns out to be the next value stored in the index. Opclasses that lack a skip support routine fall back on having nbtree "increment" (or "decrement") a skip array's current element by setting the NEXT (or PRIOR) scan key flag, without directly changing the scan key's sk_argument. These sentinel values behave just like any other value from an array -- though they can never locate equal index tuples (they can only locate the next group of index tuples containing the next set of non-sentinel values that the scan's arrays need to advance to). A skip array's range is constrained by "contradictory" inequality keys. For example, a skip array on "x" will only generate the values 1 and 2 given a qual such as "WHERE x BETWEEN 1 AND 2 AND y = 66". Such a skip array qual usually has near-identical performance characteristics to a comparable SAOP qual "WHERE x = ANY('{1, 2}') AND y = 66". However, improved performance isn't guaranteed. Much depends on physical index characteristics. B-Tree preprocessing is optimistic about skipping working out: it applies static, generic rules when determining where to generate skip arrays, which assumes that the runtime overhead of maintaining skip arrays will pay for itself -- or lead to only a modest performance loss. As things stand, these assumptions are much too optimistic: skip array maintenance will lead to unacceptable regressions with unsympathetic queries (queries whose scan can't skip over many irrelevant leaf pages). An upcoming commit will address the problems in this area by enhancing _bt_readpage's approach to saving cycles on scan key evaluation, making it work in a way that directly considers the needs of = array keys (particularly = skip array keys). Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiro Ikeda <masahiro.ikeda@nttdata.com> Reviewed-By: Heikki Linnakangas <heikki.linnakangas@iki.fi> Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com> Reviewed-By: Tomas Vondra <tomas@vondra.me> Reviewed-By: Aleksander Alekseev <aleksander@timescale.com> Reviewed-By: Alena Rybakina <a.rybakina@postgrespro.ru> Discussion: https://postgr.es/m/CAH2-Wzmn1YsLzOGgjAQZdn1STSG_y8qP__vggTaPAYXJP+G4bw@mail.gmail.com
* Allow "COPY table TO" command to copy rows from materialized views.Fujii Masao2025-04-04
| | | | | | | | | | | | | | | | | | | Previously, "COPY table TO" command worked only with plain tables and did not support materialized views, even when they were populated and had physical storage. To copy rows from materialized views, "COPY (query) TO" command had to be used, instead. This commit extends "COPY table TO" to support populated materialized views directly, improving usability and performance, as "COPY table TO" is generally faster than "COPY (query) TO". Note that copying from unpopulated materialized views will still result in an error. Author: jian he <jian.universality@gmail.com> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Reviewed-by: David G. Johnston <david.g.johnston@gmail.com> Reviewed-by: Vignesh C <vignesh21@gmail.com> Reviewed-by: Fujii Masao <masao.fujii@gmail.com> Discussion: https://postgr.es/m/CACJufxHVxnyRYy67hiPePNCPwVBMzhTQ6FaL9_Te5On9udG=yg@mail.gmail.com
* Need to do CommandCounterIncrement after StoreAttrMissingVal.Tom Lane2025-04-02
| | | | | | | | | | | | | | | | | | | | Without this, an additional change to the same pg_attribute row within the same command will fail. This is possible at least with ALTER TABLE ADD COLUMN on a multiple-inheritance-pathway structure. (Another potential hazard is that immediately-following operations might not see the missingval.) Introduced by 95f650674, which split the former coding that used a single pg_attribute update to change both atthasdef and atthasmissing/attmissingval into two updates, but missed that this should entail two CommandCounterIncrements as well. Like that fix, back-patch through v13. Reported-by: Alexander Lakhin <exclusion@gmail.com> Author: Tender Wang <tndrwang@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/025a3ffa-5eff-4a88-97fb-8f583b015965@gmail.com Backpatch-through: 13
* Add support for NOT ENFORCED in foreign key constraintsPeter Eisentraut2025-04-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This expands the NOT ENFORCED constraint flag, previously only supported for CHECK constraints (commit ca87c415e2f), to foreign key constraints. Normally, when a foreign key constraint is created on a table, action and check triggers are added to maintain data integrity. With this patch, if a constraint is marked as NOT ENFORCED, integrity checks are no longer required, making these triggers unnecessary. Consequently, when creating a NOT ENFORCED foreign key constraint, triggers will not be created, and the constraint will be marked as NOT VALID. Similarly, if an existing foreign key constraint is changed to NOT ENFORCED, the associated triggers will be dropped, and the constraint will also be marked as NOT VALID. Conversely, if a NOT ENFORCED foreign key constraint is changed to ENFORCED, the necessary triggers will be created, and the will be changed to VALID by performing necessary validation. Since not-enforced foreign key constraints have no triggers, the shortcut used for example in psql and pg_dump to skip looking for foreign keys if the relation is known not to have triggers no longer applies. (It already didn't work for partitioned tables.) Author: Amul Sul <sulamul@gmail.com> Reviewed-by: Joel Jacobson <joel@compiler.org> Reviewed-by: Andrew Dunstan <andrew@dunslane.net> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: jian he <jian.universality@gmail.com> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Reviewed-by: Isaac Morland <isaac.morland@gmail.com> Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com> Tested-by: Triveni N <triveni.n@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/CAAJ_b962c5AcYW9KUt_R_ER5qs3fUGbe4az-SP-vuwPS-w-AGA@mail.gmail.com
* Enable IO concurrency on all systemsAndres Freund2025-03-30
| | | | | | | | | | | | | | | Previously effective_io_concurrency and maintenance_io_concurrency could not be set above 0 on machines without fadvise support. AIO enables IO concurrency without such support, via io_method=worker. Currently only subsystems using the read stream API will take advantage of this. Other users of maintenance_io_concurrency (like recovery prefetching) which leverage OS advice directly will not benefit from this change. In those cases, maintenance_io_concurrency will have no effect on I/O behavior. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/CAAKRu_atGgZePo=_g6T3cNtfMf0QxpvoUh5OUqa_cnPdhLd=gw@mail.gmail.com
* read_stream: Introduce and use optional batchmode supportAndres Freund2025-03-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Submitting IO in larger batches can be more efficient than doing so one-by-one, particularly for many small reads. It does, however, require the ReadStreamBlockNumberCB callback to abide by the restrictions of AIO batching (c.f. pgaio_enter_batchmode()). Basically, the callback may not: a) block without first calling pgaio_submit_staged(), unless a to-be-waited-on lock cannot be part of a deadlock, e.g. because it is never held while waiting for IO. b) directly or indirectly start another batch pgaio_enter_batchmode() As this requires care and is nontrivial in some cases, batching is only used with explicit opt-in. This patch adds an explicit flag (READ_STREAM_USE_BATCHING) to read_stream and uses it where appropriate. There are two cases where batching would likely be beneficial, but where we aren't using it yet: 1) bitmap heap scans, because the callback reads the VM This should soon be solved, because we are planning to remove the use of the VM, due to that not being sound. 2) The first phase of heap vacuum This could be made to support batchmode, but would require some care. Reviewed-by: Noah Misch <noah@leadboat.com> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt
* Use PRI?64 instead of "ll?" in format strings (continued).Peter Eisentraut2025-03-29
| | | | | | | Continuation of work started in commit 15a79c73, after initial trial. Author: Thomas Munro <thomas.munro@gmail.com> Discussion: https://postgr.es/m/b936d2fb-590d-49c3-a615-92c3a88c6c19%40eisentraut.org
* Add support for not-null constraints on virtual generated columnsPeter Eisentraut2025-03-28
| | | | | | | | | | | | | | | | | This was left out of the original patch for virtual generated columns (commit 83ea6c54025). This just involves a bit of extra work in the executor to expand the generation expressions and run a "IS NOT NULL" test against them. There is also a bit of work to make sure that not-null constraints are checked during a table rewrite. Author: jian he <jian.universality@gmail.com> Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com> Reviewed-by: Navneet Kumar <thanit3111@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/CACJufxHArQysbDkWFmvK+D1TPHQWWTxWN15cMuUaTYX3xhQXgg@mail.gmail.com
* Fix guc_malloc calls for consistency and OOM checksDaniel Gustafsson2025-03-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | check_createrole_self_grant and check_synchronized_standby_slots were allocating memory on a LOG elevel without checking if the allocation succeeded or not, which would have led to a segfault on allocation failure. On top of that, a number of callsites were using the ERROR level, relying on erroring out rather than returning false to allow the GUC machinery handle it gracefully. Other callsites used WARNING instead of LOG. While neither being not wrong, this changes all check_ functions do it consistently with LOG. init_custom_variable gets a promoted elevel to FATAL to keep the guc_malloc error handling in line with the rest of the error handling in that function which already call FATAL. If we encounter an OOM in this callsite there is no graceful handling to be had, better to error out hard. Backpatch the fix to check_createrole_self_grant down to v16 and the fix to check_synchronized_standby_slots down to v17 where they were introduced. Author: Daniel Gustafsson <daniel@yesql.se> Reported-by: Nikita <pm91.arapov@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Bug: #18845 Discussion: https://postgr.es/m/18845-582c6e10247377ec@postgresql.org Backpatch-through: 16
* Simplify syntax for ALTER TABLE ALTER CONSTRAINT NO INHERITÁlvaro Herrera2025-03-27
| | | | | | | | | | | | | | | Commit d45597f72fe5 introduced the ability to change a not-null constraint from NO INHERIT to INHERIT and vice versa, but we included the SET noise word in the syntax for it. The SET turns out not to be necessary and goes against what the SQL standard says for other ALTER TABLE subcommands, so remove it. This changes the way this command is processed for constraint types other than not-null, so there are some error message changes. Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: Suraj Kharage <suraj.kharage@enterprisedb.com> Discussion: https://postgr.es/m/202503251602.vsxaehsyaoac@alvherre.pgsql
* Introduce PG_MODULE_MAGIC_EXT macro.Tom Lane2025-03-26
| | | | | | | | | | | | | | | | | | This macro allows dynamically loaded shared libraries (modules) to provide a wired-in module name and version, and possibly other compile-time-constant fields in future. This information can be retrieved with the new pg_get_loaded_modules() function. This feature is expected to be particularly useful for modules that do not have any exposed SQL functionality and thus are not associated with a SQL-level extension object. But even for modules that do belong to extensions, being able to verify the actual code version can be useful. Author: Andrei Lepikhov <lepihov@gmail.com> Reviewed-by: Yurii Rashkovskii <yrashk@omnigres.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/dd4d1b59-d0fe-49d5-b28f-1e463b68fa32@gmail.com
* refactor: Pass relation OID instead of Relation to ↵Peter Eisentraut2025-03-25
| | | | | | | | | | | | | | createForeignKeyCheckTriggers() Currently, createForeignKeyCheckTriggers() takes a Relation type as its first argument, but it doesn't use that argument directly. Instead, it fetches the relation OID by calling RelationGetRelid(). Therefore, it would be more consistent with other functions (e.g., createForeignKeyCheckTriggers()) to pass the relation OID directly instead of the whole Relation. Author: Amul Sul <amul.sul@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/CAAJ_b962c5AcYW9KUt_R_ER5qs3fUGbe4az-SP-vuwPS-w-AGA@mail.gmail.com
* refactor: Split ATExecAlterConstraintInternal()Peter Eisentraut2025-03-25
| | | | | | | | | | | | | | | Split ATExecAlterConstraintInternal() into two functions: ATExecAlterConstrDeferrability() and ATExecAlterConstrInheritability(). This simplifies the code and avoids unnecessary confusion caused by recursive code, which isn't needed for ATExecAlterConstrInheritability(). (This also takes over the changes in commit 64224a834ce, as the new AlterConstrDeferrabilityRecurse() is essentially the old ATExecAlterChildConstr().) Author: Amul Sul <amul.sul@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/CAAJ_b962c5AcYW9KUt_R_ER5qs3fUGbe4az-SP-vuwPS-w-AGA@mail.gmail.com
* refactor: Move some code that updates pg_constraint to a separate functionPeter Eisentraut2025-03-25
| | | | | | | | | This extracts common/duplicate code for different ALTER CONSTRAINT variants into a common function. We plan to add more variants that would use the same code. Author: Amul Sul <amul.sul@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/CAAJ_b962c5AcYW9KUt_R_ER5qs3fUGbe4az-SP-vuwPS-w-AGA@mail.gmail.com
* Small fixes for Add ALTER TABLE ... ALTER CONSTRAINT ... SET [NO] INHERITPeter Eisentraut2025-03-25
| | | | | | | | | | | | | Small fixes for commit f4e53e10b6c: Add missing calls to InvokeObjectPostAlterHook() and also CacheInvalidateRelcache(). The former change could have a user-visible effect. The latter omission might have caused other bugs, but it is not clear whether one actually existed. With these changes, the code is now more consistent with similar ALTER CONSTRAINT variants, especially the ones that set the deferrability. Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/CAF1DzPVfOW6Kk=7SSh7LbneQDJWh=PbJrEC_Wkzc24tHOyQWGg@mail.gmail.com
* Change one loop in ATRewriteTable to use 1-based attnumsÁlvaro Herrera2025-03-21
| | | | | | | | | | All TupleDescAttr() calls in tablecmds.c that aren't in loops across all attributes use AttrNumber-style indexes (1-based); there was only one place in ATRewriteTable that was stashing 0-based indexes in a list for later processing. Switch that to use attnums for consistency. Author: jian he <jian.universality@gmail.com> Discussion: https://postgr.es/m/CACJufxEoYA5ScUr2=CmA1xcpaS_1ixneDbEkVU77X1ctGxY2mA@mail.gmail.com
* Simplify EXPLAIN code for MemoizeDavid Rowley2025-03-21
| | | | | | | | | | | | | | | | This removes a needless special case for Memoize's FORMAT TEXT EXPLAIN output. ExplainPropertyText() outputs the same thing in text mode as the special-case code was doing, so removing the special-case code results in the same EXPLAIN output, just with less code. It seems like a good idea to fix this to help prevent future changes in this area from copying the same pattern. Author: Ilia Evdokimov <ilya.evdokimov@tantorlabs.com> Reported-by: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/88a71bcd-0b5c-4d0b-8107-757e96f402d5@tantorlabs.com
* Add an additional hook for EXPLAIN option validation.Robert Haas2025-03-20
| | | | | | | | | | | | | | | | Commit c65bc2e1d14a2d4daed7c1921ac518f2c5ac3d17 made it possible for loadable modules to add EXPLAIN options. Normally, any necessary validation can be performed by the hook function passed to RegisterExtensionExplainOption, but if a loadable module wants to sanity check options against each other, that needs to be done after the entire options list has been processed. So, add an additional hook for that purpose. Author: Sami Imseih <samimseih@gmail.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Andrei Lepikhov <lepihov@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: http://postgr.es/m/CAA5RZ0vOcJF91O2e5AQN+V6guMNLMhJx83dxALf-iUZ-hLGO_Q@mail.gmail.com
* Add vacuum_truncate configuration parameter.Nathan Bossart2025-03-20
| | | | | | | | | | | | | | | | | | | | | | This new parameter works just like the storage parameter of the same name: if set to true (which is the default), autovacuum and VACUUM attempt to truncate any empty pages at the end of the table. It is primarily intended to help users avoid locking issues on hot standbys. The setting can be overridden with the storage parameter or VACUUM's TRUNCATE option. Since there's presently no way to determine whether a Boolean storage parameter is explicitly set or has just picked up the default value, this commit also introduces an isset_offset member to relopt_parse_elt. Suggested-by: Will Storey <will@summercat.com> Author: Nathan Bossart <nathandbossart@gmail.com> Co-authored-by: Gurjeet Singh <gurjeet@singh.im> Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at> Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com> Reviewed-by: Robert Treat <rob@xzilla.net> Discussion: https://postgr.es/m/Z2DE4lDX4tHqNGZt%40dev.null
* extension_control_pathPeter Eisentraut2025-03-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The new GUC extension_control_path specifies a path to look for extension control files. The default value is $system, which looks in the compiled-in location, as before. The path search uses the same code and works in the same way as dynamic_library_path. Some use cases of this are: (1) testing extensions during package builds, (2) installing extensions outside security-restricted containers like Python.app (on macOS), (3) adding extensions to PostgreSQL running in a Kubernetes environment using operators such as CloudNativePG without having to rebuild the base image for each new extension. There is also a tweak in Makefile.global so that it is possible to install extensions using PGXS into an different directory than the default, using 'make install prefix=/else/where'. This previously only worked when specifying the subdirectories, like 'make install datadir=/else/where/share pkglibdir=/else/where/lib', for purely implementation reasons. (Of course, without the path feature, installing elsewhere was rarely useful.) Author: Peter Eisentraut <peter@eisentraut.org> Co-authored-by: Matheus Alcantara <matheusssilv97@gmail.com> Reviewed-by: David E. Wheeler <david@justatheory.com> Reviewed-by: Gabriele Bartolini <gabriele.bartolini@enterprisedb.com> Reviewed-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com> Reviewed-by: Niccolò Fei <niccolo.fei@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/E7C7BFFB-8857-48D4-A71F-88B359FADCFD@justatheory.com
* Ensure first ModifyTable rel initialized if all are prunedAmit Langote2025-03-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit cbc127917e introduced tracking of unpruned relids to avoid processing pruned relations, and changed ExecInitModifyTable() to initialize only unpruned result relations. As a result, MERGE statements that prune all target partitions can now lead to crashes or incorrect behavior during execution. The crash occurs because some executor code paths rely on ModifyTableState.resultRelInfo[0] being present and initialized, even when no result relations remain after pruning. For example, ExecMerge() and ExecMergeNotMatched() use the first resultRelInfo to determine the appropriate action. Similarly, ExecInitPartitionInfo() assumes that at least one result relation exists. To preserve these assumptions, ExecInitModifyTable() now includes the first result relation in the initialized result relation list if all result relations for that ModifyTable were pruned. To enable that, ExecDoInitialPruning() ensures the first relation is locked if it was pruned and locking is necessary. To support this exception to the pruning logic, PlannedStmt now includes a list of RT indexes identifying the first result relation of each ModifyTable node in the plan. This allows ExecDoInitialPruning() to check whether each such relation was pruned and, if so, lock it if necessary. Bug: #18830 Reported-by: Robins Tharakan <tharakan@gmail.com> Diagnozed-by: Tender Wang <tndrwang@gmail.com> Diagnozed-by: Dean Rasheed <dean.a.rasheed@gmail.com> Co-authored-by: Dean Rasheed <dean.a.rasheed@gmail.com> Reviewed-by: Tender Wang <tndrwang@gmail.com> Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com> Discussion: https://postgr.es/m/18830-1f31ea1dc930d444%40postgresql.org
* Introduce io_max_combine_limit.Thomas Munro2025-03-19
| | | | | | | | | | | | | | | | | | | | | The existing io_combine_limit can be changed by users. The new io_max_combine_limit is fixed at server startup time, and functions as a silent clamp on the user setting. That in itself is probably quite useful, but the primary motivation is: aio_init.c allocates shared memory for all asynchronous IOs including some per-block data, and we didn't want to waste memory you'd never used by assuming they could be up to PG_IOV_MAX. This commit already halves the size of 'AioHandleIov' and 'AioHandleData'. A follow-up commit can now expand PG_IOV_MAX without affecting that. Since our GUC system doesn't support dependencies or cross-checks between GUCs, the user-settable one now assigns a "raw" value to io_combine_limit_guc, and the lower of io_combine_limit_guc and io_max_combine_limit is maintained in io_combine_limit. Reviewed-by: Andres Freund <andres@anarazel.de> (earlier version) Discussion: https://postgr.es/m/CA%2BhUKG%2B2T9p-%2BzM6Eeou-RAJjTML6eit1qn26f9twznX59qtCA%40mail.gmail.com
* Add some new hooks so extensions can add details to EXPLAIN.Robert Haas2025-03-18
| | | | | | | | | | | | | | | | | | | | | | | | | Specifically, add a per-node hook that is called after the per-node information has been displayed but before we display children, and a per-query hook that is called after existing query-level information is printed. This assumes that extension-added information should always go at the end rather than the beginning or the middle, but that seems like an acceptable limitation for simplicity. It also assumes that extensions will only want to add information, not remove or reformat existing details; those also seem like acceptable restrictions, at least for now. If multiple EXPLAIN extensions are used, the order in which any additional details are printed is likely to depend on the order in which the modules are loaded. That seems OK, since the user may have opinions about the order in which output should appear, and the extension author can't really know whether their stuff is more or less important to a particular user than some other extension. Discussion: http://postgr.es/m/CA+TgmoYSzg58hPuBmei46o8D3SKX+SZoO4K_aGQGwiRzvRApLg@mail.gmail.com Reviewed-by: Srinath Reddy <srinath2133@gmail.com> Reviewed-by: Andrei Lepikhov <lepihov@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Sami Imseih <samimseih@gmail.com>
* Make it possible for loadable modules to add EXPLAIN options.Robert Haas2025-03-18
| | | | | | | | | | | | | | | | | | | | | | | | | Modules can use RegisterExtensionExplainOption to register new EXPLAIN options, and GetExplainExtensionId, GetExplainExtensionState, and SetExplainExtensionState to store related state inside the ExplainState object. Since this substantially increases the amount of code that needs to handle ExplainState-related tasks, move a few bits of existing code to a new file explain_state.c and add the rest of this infrastructure there. See the comments at the top of explain_state.c for further explanation of how this mechanism works. This does not yet provide a way for such such options to do anything useful. The intention is that we'll add hooks for that purpose in a separate commit. Discussion: http://postgr.es/m/CA+TgmoYSzg58hPuBmei46o8D3SKX+SZoO4K_aGQGwiRzvRApLg@mail.gmail.com Reviewed-by: Srinath Reddy <srinath2133@gmail.com> Reviewed-by: Andrei Lepikhov <lepihov@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Sami Imseih <samimseih@gmail.com>
* Allow non-btree unique indexes for matviewsPeter Eisentraut2025-03-18
| | | | | | | | | | | | | | We were rejecting non-btree indexes in some cases owing to the inability to determine the equality operators for other index AMs; that problem no longer exists, because we can look up the equality operator using COMPARE_EQ. Stop rejecting these indexes, but instead rely on all unique indexes having equality operators. Unique indexes must have equality operators. Author: Mark Dilger <mark.dilger@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
* Allow non-btree unique indexes for partition keysPeter Eisentraut2025-03-18
| | | | | | | | | | | | | | We were rejecting non-btree indexes in some cases owing to the inability to determine the equality operators for other index AMs; that problem no longer exists, because we can look up the equality operator using COMPARE_EQ. The problem of not knowing the strategy number for equality in other index AMs is already resolved. Stop rejecting the indexes upfront, and instead reject any for which the equality operator lookup fails. Author: Mark Dilger <mark.dilger@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
* Use correct variable name in publicationcmds.c.Amit Kapila2025-03-18
| | | | | | | | subid was used at few places for publicationid in publicationcmds.c/.h. Author: vignesh C <vignesh21@gmail.com> Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Discussion: https://postgr.es/m/CALDaNm1KqJ0VFfDJRPbfYi9Shz6LHFEE-Ckn+eqsePfKhebv9w@mail.gmail.com
* Remove direct handling of reloptions for toast tablesÁlvaro Herrera2025-03-14
| | | | | | | | | | | It doesn't actually work, even with allow_system_table_mods turned on: the ALTER TABLE operation is rejected by ATSimplePermissions(), so even the error message we're adding in this commit is unreachable. Add a test case for it. Author: Nikolay Shaplov <dhyan@nataraj.su> Discussion: https://postgr.es/m/1913854.tdWV9SEqCh@thinkpad-pgpro
* ATExecSetRelOptions: Reduce scope of 'isnull' variableÁlvaro Herrera2025-03-13
| | | | | | Author: Nikolay Shaplov <dhyan@nataraj.su> Reviewed-by: Timur Magomedov <t.magomedov@postgrespro.ru> Discussion: https://postgr.es/m/1913854.tdWV9SEqCh@thinkpad-pgpro
* Avoid invalidating all RelationSyncCache entries on publication rename.Amit Kapila2025-03-13
| | | | | | | | | | | | | | | | | | | | On Publication rename, we need to only invalidate the RelationSyncCache entries corresponding to relations that are part of the publication being renamed. As part of this patch, we introduce a new invalidation message to invalidate the cache maintained by the logical decoding output plugin. We can't use existing relcache invalidation for this purpose, as that would unnecessarily cause relcache invalidations in other backends. This will improve performance by building fewer relation cache entries during logical replication. Author: Hayato Kuroda <kuroda.hayato@fujitsu.com> Author: Shlok Kyal <shlok.kyal.oss@gmail.com> Reviewed-by: Hou Zhijie <houzj.fnst@fujitsu.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/OSCPR01MB14966C09AA201EFFA706576A7F5C92@OSCPR01MB14966.jpnprd01.prod.outlook.com
* Assert that a snapshot is active or registered before it's usedHeikki Linnakangas2025-03-11
| | | | | | | | | | | | | | | | | | | | | | | | The comment in GetTransactionSnapshot() said that you "should call RegisterSnapshot or PushActiveSnapshot on the returned snap if it is to be used very long". That felt too unclear to me. Make the comment more strongly worded. To enforce that rule and to catch potential bugs where a snapshot might get invalidated while it's still in use, add an assertion to HeapTupleSatisfiesMVCC() to check that the snapshot is registered or pushed to active stack. No new bugs were found by this, but it seems like good future-proofing. It's not a great place for the check; HeapTupleSatisfiesMVCC() is in fact safe to call with an unregistered snapshot, and the assertion won't catch other unsafe uses. But it goes a long way in practice. Fix a few cases that were playing fast and loose with that and just assumed that the snapshot cannot be invalidated during a scan. Those assumptions were not wrong, but they're not performance critical, so let's drop the excuses and just register the snapshot. These were false positives found by the new assertion. Discussion: https://www.postgresql.org/message-id/7c56f180-b9e1-481e-8c1d-efa63de3ecbb@iki.fi
* Improve EXPLAIN's display of window functions.Tom Lane2025-03-11
| | | | | | | | | | | | | | | | | | | | | | Up to now we just punted on showing the window definitions used in a plan, with window function calls represented as "OVER (?)". To improve that, show the window definition implemented by each WindowAgg plan node, and reference their window names in OVER. For nameless window clauses generated by "OVER (...)", assign unique names w1, w2, etc. In passing, re-order the properties shown for a WindowAgg node so that the Run Condition (if any) appears after the Window property and before the Filter (if any). This seems more sensible since the Run Condition is associated with the Window and acts before the Filter. Thanks to David G. Johnston and Álvaro Herrera for design suggestions. Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/144530.1741469955@sss.pgh.pa.us
* Show index search count in EXPLAIN ANALYZE, take 2.Peter Geoghegan2025-03-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Expose the count of index searches/index descents in EXPLAIN ANALYZE's output for index scan/index-only scan/bitmap index scan nodes. This information is particularly useful with scans that use ScalarArrayOp quals, where the number of index searches can be unpredictable due to implementation details that interact with physical index characteristics (at least with nbtree SAOP scans, since Postgres 17 commit 5bf748b8). The information shown also provides useful context when EXPLAIN ANALYZE runs a plan with an index scan node that successfully applied the skip scan optimization (set to be added to nbtree by an upcoming patch). The instrumentation works by teaching all index AMs to increment a new nsearches counter whenever a new index search begins. The counter is incremented at exactly the same point that index AMs already increment the pg_stat_*_indexes.idx_scan counter (we're counting the same event, but at the scan level rather than the relation level). Parallel queries have workers copy their local counter struct into shared memory when an index scan node ends -- even when it isn't a parallel aware scan node. An earlier version of this patch that only worked with parallel aware scans became commit 5ead85fb (though that was quickly reverted by commit d00107cd following "debug_parallel_query=regress" buildfarm failures). Our approach doesn't match the approach used when tracking other index scan related costs (e.g., "Rows Removed by Filter:"). It is comparable to the approach used in similar cases involving costs that are only readily accessible inside an access method, not from the executor proper (e.g., "Heap Blocks:" output for a Bitmap Heap Scan, which was recently enhanced to show per-worker costs by commit 5a1e6df3, using essentially the same scheme as the one used here). It is necessary for index AMs to have direct responsibility for maintaining the new counter, since the counter might need to be incremented multiple times per amgettuple call (or per amgetbitmap call). But it is also necessary for the executor proper to manage the shared memory now used to transfer each worker's counter struct to the leader. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Tomas Vondra <tomas@vondra.me> Reviewed-By: Masahiro Ikeda <ikedamsh@oss.nttdata.com> Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkRqvaqR2CTNqTZP0z6FuL4-3ED6eQB0yx38XBNj1v-4Q@mail.gmail.com Discussion: https://postgr.es/m/CAH2-Wz=PKR6rB7qbx+Vnd7eqeB5VTcrW=iJvAsTsKbdG+kW_UA@mail.gmail.com
* Move RemoveInheritedConstraint() call slightly earlierPeter Eisentraut2025-03-11
| | | | | | | | | | | | | | | | | This change is harmless and does not affect the existing intended operation. It is necessary for a subsequent patch operation (NOT ENFORCED foreign keys), where we may need to change the child constraint to enforced. In this case, we would create the necessary triggers and queue the constraint for validation, so it is important to remove any unnecessary constraints before proceeding. This is a small change that could have been included in the previous "split tryAttachPartitionForeignKey" refactoring patch (commit 1d26c2d2c4b), but was kept separate to highlight the changes. Author: Amul Sul <amul.sul@enterprisedb.com> Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAAJ_b962c5AcYW9KUt_R_ER5qs3fUGbe4az-SP-vuwPS-w-AGA%40mail.gmail.com
* refactor: Split tryAttachPartitionForeignKey()Peter Eisentraut2025-03-11
| | | | | | | | | | | Split tryAttachPartitionForeignKey() into three functions: AttachPartitionForeignKey(), RemoveInheritedConstraint(), and DropForeignKeyConstraintTriggers(), so they can be reused in some subsequent patches for the NOT ENFORCED feature. Author: Amul Sul <amul.sul@enterprisedb.com> Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAAJ_b962c5AcYW9KUt_R_ER5qs3fUGbe4az-SP-vuwPS-w-AGA%40mail.gmail.com