path: root/src/backend
...
* Allow ReadStream to be consumed as raw block numbers. (Thomas Munro, 2024-09-18)

  Commits 041b9680 and 6377e12a changed the interface of
  scan_analyze_next_block() to take a ReadStream instead of a BlockNumber
  and a BufferAccessStrategy, and to return a value to indicate when the
  stream has run out of blocks.

  This caused integration problems for at least one known extension that
  uses specially encoded BlockNumber values that map to different
  underlying storage, because acquire_sample_rows() sets up the stream so
  that read_stream_next_buffer() reads blocks from the main fork of the
  relation's SMgrRelation.

  Provide read_stream_next_block(), as a way for such an extension to
  access the stream of raw BlockNumbers directly and forward them to its
  own ReadBuffer() calls after decoding, as it could in earlier releases.
  The new function returns the BlockNumber and BufferAccessStrategy that
  were previously passed directly to scan_analyze_next_block().

  Alternatively, an extension could wrap the stream of BlockNumbers in
  another ReadStream with a callback that performs any decoding required
  to arrive at real storage manager BlockNumber values, so that it could
  benefit from the I/O combining and concurrency provided by
  read_stream.c.

  Another class of table access method that does nothing in
  scan_analyze_next_block() because it is not block-oriented could use
  this function to control the number of block sampling loops. It could
  match the previous behavior with "return read_stream_next_block(stream,
  &bas) != InvalidBlockNumber".

  Ongoing work is expected to provide better ANALYZE support for table
  access methods that don't behave like heapam with respect to storage
  blocks, but that will be for future releases.

  Back-patch to 17.

  Reported-by: Mats Kindahl <mats@timescale.com>
  Reviewed-by: Mats Kindahl <mats@timescale.com>
  Discussion: https://postgr.es/m/CA%2B14425%2BCcm07ocG97Fp%2BFrD9xUXqmBKFvecp0p%2BgV2YYR258Q%40mail.gmail.com

* Repair pg_upgrade for identity sequences with non-default persistence. (Tom Lane, 2024-09-17)

  Since we introduced unlogged sequences in v15, identity sequences have
  defaulted to having the same persistence as their owning table. However,
  it is possible to change that with ALTER SEQUENCE, and pg_dump tries to
  preserve the logged-ness of sequences when it doesn't match (as indeed
  it wouldn't for an unlogged table from before v15).

  The fly in the ointment is that ALTER SEQUENCE SET [UN]LOGGED fails in
  binary-upgrade mode, because it needs to assign a new relfilenode which
  we cannot permit in that mode. Thus, trying to pg_upgrade a database
  containing a mismatching identity sequence failed.

  To fix, add syntax to ADD/ALTER COLUMN GENERATED AS IDENTITY to allow
  the sequence's persistence to be set correctly at creation, and use that
  instead of ALTER SEQUENCE SET [UN]LOGGED in pg_dump. (I tried to make
  SET [UN]LOGGED work without any pg_dump modifications, but that seems
  too fragile to be a desirable answer. This way should be markedly faster
  anyhow.)

  In passing, document the previously-undocumented SEQUENCE NAME option
  that pg_dump also relies on for identity sequences; I see no value in
  trying to pretend it doesn't exist.

  Per bug #18618 from Anthony Hsu. Back-patch to v15 where we invented
  this stuff.

  Discussion: https://postgr.es/m/18618-d4eb26d669ed110a@postgresql.org

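  A rough sketch of the intent (table and sequence names are hypothetical,
  and the exact spelling of the persistence option is assumed from the
  description above):

    -- an unlogged table whose identity sequence should be logged:
    CREATE UNLOGGED TABLE t2 (id int NOT NULL);
    ALTER TABLE t2 ALTER COLUMN id
      ADD GENERATED ALWAYS AS IDENTITY (SEQUENCE NAME t2_id_seq LOGGED);
    -- the sequence is created with the desired persistence up front, so
    -- pg_dump no longer needs ALTER SEQUENCE ... SET LOGGED, which
    -- binary-upgrade mode cannot perform
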
* Minor cleanup related to pg_wal_replay_wait() procedure (Alexander Korotkov, 2024-09-17)

  * Rename $node_standby1 to $node_standby in 043_wal_replay_wait.pl as
    there is only one standby.
  * Remove useless debug printing in 043_wal_replay_wait.pl.
  * Fix typo in one check description in 043_wal_replay_wait.pl.
  * Fix some wording in comments and documentation.

  Reported-by: Alexander Lakhin
  Discussion: https://postgr.es/m/1d7b08f2-64a2-77fb-c666-c9a74c68eeda%40gmail.com
  Reviewed-by: Alexander Lakhin

* Avoid parallel nbtree index scan hangs with SAOPs. (Peter Geoghegan, 2024-09-17)

  Commit 5bf748b8, which enhanced nbtree ScalarArrayOp execution, made
  parallel index scans work with the new design for arrays via explicit
  scheduling of primitive index scans. A backend that successfully
  scheduled the scan's next primitive index scan saved its backend local
  array keys in shared memory. Any backend could pick up the scheduled
  primitive scan within _bt_first. This scheme decouples scheduling a
  primitive scan from starting the scan (by performing another descent of
  the index via a _bt_search call from _bt_first) to make things robust.

  The scheme had a deadlock hazard, at least when the leader process
  participated in the scan. _bt_parallel_seize had a code path that made
  backends that were not in an immediate position to start a scheduled
  primitive index scan wait for some other backend to do so instead.
  Under the right circumstances, the leader process could wait here
  forever: the leader would wait for any other backend to start the
  primitive scan, while every worker was busy waiting on the leader to
  consume tuples from the scan's tuple queue.

  To fix, don't wait for a scheduled primitive index scan to be started by
  some other eligible backend from within _bt_parallel_seize (when the
  calling backend isn't in a position to do so itself). Return false
  instead, while recording that the scan has a scheduled primitive index
  scan in backend local state. This leaves the backend in the same state
  as the existing case where a backend schedules (or tries to schedule)
  another primitive index scan from within _bt_advance_array_keys, before
  calling _bt_parallel_seize. _bt_parallel_seize already handles that
  case by returning false without waiting, and without unsetting the
  backend local state. Leaving the backend in this state enables it to
  start a previously scheduled primitive index scan once it gets back to
  _bt_first.

  Oversight in commit 5bf748b8, which enhanced nbtree ScalarArrayOp
  execution.

  Matthias van de Meent, with tweaks by me.

  Author: Matthias van de Meent <boekewurm+postgres@gmail.com>
  Reported-By: Tomas Vondra <tomas@vondra.me>
  Reviewed-By: Peter Geoghegan <pg@bowt.ie>
  Discussion: https://postgr.es/m/CAH2-WzmMGaPa32u9x_FvEbPTUkP5e95i=QxR8054nvCRydP-sw@mail.gmail.com
  Backpatch: 17-, where nbtree SAOP execution was enhanced.

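  For illustration, the query shape that could previously hang (relation
  and column names are hypothetical):

    SET max_parallel_workers_per_gather = 2;
    EXPLAIN (ANALYZE, COSTS OFF)
    SELECT count(*) FROM big_tbl
    WHERE indexed_col = ANY (ARRAY[1, 7, 42, 9001]);
    -- a Parallel Index Scan driven by the ScalarArrayOp qual could stall
    -- when the leader waited in _bt_parallel_seize while the workers
    -- waited on the leader to drain the tuple queue
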
* Add temporal FOREIGN KEY constraints (Peter Eisentraut, 2024-09-17)

  Add PERIOD clause to foreign key constraint definitions. This is
  supported for range and multirange types. Temporal foreign keys check
  for range containment instead of equality.

  This feature matches the behavior of the SQL standard temporal foreign
  keys, but it works on PostgreSQL's native ranges instead of SQL's
  "periods", which don't exist in PostgreSQL (yet).

  Reference actions ON {UPDATE,DELETE} {CASCADE,SET NULL,SET DEFAULT} are
  not supported yet.

  (previously committed as 34768ee3616, reverted by 8aee330af55; this is
  essentially unchanged from those)

  Author: Paul A. Jungwirth <pj@illuminatedcomputing.com>
  Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
  Reviewed-by: jian he <jian.universality@gmail.com>
  Discussion: https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com

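  A minimal sketch of the new clause (table and column names are
  hypothetical; btree_gist supplies the GiST opclass for the scalar key
  part of the referenced temporal primary key):

    CREATE EXTENSION IF NOT EXISTS btree_gist;
    CREATE TABLE rooms (
      id       int,
      valid_at daterange,
      PRIMARY KEY (id, valid_at WITHOUT OVERLAPS)
    );
    CREATE TABLE reservations (
      room_id  int,
      valid_at daterange,
      FOREIGN KEY (room_id, PERIOD valid_at)
        REFERENCES rooms (id, PERIOD valid_at)
    );
    -- each reservation's range must be contained by the referenced room's
    -- range(s); containment, not equality, is what is checked
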
* Add temporal PRIMARY KEY and UNIQUE constraints (Peter Eisentraut, 2024-09-17)

  Add WITHOUT OVERLAPS clause to PRIMARY KEY and UNIQUE constraints. These
  are backed by GiST indexes instead of B-tree indexes, since they are
  essentially exclusion constraints with = for the scalar parts of the key
  and && for the temporal part.

  (previously committed as 46a0cd4cefb, reverted by 8aee330af55; the new
  part is this:)

  Because 'empty' && 'empty' is false, the temporal PK/UQ constraint
  allowed duplicates, which is confusing to users and breaks internal
  expectations. For instance, when GROUP BY checks functional dependencies
  on the PK, it allows selecting other columns from the table, but in the
  presence of duplicate keys you could get the value from any of their
  rows. So we need to forbid empties.

  This all means that at the moment we can only support ranges and
  multiranges for temporal PK/UQs, unlike the original patch (above).
  Documentation and tests for this are added. But this could conceivably
  be extended by introducing some more general support for the notion of
  "empty" for other types.

  Author: Paul A. Jungwirth <pj@illuminatedcomputing.com>
  Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
  Reviewed-by: jian he <jian.universality@gmail.com>
  Discussion: https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com

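  For illustration (hypothetical table; note the rejection of empty ranges
  described above):

    CREATE EXTENSION IF NOT EXISTS btree_gist;
    CREATE TABLE prices (
      item     int,
      valid_at daterange,
      price    numeric,
      PRIMARY KEY (item, valid_at WITHOUT OVERLAPS)
    );
    INSERT INTO prices VALUES (1, '[2024-01-01,2024-07-01)', 10);
    INSERT INTO prices VALUES (1, '[2024-06-01,2025-01-01)', 12);  -- ERROR: overlaps
    INSERT INTO prices VALUES (1, 'empty', 11);                    -- ERROR: empty range not allowed
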
* Add stratnum GiST support function (Peter Eisentraut, 2024-09-17)

  This is support function 12 for the GiST AM and translates "well-known"
  RT*StrategyNumber values into whatever strategy number is used by the
  opclass (since no particular numbers are actually required). We will use
  this to support temporal PRIMARY KEY/UNIQUE/FOREIGN KEY/FOR PORTION OF
  functionality.

  This commit adds two implementations, one for internal GiST opclasses
  (just an identity function) and another for btree_gist opclasses. It
  updates btree_gist from 1.7 to 1.8, adding the support function for all
  its opclasses.

  (previously committed as 6db4598fcb8, reverted by 8aee330af55; this is
  essentially unchanged from those)

  Author: Paul A. Jungwirth <pj@illuminatedcomputing.com>
  Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
  Reviewed-by: jian he <jian.universality@gmail.com>
  Discussion: https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com

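  One way to see which opclass families register the new support function
  is a plain catalog query (not part of this commit, just an inspection
  aid):

    SELECT opf.opfname, p.amproc::regproc
    FROM pg_amproc p
    JOIN pg_opfamily opf ON opf.oid = p.amprocfamily
    JOIN pg_am am ON am.oid = opf.opfmethod
    WHERE am.amname = 'gist' AND p.amprocnum = 12;
    -- lists the GiST families providing support function 12
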
* Add memory/disk usage for Window aggregate nodes in EXPLAIN. (Tatsuo Ishii, 2024-09-17)

  This commit is similar to 1eff8279d and expands the idea to Window
  aggregate nodes so that users can know how much memory or disk the
  tuplestore used.

  This commit uses the newly introduced tuplestore_get_stats() to obtain
  this information and adds some additional output in EXPLAIN ANALYZE to
  display the information for the Window aggregate node.

  Reviewed-by: David Rowley, Ashutosh Bapat, Maxim Orlov, Jian He
  Discussion: https://postgr.es/m/20240706.202254.89740021795421286.ishii%40postgresql.org

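  For example (the exact output line is approximate; it mirrors the
  tuplestore reporting added by 1eff8279d):

    EXPLAIN (ANALYZE, COSTS OFF)
    SELECT x, sum(x) OVER (ORDER BY x)
    FROM generate_series(1, 1000) AS g(x);
    -- the WindowAgg node now includes a line such as:
    --   Storage: Memory  Maximum Storage: NNkB
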
* Replace usages of xmlXPathCompile() with xmlXPathCtxtCompile(). (Tom Lane, 2024-09-15)

  In existing releases of libxml2, xmlXPathCompile can be driven to stack
  overflow because it fails to protect itself against too-deeply-nested
  input. While there is an upstream fix as of yesterday, it will take
  years for that to propagate into all shipping versions. In the meantime,
  we can protect our own usages basically for free by calling
  xmlXPathCtxtCompile instead.

  (The actual bug is that libxml2 keeps its nesting counter in the
  xmlXPathContext, and its parsing code was willing to just skip counting
  nesting levels if it didn't have a context. So if we supply a context,
  all is well. It seems odd actually that it works at all to not supply a
  context, because this means that XPath parsing does not have access to
  XML namespace info. Apparently libxml2 never checks namespaces until
  runtime? Anyway, this seems like good future-proofing even if its only
  immediate effect is to dodge a bug.)

  Sadly, this hack only offers protection with libxml2 2.9.11 and newer.
  Before that there are multiple similar problems, so if you are
  processing untrusted XML it behooves you to get a newer version. But we
  have some pretty old libxml2 in the buildfarm, so it seems impractical
  to add a regression test to verify this fix.

  Per bug #18617 from Jingzhou Fu. Back-patch to all supported versions.

  Discussion: https://postgr.es/m/18617-1cee4d2ed1f4e7ae@postgresql.org
  Discussion: https://gitlab.gnome.org/GNOME/libxml2/-/issues/799

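  At the SQL level the vulnerable path is reachable through functions such
  as xpath(); a deeply nested (parenthesized) path expression like this
  hypothetical one could previously drive libxml2 into stack overflow:

    SELECT xpath(repeat('(', 10000) || 'x' || repeat(')', 10000),
                 '<x/>'::xml);
    -- with the fix (and libxml2 >= 2.9.11) this errors out cleanly
    -- instead of crashing the backend
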
* Allow _h_indexbuild() to be interrupted. (Tom Lane, 2024-09-13)

  When we are building a hash index that is large enough to need
  pre-sorting (larger than either maintenance_work_mem or NBuffers), the
  initial sorting phase is interruptible, but the insertion phase wasn't.
  Add the missing CHECK_FOR_INTERRUPTS().

  Per bug #18616 from Alexander Lakhin. Back-patch to all supported
  branches.

  Pavel Borisov

  Discussion: https://postgr.es/m/18616-acbb9e5caf41e964@postgresql.org

* Remove separate locale_is_c arguments (Peter Eisentraut, 2024-09-13)

  Since e9931bfb751, ctype_is_c is part of pg_locale_t. Some functions
  passed a pg_locale_t and a bool argument separately. This can now be
  combined into one argument.

  Since some callers call MatchText() with locale 0, it is a bit confusing
  whether this is all correct. But it is the case that only callers that
  pass a non-zero locale object to MatchText() end up checking
  locale->ctype_is_c. To make that flow a bit more understandable, add
  the locale argument to MATCH_LOWER() and GETCHAR() in like_match.c,
  instead of implicitly taking it from the outer scope.

  Reviewed-by: Jeff Davis <pgsql@j-davis.com>
  Discussion: https://www.postgresql.org/message-id/84d415fc-6780-419e-b16c-61a0ca819e2b@eisentraut.org

* Prohibit altering invalidated replication slots. (Amit Kapila, 2024-09-13)

  ALTER_REPLICATION_SLOT for invalid replication slots should not be
  allowed because there is no way to get back the invalidated (logical)
  slot to work.

  Author: Bharath Rupireddy
  Reviewed-by: Peter Smith, Shveta Malik
  Discussion: https://www.postgresql.org/message-id/CALj2ACW4fSOMiKjQ3=2NVBMTZRTG8Ujg6jsK9z3EvOtvA4vzKQ@mail.gmail.com

* Simplify checks for deterministic collations. (Jeff Davis, 2024-09-12)

  Remove redundant checks for locale->collate_is_c now that we always have
  a valid pg_locale_t.

  Also, remove pg_locale_deterministic() wrapper, which is no longer
  useful after commit e9931bfb75. Just check the field directly,
  consistent with other fields in pg_locale_t.

  Author: Andreas Karlsson
  Discussion: https://postgr.es/m/60929555-4709-40a7-b136-bcb44cff5a3c@proxel.se

* Remove redundant check for default collation. (Jeff Davis, 2024-09-12)

  The operative check is for a deterministic collation, so the check for
  DEFAULT_COLLATION is redundant. Furthermore, it will be wrong if we ever
  support a non-deterministic default collation.

  Author: Andreas Karlsson
  Discussion: https://postgr.es/m/60929555-4709-40a7-b136-bcb44cff5a3c@proxel.se

* Make jsonpath .string() be immutable for datetimes. (Tom Lane, 2024-09-12)

  Discussion of commit ed055d249 revealed that we don't actually want
  jsonpath's .string() method to depend on DateStyle, nor TimeZone either,
  because the non-"_tz" jsonpath functions are supposed to be immutable.
  Potentially we could allow a TimeZone dependency in the "_tz" variants,
  but it seems better to just uniformly define this method as returning
  the same string that jsonb text output would do. That's easier to
  implement too, saving a couple dozen lines.

  Patch by me, per complaint from Peter Eisentraut. Back-patch to v17
  where this feature came in (in 66ea94e8e). Also back-patch ed055d249 to
  provide test cases.

  Discussion: https://postgr.es/m/5e8879d0-a3c8-4be2-950f-d83aa2af953a@eisentraut.org

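  For example (the rendered value follows jsonb text output; the exact
  result shown is illustrative):

    SET datestyle = 'German, DMY';
    SELECT jsonb_path_query('"2024-09-12"', '$.datetime().string()');
    -- now returns "2024-09-12" regardless of DateStyle, matching how
    -- jsonb text output would render the date
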
* Add has_largeobject_privilege function. (Fujii Masao, 2024-09-12)

  This function checks whether a user has specific privileges on a large
  object, identified by OID. The user can be provided by name, OID, or
  default to the current user. If the specified large object doesn't
  exist, the function returns NULL. It raises an error for a non-existent
  user name. This behavior is basically consistent with other privilege
  inquiry functions like has_table_privilege.

  Bump catalog version.

  Author: Yugo Nagata
  Reviewed-by: Fujii Masao
  Discussion: https://postgr.es/m/20240702163444.ab586f6075e502eb84f11b1a@sranhm.sraoss.co.jp

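  Usage sketch (role name and OIDs are hypothetical):

    SELECT lo_create(4242);                     -- large object with OID 4242
    CREATE ROLE alice;
    GRANT SELECT ON LARGE OBJECT 4242 TO alice;
    SELECT has_largeobject_privilege('alice', 4242, 'SELECT');  -- t
    SELECT has_largeobject_privilege('alice', 4242, 'UPDATE');  -- f
    SELECT has_largeobject_privilege(999999, 'SELECT');         -- NULL (no such LO)
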
* Deduplicate code in LargeObjectExists and myLargeObjectExists. (Fujii Masao, 2024-09-12)

  myLargeObjectExists() and LargeObjectExists() had nearly identical
  code, except for handling snapshots. This commit renames
  myLargeObjectExists() to LargeObjectExistsWithSnapshot() and refactors
  LargeObjectExists() to call it internally, reducing duplication.

  Author: Yugo Nagata
  Reviewed-by: Fujii Masao
  Discussion: https://postgr.es/m/20240702163444.ab586f6075e502eb84f11b1a@sranhm.sraoss.co.jp

* Remove hardcoded hash opclass function signature exceptions (Peter Eisentraut, 2024-09-12)

  hashvalidate(), which validates the signatures of support functions for
  the hash AM, contained several hardcoded exceptions. For example,
  hash/date_ops support function 1 was hashint4(), which would ordinarily
  fail validation because the function argument is int4, not date. But
  this works internally because int4 and date are of the same size. There
  are several more exceptions like this that happen to work and were
  allowed historically but would now fail the function signature
  validation.

  This patch removes those exceptions by providing new support functions
  that have the proper declared signatures. They internally share most of
  the code with the "wrong" functions they replace, so the behavior is
  still the same.

  With the exceptions gone, hashvalidate() is now simplified and relies
  fully on check_amproc_signature().

  hashvarlena() and hashvarlenaextended() are kept in pg_proc.dat because
  some extensions currently use them to build hash functions for their own
  types, and we need to keep exposing these functions as "LANGUAGE
  internal" functions for that to continue to work.

  Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
  Discussion: https://www.postgresql.org/message-id/flat/29c3b746-69e7-482a-b37c-dbbf7e5b009b@eisentraut.org

* Remove old RULE privilege completely. (Fujii Masao, 2024-09-12)

  The RULE privilege for tables was removed in v8.2, but for backward
  compatibility, GRANT/REVOKE and privilege functions like
  has_table_privilege continued to accept the RULE keyword without any
  effect.

  After discussions on pgsql-hackers, it was agreed that this
  compatibility is no longer needed. Since it's been long enough since the
  deprecation, we've decided to fully remove support for RULE privilege,
  so GRANT/REVOKE and privilege functions will no longer accept it.

  Author: Fujii Masao
  Reviewed-by: Nathan Bossart
  Discussion: https://postgr.es/m/976a3581-6939-457f-b947-fc3dc836c083@oss.nttdata.com

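  Concretely (role and table names are hypothetical):

    GRANT RULE ON TABLE t TO alice;
    -- previously a silent no-op; now rejected with an error

    SELECT has_table_privilege('alice', 't', 'RULE');
    -- previously returned false; now raises an unrecognized-privilege error
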
* Don't overwrite scan key in systable_beginscan() (Peter Eisentraut, 2024-09-12)

  When systable_beginscan() and systable_beginscan_ordered() choose an
  index scan, they remap the attribute numbers in the passed-in scan keys
  to the attribute numbers of the index, and then write those remapped
  attribute numbers back into the scan key passed by the caller. This
  second part is surprising and gratuitous. It means that a scan key
  cannot safely be used more than once (but it might sometimes work,
  depending on circumstances). Also, there is no value in providing these
  remapped attribute numbers back to the caller, since they can't do
  anything with that.

  Fix that by making a copy of the scan keys passed by the caller and
  making the modifications there.

  Also, some code that had to work around the previous situation is
  simplified.

  Discussion: https://www.postgresql.org/message-id/flat/f8c739d9-f48d-4187-b214-df3391ba41ab@eisentraut.org

* Move logic related to WAL replay of Heap/Heap2 into its own file (Michael Paquier, 2024-09-12)

  This brings more clarity to heapam.c, by cleanly separating all the
  logic related to WAL replay and the rest of Heap and Heap2, similarly
  to other RMGRs like hash, btree, etc.

  The header reorganization is also nice in heapam.c, cutting half of the
  headers required.

  Author: Li Yong
  Reviewed-by: Sutou Kouhei, Michael Paquier
  Discussion: https://postgr.es/m/EFE55E65-D7BD-4C6A-B630-91F43FD0771B@ebay.com

* Adjust tuplestore stats API (David Rowley, 2024-09-12)

  1eff8279d added an API to tuplestore.c to allow callers to obtain
  storage telemetry data. That API wasn't quite good enough for callers
  that perform tuplestore_clear() as the telemetry functions only
  accounted for the current state of the tuplestore, not the maximums
  before tuplestore_clear() was called.

  There's a pending patch that would like to add tuplestore telemetry
  output to EXPLAIN ANALYZE for WindowAgg. That node type uses
  tuplestore_clear() before moving to the next window partition and we
  want to show the maximum space used, not the space used for the final
  partition.

  Reviewed-by: Tatsuo Ishii, Ashutosh Bapat
  Discussion: https://postgr.es/m/CAApHDvoY8cibGcicLV0fNh=9JVx9PANcWvhkdjBnDCc9Quqytg@mail.gmail.com

* SQL/JSON: Fix JSON_QUERY(... WITH CONDITIONAL WRAPPER) (Amit Langote, 2024-09-12)

  Currently, when WITH CONDITIONAL WRAPPER is specified, array wrappers
  are applied even to a single SQL/JSON item if it is a scalar JSON value,
  but this behavior does not comply with the standard.

  To fix, apply wrappers only when there are multiple SQL/JSON items in
  the result.

  Reported-by: Peter Eisentraut <peter@eisentraut.org>
  Author: Peter Eisentraut <peter@eisentraut.org>
  Author: Amit Langote <amitlangote09@gmail.com>
  Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
  Discussion: https://postgr.es/m/8022e067-818b-45d3-8fab-6e0d94d03626%40eisentraut.org
  Backpatch-through: 17

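  For example:

    SELECT JSON_QUERY(jsonb '[1]', '$[*]' WITH CONDITIONAL WRAPPER);
    -- one scalar item: now returns 1 (previously [1])

    SELECT JSON_QUERY(jsonb '[1, 2]', '$[*]' WITH CONDITIONAL WRAPPER);
    -- multiple items: still wrapped, returns [1, 2]
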
* Remove incorrect Assert. (Tom Lane, 2024-09-11)

  check_agglevels_and_constraints() asserted that if we find an aggregate
  function in an EXPR_KIND_FROM_SUBSELECT expression, the expression must
  be in a LATERAL subquery. Alexander Lakhin found a case where that's
  not so: because of the odd scoping rules for NEW/OLD within a rule, a
  reference to NEW/OLD could cause an aggregate to be considered top-level
  even though it's in an unmarked sub-select. The error message that
  would be thrown seems sufficiently on-point, so just remove the Assert.
  (Hence, this is not a bug for production builds.)

  This Assert was added by me in commit eaccfded9 (9.3 era). It looks
  like I put it in to cross-check that the new logic for detecting
  misplaced aggregates (using agglevelsup) caught the same cases that a
  previous check on p_lateral_active did. So there might have been some
  related misbehavior before eaccfded9 ... but that's very ancient history
  by now, so I didn't dig any deeper.

  Per bug #18608 from Alexander Lakhin. Back-patch to all supported
  branches.

  Discussion: https://postgr.es/m/18608-48de0717508ee429@postgresql.org

* Replace gratuitous memmove() with memcpy() (Peter Eisentraut, 2024-09-11)

  The index access methods all had similar code that copied the passed-in
  scan keys to local storage. They all used memmove() for that, which is
  not wrong, but it seems confusing not to use memcpy() when that would
  work. Presumably, this was all once copied from ancient code and never
  adjusted.

  Discussion: https://www.postgresql.org/message-id/flat/f8c739d9-f48d-4187-b214-df3391ba41ab@eisentraut.org

* Fix unique key checks in JSON object constructors (Tomas Vondra, 2024-09-11)

  When building a JSON object, the code builds a hash table of keys, to
  allow checking if the keys are unique. The uniqueness check and adding
  the new key happens in json_unique_check_key(), but this assumes the
  pointer to the key remains valid.

  Unfortunately, two places passed pointers to keys in a buffer, while
  also appending more data (additional key/value pairs) to the buffer.
  With enough data the buffer is resized by enlargeStringInfo(), which
  calls repalloc(), invalidating the earlier key pointers.

  Due to this the uniqueness check may fail with both false negatives and
  false positives, producing JSON objects with duplicate keys or failing
  to produce a perfectly valid JSON object.

  This affects multiple functions that enforce uniqueness of keys, all
  introduced in PG16 with the new SQL/JSON:

  - json_object_agg_unique / jsonb_object_agg_unique
  - json_object / jsonb_objectagg

  Existing regression tests did not detect the issue, simply because the
  initial buffer size is 1024 and the objects were small enough not to
  require the repalloc.

  With a sufficiently large object, AddressSanitizer reported the access
  to invalid memory immediately. So would valgrind, of course.

  Fixed by copying the key into the hash table memory context, and adding
  regression tests with enough data to repalloc the buffer. Backpatch to
  16, where the functions were introduced.

  Reported by Alexander Lakhin. Investigation and initial fix by Junwang
  Zhao, with various improvements and tests by me.

  Reported-by: Alexander Lakhin
  Author: Junwang Zhao, Tomas Vondra
  Backpatch-through: 16
  Discussion: https://postgr.es/m/18598-3279ed972a2347c7@postgresql.org
  Discussion: https://postgr.es/m/CAEG8a3JjH0ReJF2_O7-8LuEbO69BxPhYeXs95_x7+H9AMWF1gw@mail.gmail.com

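  A reproducer shape, for illustration (the long value forces the
  stringinfo repalloc described above; the exact error text may vary):

    SELECT json_object('a' VALUE repeat('x', 2000),
                       'a' VALUE 'y'
                       WITH UNIQUE KEYS);
    -- ERROR: duplicate JSON object key
    -- before the fix, the repalloc()'d buffer could make this check
    -- misfire in either direction
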
* Remove obsolete unconstify() (Peter Eisentraut, 2024-09-11)

  This is no longer needed as of OpenSSL 1.1.0 (the current minimum
  version). LibreSSL made the same change around the same time as well.

  Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
  Discussion: https://www.postgresql.org/message-id/20463f79-a7b0-4bba-a178-d805f99c02f9%40eisentraut.org

* Improve assertion in FindReplTupleInLocalRel(). (Amit Kapila, 2024-09-11)

  The first part of the assertion verifying that the passed index must be
  PK or RI was incorrectly passing index relation instead of heap relation
  in GetRelationIdentityOrPK(). The assertion was not failing because the
  second part of the assertion which needs to be performed only when
  remote relation has REPLICA_IDENTITY_FULL set was also incorrect.

  The change is not backpatched because the current coding doesn't lead to
  any failure.

  Reported-by: Dilip Kumar
  Author: Amit Kapila
  Reviewed-by: Vignesh C
  Discussion: https://postgr.es/m/CAFiTN-tmguaT1DXbCC+ZomZg-oZLmU6BPhr0po7akQSG6vNJrg@mail.gmail.com

* Use a hash table to de-duplicate column names in ruleutils.c. (Tom Lane, 2024-09-10)

  Commit 8004953b5 added a hash table to avoid O(N^2) cost in choosing
  unique relation aliases while deparsing a view or rule. It did nothing
  about the similar O(N^2) (maybe worse) costs of choosing unique column
  aliases within each RTE. However, that's now demonstrably a bottleneck
  when deparsing CHECK constraints for wide tables, so let's use a similar
  hash table to handle those.

  The extra cost of setting up the hash table will not be repaid unless
  the table has many columns. I've set this up so that we use the brute
  force method if there are less than 32 columns. The exact cutoff is not
  too critical, but this value seems good because it results in both code
  paths getting exercised by existing regression-test cases.

  Patch by me; thanks to David Rowley for review.

  Discussion: https://postgr.es/m/2885468.1722291250@sss.pgh.pa.us

* Fix some whitespace issues in XMLSERIALIZE(... INDENT). (Tom Lane, 2024-09-10)

  We must drop whitespace while parsing the input, else libxml2 will
  include "blank" nodes that interfere with the desired indentation
  behavior. The end result is that we didn't indent nodes separated by
  whitespace.

  Also, it seems that libxml2 may add a trailing newline when working in
  DOCUMENT mode. This is semantically insignificant, so strip it.

  This is in the gray area between being a bug fix and a definition
  change. However, the INDENT option is still pretty new (since v16), so
  I think we can get away with changing this in stable branches. Hence,
  back-patch to v16.

  Jim Jones

  Discussion: https://postgr.es/m/872865a8-548b-48e1-bfcd-4e38e672c1e4@uni-muenster.de

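  For example (the indentation width is libxml2's choice; output shown is
  approximate):

    SELECT xmlserialize(DOCUMENT '<a> <b>1</b> <c>2</c> </a>'::xml
                        AS text INDENT);
    -- now indented despite the whitespace between nodes, roughly:
    --   <a>
    --     <b>1</b>
    --     <c>2</c>
    --   </a>
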
* Add amgettreeheight index AM API routine (Peter Eisentraut, 2024-09-10)

  The only current implementation is for btree where it calls
  _bt_getrootheight(). Other index types can now also use this to pass
  information to their amcostestimate routine. Previously, btree was
  hardcoded and other index types could not hook into the optimizer at
  this point.

  Author: Mark Dilger <mark.dilger@enterprisedb.com>
  Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com

* Mark expressions nullable by grouping sets (Richard Guo, 2024-09-10)

  When generating window_pathkeys, distinct_pathkeys, or sort_pathkeys,
  we failed to realize that the grouping/ordering expressions might be
  nullable by grouping sets. As a result, we may incorrectly deem that
  the PathKeys are redundant by EquivalenceClass processing and thus
  remove them from the pathkeys list. That would lead to wrong results in
  some cases.

  To fix this issue, we mark the grouping expressions nullable by grouping
  sets if that is the case. If the grouping expression is a Var or
  PlaceHolderVar or constructed from those, we can just add the RT index
  of the RTE_GROUP RTE to the existing nullingrels field(s); otherwise we
  have to add a PlaceHolderVar to carry on the nullingrel bit.

  However, we have to manually remove this nullingrel bit from expressions
  in various cases where these expressions are logically below the
  grouping step, such as when we generate groupClause pathkeys for
  grouping sets, or when we generate PathTarget for initial input to
  grouping nodes.

  Furthermore, in set_upper_references, the targetlist and quals of an
  Agg node should have nullingrels that include the effects of the
  grouping step, ie they will have nullingrels equal to the input
  Vars/PHVs' nullingrels plus the nullingrel bit that references the
  grouping RTE. In order to perform exact nullingrels matches, we also
  need to manually remove this nullingrel bit.

  Bump catversion because this changes the querytree produced by the
  parser.

  Thanks to Tom Lane for the idea to invent a new kind of RTE.

  Per reports from Geoff Winkless, Tobias Wendorff, Richard Guo from
  various threads.

  Author: Richard Guo
  Reviewed-by: Ashutosh Bapat, Sutou Kouhei
  Discussion: https://postgr.es/m/CAMbWs4_dp7e7oTwaiZeBX8+P1rXw4ThkZxh1QG81rhu9Z47VsQ@mail.gmail.com

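  The kind of query affected, for illustration (hypothetical table):

    SELECT a, b, count(*)
    FROM t
    GROUP BY GROUPING SETS ((a, b), (a), ())
    ORDER BY a, b;
    -- the ORDER BY pathkeys must not be discarded as redundant: a and b
    -- can be NULLed by the grouping step even when the underlying columns
    -- are non-nullable
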
* Introduce an RTE for the grouping step (Richard Guo, 2024-09-10)

  If there are subqueries in the grouping expressions, each of these
  subqueries in the targetlist and HAVING clause is expanded into distinct
  SubPlan nodes. As a result, only one of these SubPlan nodes would be
  converted to reference to the grouping key column output by the Agg
  node; others would have to get evaluated afresh. This is not efficient,
  and with grouping sets this can cause wrong results issues in cases
  where they should go to NULL because they are from the wrong grouping
  set. Furthermore, during re-evaluation, these SubPlan nodes might use
  nulled column values from grouping sets, which is not correct.

  This issue is not limited to subqueries. For other types of expressions
  that are part of grouping items, if they are transformed into another
  form during preprocessing, they may fail to match lower target items.
  This can also lead to wrong results with grouping sets.

  To fix this issue, we introduce a new kind of RTE representing the
  output of the grouping step, with columns that are the Vars or
  expressions being grouped on. In the parser, we replace the grouping
  expressions in the targetlist and HAVING clause with Vars referencing
  this new RTE, so that the output of the parser directly expresses the
  semantic requirement that the grouping expressions be gotten from the
  grouping output rather than computed some other way. In the planner,
  we first preprocess all the columns of this new RTE and then replace any
  Vars in the targetlist and HAVING clause that reference this new RTE
  with the underlying grouping expressions, so that we will have only one
  instance of a SubPlan node for each subquery contained in the grouping
  expressions.

  Bump catversion because this changes the querytree produced by the
  parser.

  Thanks to Tom Lane for the idea to invent a new kind of RTE.

  Per reports from Geoff Winkless, Tobias Wendorff, Richard Guo from
  various threads.

  Author: Richard Guo
  Reviewed-by: Ashutosh Bapat, Sutou Kouhei
  Discussion: https://postgr.es/m/CAMbWs4_dp7e7oTwaiZeBX8+P1rXw4ThkZxh1QG81rhu9Z47VsQ@mail.gmail.com

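  An example of the subquery case this addresses (hypothetical table):

    SELECT a, (SELECT t.a) AS via_subplan, count(*)
    FROM t
    GROUP BY GROUPING SETS ((a), ());
    -- via_subplan is now taken from the grouping output, so it is NULL in
    -- the () grouping-set row instead of being re-evaluated with a value
    -- from the wrong grouping set
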
* Remove emode argument from XLogFileRead() and XLogFileReadAnyTLI() (Michael Paquier, 2024-09-10)

  This change makes the code slightly easier to reason about, because
  there is actually no need to know if a specific caller of one of these
  routines should fail hard on a PANIC, or just let it go through with a
  DEBUG2.

  The only caller of XLogFileReadAnyTLI() used DEBUG2, and XLogFileRead()
  has never used its emode. This simplification has been possible since
  1bb2558046cc, which introduced XLogFileReadAnyTLI() by splitting the
  two routines.

  Author: Yugo Nagata
  Discussion: https://postgr.es/m/20240906201043.a640f3b44e755d4db2b6943e@sraoss.co.jp

* Add WAL usage reporting to ANALYZE VERBOSE output. (Masahiko Sawada, 2024-09-09)

  This change adds WAL usage reporting to the output of ANALYZE VERBOSE
  and autoanalyze reports. It aligns the analyze output with VACUUM,
  providing consistency. Additionally, it aids in troubleshooting cases
  where WAL records are generated during analyze operations.

  Author: Anthonin Bonnefoy
  Reviewed-by: Masahiko Sawada
  Discussion: https://postgr.es/m/CAO6_Xqr__kTTCLkftqS0qSCm-J7_xbRG3Ge2rWhucxQJMJhcRA%40mail.gmail.com

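  For example (table name hypothetical; figures are placeholders, and the
  line format mirrors VACUUM VERBOSE):

    ANALYZE VERBOSE some_table;
    -- INFO:  analyzing "public.some_table"
    -- ...
    -- WAL usage: N records, N full page images, N bytes
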
* Don't bother checking the result of SPI_connect[_ext] anymore. (Tom Lane, 2024-09-09)

  SPI_connect/SPI_connect_ext have not returned any value other than
  SPI_OK_CONNECT since commit 1833f1a1c in v10; any errors are thrown via
  ereport. (The most likely failure is out-of-memory, which has always
  been thrown that way, so callers had better be prepared for such
  errors.) This makes it somewhat pointless to check these functions'
  result, and some callers within our code haven't been bothering; indeed,
  the only usage example within spi.sgml doesn't bother. So it's likely
  that the omission has propagated into extensions too.

  Hence, let's standardize on not checking, and document the return value
  as historical, while not actually changing these functions' behavior.
  (The original proposal was to change their return type to "void", but
  that would needlessly break extensions that are conforming to the old
  practice.) This saves a small amount of boilerplate code in a lot of
  places.

  Stepan Neretin

  Discussion: https://postgr.es/m/CAMaYL5Z9Uk8cD9qGz9QaZ2UBJFOu7jFx5Mwbznz-1tBbPDQZow@mail.gmail.com

* Fix waits of REINDEX CONCURRENTLY for indexes with predicates or expressions (Michael Paquier, 2024-09-09)

  As introduced by f9900df5f94, a REINDEX CONCURRENTLY job done for an
  index with predicates or expressions would set PROC_IN_SAFE_IC in its
  MyProc->statusFlags, causing it to be ignored by other concurrent
  operations.

  Such concurrent index rebuilds should never be ignored, as a predicate
  or an expression could call a user-defined function that accesses a
  different table than the table where the index is rebuilt.

  A test that uses injection points is added, backpatched down to 17.
  Michail has proposed a different test, but I have added something
  simpler with more coverage.

  Oversight in f9900df5f949.

  Author: Michail Nikolaev
  Discussion: https://postgr.es/m/CANtu0oj9A3kZVduFTG0vrmGnKB+DCHgEpzOp0qAyOgmks84j0w@mail.gmail.com
  Backpatch-through: 14

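  The shape of the hazard, for illustration (all names hypothetical):

    CREATE FUNCTION f(int) RETURNS int
      LANGUAGE sql IMMUTABLE
      AS 'SELECT $1 + (SELECT count(*) FROM other_tbl)::int';
    -- mislabeled as IMMUTABLE, but exactly the kind of user code the
    -- fix must account for: it reads a table other than t
    CREATE INDEX idx ON t (f(col));
    REINDEX INDEX CONCURRENTLY idx;
    -- must not be flagged PROC_IN_SAFE_IC, i.e. must not be ignored by
    -- other concurrent operations
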
* SQL/JSON: Avoid initializing unnecessary ON ERROR / ON EMPTY steps (Amit Langote, 2024-09-09)

  When the ON ERROR / ON EMPTY behavior is to return NULL, returning NULL
  directly from ExecEvalJsonExprPath() suffices. Therefore, there's no
  need to create separate steps to check the error/empty flag or those to
  evaluate the constant NULL expression. This speeds up common cases
  because the default ON ERROR / ON EMPTY behavior for JSON_QUERY() and
  JSON_VALUE() is to return NULL. However, these steps are necessary if
  the RETURNING type is a domain, as constraints on the domain may need to
  be checked.

  Reported-by: Jian He <jian.universality@gmail.com>
  Author: Jian He <jian.universality@gmail.com>
  Author: Amit Langote <amitlangote09@gmail.com>
  Discussion: https://postgr.es/m/CACJufxEo4sUjKCYtda0_qt9tazqqKPmF1cqhW9KBOUeJFqQd2g@mail.gmail.com
  Backpatch-through: 17

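  For example:

    SELECT JSON_VALUE(jsonb '{"a": 1}', '$.b');
    -- returns NULL via the default NULL ON EMPTY / ON ERROR behavior,
    -- now without the extra expression steps; a domain RETURNING type
    -- would still get its constraint-checking steps
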
* Fix order of parameters in a cost_sort call (Richard Guo, 2024-09-09)

  In label_sort_with_costsize, the cost_sort function is called with the
  parameters 'input_disabled_nodes' and 'input_cost' in the wrong order.
  This does not cause any plan diffs in the regression tests, because
  label_sort_with_costsize is only used to label the Sort node nicely for
  EXPLAIN, and cost numbers are not displayed in regression tests.

  Oversight in e22253467. Fixed by passing arguments in the right order.

  Per report from Alexander Lakhin running UBSan.

  Author: Alexander Lakhin
  Discussion: https://postgr.es/m/a9b7231d-68bc-f117-a07c-96688f3e6aef@gmail.com

* Add callbacks to control flush of fixed-numbered stats (Michael Paquier, 2024-09-09)

  This commit adds two callbacks in pgstats to have a better control of
  the flush timing of pgstat_report_stat(), whose operation depends on the
  three PGSTAT_*_INTERVAL variables:

  - have_fixed_pending_cb(), to check if a stats kind has any pending data
    waiting for a flush. This is used as a fast path if there are no
    pending statistics to flush, and this check is done for fixed-numbered
    statistics only if there are no variable-numbered statistics to flush.
    A flush will need to happen if at least one callback reports any
    pending data.
  - flush_fixed_cb(), to do the actual flush.

  These callbacks are currently used by the SLRU, WAL and IO statistics,
  generalizing the concept for all stats kinds (builtin and custom).

  The SLRU and IO stats relied each on one global variable to determine
  whether a flush should happen; these are now local to pgstat_slru.c and
  pgstat_io.c, cleaning up a bit how the pending flush states are tracked
  in pgstat.c.

  pgstat_flush_io() and pgstat_flush_wal() are still required, but we do
  not need to check their return result anymore.

  Reviewed-by: Bertrand Drouvot, Kyotaro Horiguchi
  Discussion: https://postgr.es/m/ZtaVO0N-aTwiAk3w@paquier.xyz

* Update extension lookup routines to use the syscache (Michael Paquier, 2024-09-07)

  The following routines are changed to use the syscache entries added for
  pg_extension in 490f869d92e5:

  - get_extension_oid()
  - get_extension_name()
  - get_extension_schema()

  A catalog scan is costly and could easily lead to a noticeable
  performance impact when called once or more per query, so this is going
  to be helpful for developers doing extension data lookups.

  Author: Andrei Lepikhov
  Reviewed-by: Jelte Fennema-Nio
  Discussion: https://postgr.es/m/529295b2-6ba9-4dae-acd1-20a9c6fb8f9a@gmail.com

* Remove lc_ctype_is_c(). (Jeff Davis, 2024-09-06)

  Instead always fetch the locale and look at the ctype_is_c field.

  hba.c relies on regexes working for the C locale without needing
  catalog access, which worked before due to a special case for
  C_COLLATION_OID in lc_ctype_is_c(). Move the special case to
  pg_set_regex_collation() now that lc_ctype_is_c() is gone.

  Author: Andreas Karlsson
  Discussion: https://postgr.es/m/60929555-4709-40a7-b136-bcb44cff5a3c@proxel.se

* Fix incorrect pg_stat_io output on 32-bit machines. (Tom Lane, 2024-09-06)

  pg_stat_get_io() applied TimestampTzGetDatum twice to the
  stat_reset_timestamp value. On 64-bit builds that's harmless because
  TimestampTzGetDatum is a no-op, but on 32-bit builds it results in
  displaying garbage in the stats_reset column of the pg_stat_io view.

  Bug dates to commit a9c70b46d which introduced pg_stat_io, so back-patch
  to v16 where that came in.

  Bertrand Drouvot

  Discussion: https://postgr.es/m/Ztrd+XcPTz1zorkg@ip-10-97-1-34.eu-west-3.compute.internal

* Remove useless unconstify (Peter Eisentraut, 2024-09-06)

  Digging into the history, this was not necessary even when it was added,
  but might have been needed some time before that. In any case, there is
  no use for this now.

* SQL/JSON: Fix default ON ERROR behavior for JSON_TABLE (Amit Langote, 2024-09-06)

  Use EMPTY ARRAY instead of EMPTY.

  This change does not affect the runtime behavior of JSON_TABLE(), which
  continues to return an empty relation ON ERROR. It only alters whether
  the default ON ERROR behavior is shown in the deparsed output.

  Reported-by: Jian He <jian.universality@gmail.com>
  Discussion: https://postgr.es/m/CACJufxEo4sUjKCYtda0_qt9tazqqKPmF1cqhW9KBOUeJFqQd2g@mail.gmail.com
  Backpatch-through: 17

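  The deparsed form can be observed via a view, for illustration
  (hypothetical names):

    CREATE VIEW v AS
      SELECT * FROM JSON_TABLE(jsonb '[]', '$[*]'
                               COLUMNS (a int)
                               EMPTY ARRAY ON ERROR) jt;
    SELECT pg_get_viewdef('v'::regclass);
    -- the clause now deparses as EMPTY ARRAY ON ERROR rather than
    -- EMPTY ON ERROR
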
* SQL/JSON: Fix JSON_TABLE() column deparsing (Amit Langote, 2024-09-06)

  The deparsing code in get_json_expr_options() unnecessarily emitted the
  default column-specific ON ERROR / EMPTY behavior when the top-level ON
  ERROR behavior in JSON_TABLE was set to ERROR. Fix that by not
  overriding the column-specific default, determined based on the
  column's JsonExprOp in get_json_table_columns(), with
  JSON_BEHAVIOR_ERROR when that is the top-level ON ERROR behavior.

  Note that this only removes redundancy; the current deparsing output is
  not incorrect, just redundant.

  Reviewed-by: Jian He <jian.universality@gmail.com>
  Discussion: https://postgr.es/m/CACJufxEo4sUjKCYtda0_qt9tazqqKPmF1cqhW9KBOUeJFqQd2g@mail.gmail.com
  Backpatch-through: 17

* Revert recent SQL/JSON related commits (Amit Langote, 2024-09-06)

  Reverts 68222851d5a8, 565caaa79af, and 3a97460970f, because a few BF
  animals didn't like one or all of them.

* SQL/JSON: Avoid initializing unnecessary ON ERROR / ON EMPTY steps (Amit Langote, 2024-09-06)

  When the ON ERROR / ON EMPTY behavior is to return NULL, returning NULL
  directly from ExecEvalJsonExprPath() suffices. Therefore, there's no
  need to create separate steps to check the error/empty flag or those to
  evaluate the constant NULL expression. This speeds up common cases
  because the default ON ERROR / ON EMPTY behavior for JSON_QUERY() and
  JSON_VALUE() is to return NULL. However, these steps are necessary if
  the RETURNING type is a domain, as constraints on the domain may need to
  be checked.

  Reported-by: Jian He <jian.universality@gmail.com>
  Author: Jian He <jian.universality@gmail.com>
  Author: Amit Langote <amitlangote09@gmail.com>
  Discussion: https://postgr.es/m/CACJufxEo4sUjKCYtda0_qt9tazqqKPmF1cqhW9KBOUeJFqQd2g@mail.gmail.com
  Backpatch-through: 17

* SQL/JSON: Fix default ON ERROR behavior for JSON_TABLE (Amit Langote, 2024-09-06)

  Use EMPTY ARRAY instead of EMPTY.

  This change does not affect the runtime behavior of JSON_TABLE(), which
  continues to return an empty relation ON ERROR. It only alters whether
  the default ON ERROR behavior is shown in the deparsed output.

  Reported-by: Jian He <jian.universality@gmail.com>
  Discussion: https://postgr.es/m/CACJufxEo4sUjKCYtda0_qt9tazqqKPmF1cqhW9KBOUeJFqQd2g@mail.gmail.com
  Backpatch-through: 17

* SQL/JSON: Fix JSON_TABLE() column deparsing (Amit Langote, 2024-09-06)

  The deparsing code in get_json_expr_options() unnecessarily emitted the
  default column-specific ON ERROR / EMPTY behavior when the top-level ON
  ERROR behavior in JSON_TABLE was set to ERROR. Fix that by not
  overriding the column-specific default, determined based on the
  column's JsonExprOp in get_json_table_columns(), with
  JSON_BEHAVIOR_ERROR when that is the top-level ON ERROR behavior.

  Note that this only removes redundancy; the current deparsing output is
  not incorrect, just redundant.

  Reviewed-by: Jian He <jian.universality@gmail.com>
  Discussion: https://postgr.es/m/CACJufxEo4sUjKCYtda0_qt9tazqqKPmF1cqhW9KBOUeJFqQd2g@mail.gmail.com
  Backpatch-through: 17