path: root/src
* Invent --transaction-size option for pg_restore. (Tom Lane, 2024-04-01)

  This patch allows pg_restore to wrap its commands into transaction blocks,
  somewhat like --single-transaction, except that we commit and start a new
  block after every N objects. Using this mode with a size limit of 1000 or
  so objects greatly reduces the number of transactions consumed by the
  restore, while preventing any one transaction from taking enough locks to
  overrun the receiving server's shared lock table. (A value of 1000 works
  well with the default lock table size of around 6400 locks. Higher
  --transaction-size values can be used if one has increased the receiving
  server's lock table size.)

  Excessive consumption of XIDs has been reported as a problem for
  pg_upgrade in particular, but it could be bad for any restore; and the
  change also reduces the number of fsyncs and amount of WAL generated, so
  it should provide speed benefits too.

  This patch does not try to make parallel workers batch the SQL commands
  they issue. The trouble with doing that is that other workers may need to
  see the objects a worker creates right away. Possibly this can be improved
  later.

  In this patch I have hard-wired pg_upgrade to use a transaction size of
  1000 divided by the number of parallel restore jobs allowed (without that,
  we'd still be at risk of overrunning the shared lock table). Perhaps there
  would be value in adding another pg_upgrade option to allow user control
  of that, but I'm unsure that it's worth the trouble; I think few users
  would use it, and any who did would see little benefit compared to the
  default.

  Patch by me, but the original idea to batch SQL commands during restore is
  due to Robins Tharakan.

  Discussion: https://postgr.es/m/a9f9376f1c3343a6bb319dce294e20ac@EX13D05UWC001.ant.amazon.com
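  A minimal sketch of the batched SQL stream this mode emits (the object
  commands and the batch size of 3 are hypothetical; the real commands come
  from the archive):

      BEGIN;
      CREATE TABLE obj1 (id int);
      CREATE TABLE obj2 (id int);
      CREATE TABLE obj3 (id int);
      COMMIT;
      BEGIN;
      CREATE TABLE obj4 (id int);
      -- ... up to N objects per block ...
      COMMIT;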
* Rearrange pg_dump's handling of large objects for better efficiency. (Tom Lane, 2024-04-01)

  Commit c0d5be5d6 caused pg_dump to create a separate BLOB metadata TOC
  entry for each large object (blob), but it did not touch the ancient
  decision to put all the blobs' data into a single "BLOBS" TOC entry. This
  is bad for a few reasons: for databases with millions of blobs, the TOC
  becomes unreasonably large, causing performance issues; selective restore
  of just some blobs is quite impossible; and we cannot parallelize either
  dump or restore of the blob data, since our architecture for that relies
  on farming out whole TOC entries to worker processes.

  To improve matters, let's group multiple blobs into each blob metadata TOC
  entry, and then make corresponding per-group blob data TOC entries.
  Selective restore using pg_restore's -l/-L switches is then possible,
  though only at the group level. (Perhaps we should provide a switch to
  allow forcing one-blob-per-group for users who need precise selective
  restore and don't have huge numbers of blobs. This patch doesn't do that,
  instead just hard-wiring the maximum number of blobs per entry at 1000.)

  The blobs in a group must all have the same owner, since the TOC entry
  format only allows one owner to be named. In this implementation we also
  require them to all share the same ACL (grants); the archive format
  wouldn't require that, but pg_dump's representation of DumpableObjects
  does. It seems unlikely that either restriction will be problematic for
  databases with huge numbers of blobs.

  The metadata TOC entries now have a "desc" string of "BLOB METADATA", and
  their "defn" string is just a newline-separated list of blob OIDs. The
  restore code has to generate creation commands, ALTER OWNER commands, and
  drop commands (for --clean mode) from that. We would need special-case
  code for ALTER OWNER and drop in any case, so the alternative of keeping
  the "defn" as directly executable SQL code for creation wouldn't buy much,
  and it seems like it'd bloat the archive to little purpose.

  Since we require the blobs of a metadata group to share the same ACL, we
  can furthermore store only one copy of that ACL, and then make pg_restore
  regenerate the appropriate commands for each blob. This saves space in the
  dump file not only by removing duplicative SQL command strings, but by not
  needing a separate TOC entry for each blob's ACL. In turn, that reduces
  client-side memory requirements for handling many blobs.

  ACL TOC entries that need this special processing are labeled as
  "ACL"/"LARGE OBJECTS nnn..nnn". If we have a blob with a unique ACL,
  continue to label it as "ACL"/"LARGE OBJECT nnn". We don't actually have
  to make such a distinction, but it saves a few cycles during restore for
  the easy case, and it seems like a good idea to not change the TOC
  contents unnecessarily.

  The data TOC entries ("BLOBS") are exactly the same as before, except that
  now there can be more than one, so we'd better give them identifying tag
  strings.

  Also, commit c0d5be5d6 put the new BLOB metadata TOC entries into
  SECTION_PRE_DATA, which perhaps is defensible in some ways, but it's a
  rather odd choice considering that we go out of our way to treat blobs as
  data. Moreover, because parallel restore handles the PRE_DATA section
  serially, this means we'd only get part of the parallelism speedup we
  could hope for. Move these entries into SECTION_DATA, letting us
  parallelize the lo_create calls, not just the data loading, when there are
  many blobs. Add dependencies to ensure that we won't try to load data for
  a blob we've not yet created.

  As this stands, we still generate a separate TOC entry for any comment or
  security label attached to a blob. I feel comfortable in believing that
  comments and security labels on blobs are rare, so this patch should be
  enough to get most of the useful TOC compression for blobs.

  We have to bump the archive file format version number, since existing
  versions of pg_restore wouldn't know they need to do something special for
  BLOB METADATA, plus they aren't going to work correctly with multiple
  BLOBS entries or multiple-large-object ACL entries.

  The directory and tar-file format handlers need some work for multiple
  BLOBS entries: they used to hard-wire the file name as "blobs.toc", which
  is replaced here with "blobs_<dumpid>.toc". The 002_pg_dump.pl test script
  also knows about that and requires minor updates. (I had to drop the test
  for manually-compressed blobs.toc files with LZ4, because lz4's obtuse
  command line design requires explicit specification of the output file
  name, which seems impractical here. I don't think we're losing any useful
  test coverage thereby; that test stanza seems completely duplicative with
  the gzip and zstd cases anyway.)

  In passing, centralize management of the lo_buf used to hold data while
  restoring blobs. The code previously had each format handler create
  lo_buf, which seems rather pointless given that the format handlers all
  make it the same way. Moreover, the format handlers never use lo_buf
  directly, making this setup a failure from a separation-of-concerns
  standpoint. Let's move the responsibility into pg_backup_archiver.c, which
  is the only module concerned with lo_buf. The reason to do this in this
  patch is that it allows a centralized fix for the now-false assumption
  that we never restore blobs in parallel.

  Also, get rid of dead code in DropLOIfExists: it's been a long time since
  we had any need to be able to restore to a pre-9.0 server.

  Discussion: https://postgr.es/m/a9f9376f1c3343a6bb319dce294e20ac@EX13D05UWC001.ant.amazon.com
* Avoid possible longjmp-induced logic error in PLy_trigger_build_args. (Tom Lane, 2024-04-01)

  The "pltargs" variable wasn't marked volatile, which makes it unsafe to
  change its value within the PG_TRY block. It looks like the worst outcome
  would be to fail to release a refcount on Py_None during an (improbable)
  error exit, which would likely go unnoticed in the field. Still, it's a
  bug. A one-liner fix could be to mark pltargs volatile, but on the whole
  it seems cleaner to arrange things so that we don't change its value
  within PG_TRY.

  Per report from Xing Guo. This has been there for quite a while, so
  back-patch to all supported branches.

  Discussion: https://postgr.es/m/CACpMh+DLrk=fDv07MNpBT4J413fDAm+gmMXgi8cjPONE+jvzuw@mail.gmail.com
* Fix assorted resource leaks in new pg_createsubscriber code. (Tom Lane, 2024-04-01)

  Various error paths did not release resources before returning. While
  it's likely that the program would just exit shortly later, none of the
  functions in question have summary exit(1) calls, so they should not be
  assuming that.

  Ranier Vilela and Tom Lane, per reports from Coverity.

  Discussion: https://postgr.es/m/CAEudQAr2_SZFxB4kXJiL4+2UaNZxUk5UBJtj0oXyJYMGZu-03g@mail.gmail.com
* Handle non-chain tuples outside of heap_prune_chain() (Heikki Linnakangas, 2024-04-01)

  Handle dead branches of aborted HOT chains outside heap_prune_chain() as a
  separate phase. This simplifies the logic in heap_prune_chain(), as well
  as allowing us to clean up more RECENTLY_DEAD -> DEAD chains.

  To accomplish this efficiently, partition tuples into HOT and non-HOT
  while first collecting visibility information for each tuple in
  heap_page_prune(). Then call heap_prune_chain() only on potential chain
  members. Then mop up the leftover HOT tuples afterwards.

  As part of this, keep track of which items on the page have already been
  processed, in a 'processed' array. This replaces the 'marked' array, which
  was only set for tuples marked for removal or redirection. The 'processed'
  array is updated also for items that are left unchanged, when we conclude
  that an item can be left unchanged. At the end of pruning, every item on
  the page should be marked as processed in the array; an assertion is added
  for that.

  Author: Melanie Plageman <melanieplageman@gmail.com>
  Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
  Discussion: https://www.postgresql.org/message-id/20240330055710.kqg6ii2cdojsxgje@liskov
* Refactor heap_prune_chain() (Heikki Linnakangas, 2024-04-01)

  Keep track of the number of deleted tuples in PruneState and record this
  information when recording a tuple dead, unused or redirected. This
  removes a special case from the traversal and chain processing logic as
  well as setting a precedent of recording the impact of prune actions in
  the record functions themselves. This paradigm will be used in future
  commits which move tracking of additional statistics on pruning actions
  from lazy_scan_prune() to heap_prune_chain().

  Simplify heap_prune_chain()'s chain traversal logic by handling each case
  explicitly. That is, do not attempt to share code when processing
  different types of chains. For each category of chain, process it
  specifically and procedurally: first handling the root, then any
  intervening tuples, and, finally, the end of the chain.

  While we are at it, add a few new comments to heap_prune_chain()
  clarifying some special cases involving RECENTLY_DEAD tuples.

  Author: Melanie Plageman <melanieplageman@gmail.com>
  Discussion: https://www.postgresql.org/message-id/20240330055710.kqg6ii2cdojsxgje@liskov
* Minor refactoring in heap_page_prune (Heikki Linnakangas, 2024-04-01)

  Pass 'page', 'blockno' and 'maxoff' to heap_prune_chain() as arguments, so
  that it doesn't need to fetch them from the buffer. This saves a few
  cycles per chain.

  Remove the "if (off_loc != NULL)" checks, and require the caller to pass a
  non-NULL 'off_loc'. Pass a pointer to a dummy local variable when it's not
  needed. Those checks are cheap, but it's still better to avoid them in the
  per-chain loops when we can do so easily.

  The CPU time savings from these changes are hardly measurable, but fewer
  instructions is good anyway, so why not. I spotted the potential for these
  while reviewing Melanie Plageman's patch set to combine prune and freeze
  records.

  Discussion: https://www.postgresql.org/message-id/CAAKRu_abm2tHhrc0QSQa%3D%3DsHe%3DVA1%3Doz1dJMQYUOKuHmu%2B9Xrg%40mail.gmail.com
* Add new COPY option LOG_VERBOSITY. (Masahiko Sawada, 2024-04-01)

  This commit adds a new COPY option LOG_VERBOSITY, which controls the
  number of messages emitted during processing. Valid values are 'default'
  and 'verbose'.

  This is currently used in COPY FROM when the ON_ERROR option is set to
  ignore. If 'verbose' is specified, a NOTICE message is emitted for each
  discarded row, providing additional information such as line number,
  column name, and the malformed value. This helps users to identify
  problematic rows that failed to load.

  Author: Bharath Rupireddy
  Reviewed-by: Michael Paquier, Atsushi Torikoshi, Masahiko Sawada
  Discussion: https://www.postgresql.org/message-id/CALj2ACUk700cYhx1ATRQyRw-fBM%2BaRo6auRAitKGff7XNmYfqQ%40mail.gmail.com
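  A sketch of the new option in use (the file path and table are
  hypothetical):

      -- skip malformed rows, emitting a NOTICE for each one
      COPY people FROM '/tmp/people.csv'
          WITH (FORMAT csv, ON_ERROR ignore, LOG_VERBOSITY verbose);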
* Revert "Speed up tail processing when hashing aligned C strings"John Naylor2024-03-31
| | | | | | | This reverts commit 07f0f6abfc7f6c55cede528d9689dedecefc734a. This has shown failures on both Valgrind and big-endian machines, per members skink and pike.
* Speed up tail processing when hashing aligned C strings (John Naylor, 2024-03-31)

  After encountering the NUL terminator, the word-at-a-time loop exits and
  we must hash the remaining bytes. Previously we calculated the
  terminator's position and re-loaded the remaining bytes from the input
  string. We already have all the data we need in a register, so let's just
  mask off the bytes we need and hash them immediately. The mask can be
  cheaply computed without knowing the terminator's position. We still need
  that position for the length calculation, but the CPU can now do that in
  parallel with other work, shortening the dependency chain.

  Ants Aasma and John Naylor

  Discussion: https://postgr.es/m/CANwKhkP7pCiW_5fAswLhs71-JKGEz1c1%2BPC0a_w1fwY4iGMqUA%40mail.gmail.com
* Let table AM insertion methods control index insertion (Alexander Korotkov, 2024-03-30)

  Previously, the executor did index insert unconditionally after calling
  table AM interface methods tuple_insert() and multi_insert(). This commit
  introduces the new parameter insert_indexes for these two methods. Setting
  '*insert_indexes' to true preserves the current logic. Setting it to false
  indicates that the table AM takes care of index inserts itself and doesn't
  want the caller to do that.

  Discussion: https://postgr.es/m/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com
  Reviewed-by: Pavel Borisov, Matthias van de Meent, Mark Dilger
* Custom reloptions for table AM (Alexander Korotkov, 2024-03-30)

  Let table AM define custom reloptions for its tables. This allows
  specifying AM-specific parameters in the WITH clause when creating a
  table.

  The code may use some parts from prior work by Hao Wu.

  Discussion: https://postgr.es/m/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com
  Discussion: https://postgr.es/m/AMUA1wBBBxfc3tKRLLdU64rb.1.1683276279979.Hmail.wuhao%40hashdata.cn
  Reviewed-by: Pavel Borisov, Matthias van de Meent
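  A hedged sketch, assuming a hypothetical table AM "columnar" that defines
  a custom "compression" reloption:

      CREATE TABLE events (id bigint, payload text)
          USING columnar
          WITH (compression = 'zstd');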
* Generalize relation analyze in table AM interface (Alexander Korotkov, 2024-03-30)

  Currently, there is just one algorithm for sampling tuples from a table,
  written in acquire_sample_rows(). A custom table AM can redefine the way
  to get the next block/tuple by implementing the scan_analyze_next_block()
  and scan_analyze_next_tuple() API functions. This approach doesn't seem
  general enough. For instance, it's unclear how to sample index-organized
  tables this way. This commit allows a table AM to encapsulate the whole
  sampling algorithm (currently implemented in acquire_sample_rows()) into
  the relation_analyze() API function.

  Discussion: https://postgr.es/m/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com
  Reviewed-by: Pavel Borisov, Matthias van de Meent
* Add pg_basetype() function to extract a domain's base type. (Tom Lane, 2024-03-30)

  This SQL-callable function behaves much like our internal utility function
  getBaseType(), except it returns NULL rather than failing for an invalid
  type OID. (That behavior is modeled on our experience with other
  catalog-inquiry functions such as the ACL checking functions.) The key
  advantage over doing a join to pg_type is that it will loop as needed to
  find the bottom base type of a nest of domains.

  Steve Chavez, reviewed by jian he and others

  Discussion: https://postgr.es/m/CAGRrpzZSX8j=MQcbCSEisFA=ic=K3bknVfnFjAv1diVJxFHJvg@mail.gmail.com
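  A sketch of the behavior (the domain names are hypothetical):

      CREATE DOMAIN percent AS numeric CHECK (VALUE BETWEEN 0 AND 100);
      CREATE DOMAIN audited_percent AS percent;
      SELECT pg_basetype('audited_percent'::regtype);  -- loops to the bottom: numeric
      SELECT pg_basetype(0::regtype);                  -- invalid OID: NULL, not an error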
* Add support for MERGE ... WHEN NOT MATCHED BY SOURCE. (Dean Rasheed, 2024-03-30)

  This allows MERGE commands to include WHEN NOT MATCHED BY SOURCE actions,
  which operate on rows that exist in the target relation, but not in the
  data source. These actions can execute UPDATE, DELETE, or DO NOTHING
  sub-commands.

  This is in contrast to already-supported WHEN NOT MATCHED actions, which
  operate on rows that exist in the data source, but not in the target
  relation. To make this distinction clearer, such actions may now be
  written as WHEN NOT MATCHED BY TARGET.

  Writing WHEN NOT MATCHED without specifying BY SOURCE or BY TARGET is
  equivalent to writing WHEN NOT MATCHED BY TARGET.

  Dean Rasheed, reviewed by Alvaro Herrera, Ted Yu and Vik Fearing.

  Discussion: https://postgr.es/m/CAEZATCWqnKGc57Y_JanUBHQXNKcXd7r=0R4NEZUVwP+syRkWbA@mail.gmail.com
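  A sketch of the extended syntax (the table and column names are
  hypothetical):

      MERGE INTO inventory t
      USING stock_feed s ON t.item_id = s.item_id
      WHEN MATCHED THEN
          UPDATE SET qty = s.qty
      WHEN NOT MATCHED BY TARGET THEN         -- rows only in the source
          INSERT (item_id, qty) VALUES (s.item_id, s.qty)
      WHEN NOT MATCHED BY SOURCE THEN         -- rows only in the target
          DELETE;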
* Add unicode_strtitle() for Unicode Default Case Conversion. (Jeff Davis, 2024-03-29)

  This brings the titlecasing implementation for the builtin provider out of
  formatting.c and into unicode_case.c, along with unicode_strlower() and
  unicode_strupper().

  It accepts an arbitrary word-boundary callback. Simple for now, but it can
  be extended to support the Unicode Default Case Conversion algorithm with
  full case mapping.

  Discussion: https://postgr.es/m/3bc653b5d562ae9e2838b11cb696816c328a489a.camel@j-davis.com
  Reviewed-by: Peter Eisentraut
* Remove superfluous trailing semicolons (Daniel Gustafsson, 2024-03-29)

  Two semicolons were accidentally added to rows which were already
  terminated with semicolons. While harmless, fix by removing them.

  Author: Richard Guo <guofenglinux@gmail.com>
  Discussion: https://postgr.es/m/CAMbWs4_fnJ0+yOgFioswzLE7t6R8P6cqbuacFVeZqbESFAjs1A@mail.gmail.com
* Use version for builtin collations. (Jeff Davis, 2024-03-29)

  Given that the version field already exists, there's little reason not to
  use it. Suggestion from Peter Eisentraut.

  Discussion: https://postgr.es/m/613c120a-5413-4fa7-a501-6590eae558f8@eisentraut.org
  Reviewed-by: Peter Eisentraut
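  One way to inspect the versions now recorded (a sketch, assuming the
  builtin provider is identified by collprovider = 'b'):

      SELECT collname, collprovider, collversion
      FROM pg_collation
      WHERE collprovider = 'b';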
* Try to stabilize flappy test result. (Tom Lane, 2024-03-29)

  This recently-added test case checks the plan of an inner join between two
  identical tables. It's just chance which join order the planner will pick,
  and in the presence of any variation in the underlying statistics, the
  displayed plan might change. Add a WHERE condition to break the cost
  symmetry and hopefully stabilize matters. (We're still trying to
  understand exactly why the underlying statistics aren't as stable as
  intended, but this seems like a good change anyway, since this test would
  surely bite us again in future.)

  While here, clean up assorted comment spelling, grammar, and whitespace
  problems.

  Discussion: https://postgr.es/m/4168116.1711720146@sss.pgh.pa.us
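  The shape of the fix, with hypothetical tables: a one-sided filter makes
  the two join orders cost differently, so the planner's choice is no longer
  a coin flip:

      EXPLAIN (COSTS OFF)
      SELECT * FROM t AS t1 JOIN t AS t2 ON t1.a = t2.a
      WHERE t1.b < 100;   -- breaks the cost symmetry between t1 and t2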
* Add allow_alter_system GUC. (Robert Haas, 2024-03-29)

  This is marked PGC_SIGHUP, so it can only be set in a configuration file,
  not anywhere else; and it is also marked GUC_DISALLOW_IN_AUTO_FILE, so it
  can't be set using ALTER SYSTEM. When set to false, the ALTER SYSTEM
  command is disallowed.

  There was considerable concern that this would be misinterpreted as a
  security feature, which it is not, because a determined superuser has
  various ways of bypassing it. Hence, a lot of work has gone into
  wordsmithing the documentation, in the hopes of avoiding any such
  confusion.

  Jelte Fennema-Nio and Gabriele Bartolini, with wording suggestions for the
  documentation from many others.

  Discussion: http://postgr.es/m/CA%2BVUV5rEKt2%2BCdC_KUaPoihMu%2Bi5ChT4WVNTr4CD5-xXZUfuQw%40mail.gmail.com
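  A sketch of the behavior (the specific setting being altered is
  hypothetical):

      -- In postgresql.conf (PGC_SIGHUP; cannot be set via ALTER SYSTEM itself):
      --   allow_alter_system = off
      ALTER SYSTEM SET work_mem = '64MB';
      -- now fails with an error, since ALTER SYSTEM is disallowed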
* Allow "internal" subtransactions in parallel mode.Tom Lane2024-03-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow use of BeginInternalSubTransaction() in parallel mode, so long as the subtransaction doesn't attempt to acquire an XID or increment the command counter. Given those restrictions, the other parallel processes don't need to know about the subtransaction at all, so this should be safe. The benefit is that it allows subtransactions intended for error recovery, such as pl/pgsql exception blocks, to be used in PARALLEL SAFE functions. Another reason for doing this is that the API of BeginInternalSubTransaction() doesn't allow reporting failure. pl/python for one, and perhaps other PLs, copes very poorly with an error longjmp out of BeginInternalSubTransaction(). The headline feature of this patch removes the only easily-triggerable failure case within that function. There remain some resource-exhaustion and similar cases, which we now deal with by promoting them to FATAL errors, so that callers need not try to clean up. (It is likely that such errors would leave us with corrupted transaction state inside xact.c, making recovery difficult if not impossible anyway.) Although this work started because of a report of a pl/python crash, we're not going to do anything about that in the back branches. Back-patching this particular fix is obviously not very wise. While we could contemplate some narrower band-aid, pl/python is already an untrusted language, so it seems okay to classify this as a "so don't do that" case. Patch by me, per report from Hao Zhang. Thanks to Robert Haas for review. Discussion: https://postgr.es/m/CALY6Dr-2yLVeVPhNMhuBnRgOZo1UjoTETgtKBx1B2gUi8yy+3g@mail.gmail.com
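  A hedged sketch of what this enables: a PARALLEL SAFE pl/pgsql function
  whose exception block (an internal subtransaction) can now run in a
  parallel worker (the function itself is hypothetical):

      CREATE FUNCTION safe_div(a numeric, b numeric) RETURNS numeric
      LANGUAGE plpgsql PARALLEL SAFE AS $$
      BEGIN
          RETURN a / b;
      EXCEPTION WHEN division_by_zero THEN
          RETURN NULL;  -- error recovery without acquiring an XID
      END $$;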
* ALTER TABLE: rework determination of access method ID (Alvaro Herrera, 2024-03-28)

  Avoid setting an access method OID for relation kinds that don't take one.
  Code review for new feature added in 374c7a229042.

  Author: Justin Pryzby <pryzby@telsasoft.com>
  Reported-by: Alexander Lakhin <exclusion@gmail.com>
  Discussion: https://postgr.es/m/e5516ac1-5264-c3c0-d822-9e6f614ea93b@gmail.com
* Update comment in set_dummy_rel_pathlist(). (Tom Lane, 2024-03-28)

  This comment claimed that set_dummy_rel_pathlist() has callers other than
  (possibly indirectly) set_rel_size(). It doesn't, so revise the argument
  to not rely on that.

  Noted by Richard Guo.

  Discussion: https://postgr.es/m/CAMbWs4-KFEU_fDuJPNCOkUu3rwvZvKBEytkd9VrM4kH4-2h1CQ@mail.gmail.com
* Remove translation markers from libpq-be-fe-helpers.h (Alvaro Herrera, 2024-03-28)

  Apparently these markers cause the modules to not link correctly on some
  platforms, at least per buildfarm member indri; moreover, this code is
  only used in modules that don't have a translation. If we someday add i18n
  support to contrib/ it might be worth revisiting this.
* libpq-be-fe-helpers.h: wrap new cancel APIs (Alvaro Herrera, 2024-03-28)

  Commit 61461a300c1c introduced new functions to libpq for cancelling
  queries. This commit introduces a helper function that backend-side
  libraries and extensions can use to invoke those. This function takes a
  timeout and can itself be interrupted while it is waiting for a cancel
  request to be sent and processed, instead of being blocked.

  This replaces the usage of the old functions in postgres_fdw and dblink.

  Finally, it also adds some test coverage for the cancel support in
  postgres_fdw.

  Author: Jelte Fennema-Nio <postgres@jeltef.nl>
  Discussion: https://postgr.es/m/CAGECzQT_VgOWWENUqvUV9xQmbaCyXjtRRAYO8W07oqashk_N+g@mail.gmail.com
* Remove obsolete comment about VACUUM retrying pruning (Heikki Linnakangas, 2024-03-28)

  Commit 1ccc1e05ae removed the retry logic that the comment talked about.

  Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
  Discussion: https://www.postgresql.org/message-id/20240328015326.x5gnzsohl6j23b42@liskov
* Improve tab completion for ALTER TABLE ALTER COLUMN SET in psql. (Masahiko Sawada, 2024-03-28)

  This commit changes tab completion to offer DATA TYPE after ALTER TABLE
  ... ALTER COLUMN ... SET.

  Author: Vignesh C
  Reviewed-by: Shubham Khanna, Masahiko Sawada
  Discussion: https://postgr.es/m/CALDaNm1aEdJb-QJi%3DGWStkfj_%2BEDUK_VtDkn%2BTjQ2z7HyU0MBw%40mail.gmail.com
* Improve style of pg_lfind32(). (Nathan Bossart, 2024-03-27)

  This commit simplifies pg_lfind32() a bit by moving the standard
  one-by-one linear search code to an inline helper function.

  Reviewed-by: Tom Lane
  Discussion: https://postgr.es/m/20240327013616.GA3940109%40nathanxps13
* Rethink create and attach APIs of shared TidStore. (Masahiko Sawada, 2024-03-28)

  Previously, the behavior of TidStoreCreate() was inconsistent between
  local and shared TidStore instances in terms of memory limitation. For a
  local TidStore, a memory context was created with initial and maximum
  memory block sizes, as well as a minimum memory context size, based on the
  specified max_bytes values. However, for a shared TidStore, the provided
  DSA area was used for TID storage. Although commit bb952c8c8b allowed
  specifying the initial and maximum DSA segment sizes, callers would have
  needed to clamp their own limits, which was neither consistent nor
  user-friendly.

  With this commit, when creating a shared TidStore, a dedicated DSA area is
  created for TID storage instead of using a provided DSA area. The initial
  and maximum DSA segment sizes are chosen based on the specified max_bytes.
  Other processes can attach to the shared TidStore using the handle of the
  created DSA (returned by the new TidStoreGetDSA() function) and the DSA
  pointer returned by TidStoreGetHandle(). The created DSA has the same
  lifetime as the shared TidStore and is deleted when all processes detach
  from it.

  To improve clarity, the TidStoreCreate() function has been divided into
  two separate functions: TidStoreCreateLocal() and TidStoreCreateShared().

  Reviewed-by: John Naylor
  Discussion: https://postgr.es/m/CAD21AoAyc1j%3DBCdUqZfk6qbdjZ68UgRx1Gkpk0oah4K7S0Ri9g%40mail.gmail.com
* Run perltidy on generate-unicode_version.pl. (Jeff Davis, 2024-03-27)
* Fix unnecessary use of moving-aggregate mode with non-moving frame. (Tom Lane, 2024-03-27)

  When a plain aggregate is used as a window function, and the window frame
  start is specified as UNBOUNDED PRECEDING, the frame's head cannot move,
  so we do not need to use moving-aggregate mode. The check for that was put
  into initialize_peragg(), failing to notice that ExecInitWindowAgg() calls
  that function before it has filled in winstate->frameOptions. Since
  makeNode() would have zeroed the field, this didn't provoke
  uninitialized-value complaints, nor would the erroneous decision have
  resulted in more than a little inefficiency. Still, it's wrong, so move
  the initialization of winstate->frameOptions earlier to make it work
  properly.

  While here, also fix a thinko in a comment. Both errors crept in in commit
  a9d9acbf2 which introduced the moving-aggregate mode.

  Spotted by Vallimaharajan G. Back-patch to all supported branches.

  Discussion: https://postgr.es/m/18e7f2a5167.fe36253866818.977923893562469143@zohocorp.com
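  A sketch of the affected call shape (the names are hypothetical); with
  UNBOUNDED PRECEDING the frame head never moves, so plain aggregation
  suffices and moving-aggregate mode is unnecessary:

      SELECT ts,
             sum(amount) OVER (ORDER BY ts
                               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
      FROM payments;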
* Rename COMPAT_OPTIONS_CLIENT to COMPAT_OPTIONS_OTHER. (Robert Haas, 2024-03-27)

  The user-facing name is "Other Platforms and Clients", but the internal
  name seems too focused on clients specifically, especially given the plan
  to add a new setting to this section that is about platform or deployment
  model compatibility rather than client compatibility.

  Jelte Fennema-Nio

  Discussion: http://postgr.es/m/CAGECzQTfMbDiM6W3av+3weSnHxJvPmuTEcjxVvSt91sQBdOxuQ@mail.gmail.com
* Fix unstable aggregate regression test (David Rowley, 2024-03-28)

  Buildfarm member avocet has shown a plan change by switching the finalize
  aggregate stage to use a GroupAggregate rather than a HashAggregate. This
  is consistent with autovacuum having triggered on the table, per analysis
  by Alexander Lakhin.

  Fix this by disabling autovacuum on the table.

  Reported-by: Alexander Lakhin
  Discussion: https://postgr.es/m/d4493a28-589a-5328-fed5-250f2d7d3e2a@gmail.com
  Backpatch-through: 16, where this test was added.
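  The usual idiom for pinning down such a test (the table name is
  hypothetical):

      -- keep autovacuum away from the test table so the plan stays stable
      CREATE TABLE agg_data (a int, b int) WITH (autovacuum_enabled = off);
      -- or, for an existing table:
      ALTER TABLE agg_data SET (autovacuum_enabled = off);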
* Add functions to generate random numbers in a specified range. (Dean Rasheed, 2024-03-27)

  This adds 3 new variants of the random() function:

      random(min integer, max integer) returns integer
      random(min bigint, max bigint) returns bigint
      random(min numeric, max numeric) returns numeric

  Each returns a random number x in the range min <= x <= max. For the
  numeric function, the number of digits after the decimal point is equal to
  the number of digits that "min" or "max" has after the decimal point,
  whichever has more.

  The main entry points for these functions are in a new C source file. The
  existing random(), random_normal(), and setseed() functions are moved
  there too, so that they can all share the same PRNG state, which is kept
  private to that file.

  Dean Rasheed, reviewed by Jian He, David Zhang, Aleksander Alekseev, and
  Tomas Vondra.

  Discussion: https://postgr.es/m/CAEZATCV89Vxuq93xQdmc0t-0Y2zeeNQTdsjbmV7dyFBPykbV4Q@mail.gmail.com
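  A few sketched examples of the new variants:

      SELECT random(1, 6);            -- integer in [1, 6], like a die roll
      SELECT random(0, 10000000000);  -- bigint range
      SELECT random(1.50, 3.25);      -- numeric; two digits after the point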
* Fix some typos and grammar issues from commit 87985cc92522 (Alexander Korotkov, 2024-03-27)

  Reported-by: Alexander Lakhin
* Fix random failure in 004_subscription. (Amit Kapila, 2024-03-27)

  The failing test checks that, after the upgrade, changes made on the
  publisher are replicated to the subscriber. We missed waiting for one of
  the subscriptions to catch up.

  Per buildfarm.

  Author: Vignesh C
  Reviewed-by: Kuroda Hayato
  Discussion: https://postgr.es/m/CALDaNm0z=fLtio1h50K8WossUGXU+gy0H9y9=RYh1DDZiq2EDw@mail.gmail.com
* Change last_inactive_time to inactive_since in pg_replication_slots. (Amit Kapila, 2024-03-27)

  Commit a11f330b55 added last_inactive_time to show the last time the slot
  was inactive. But, it tells the last time that a currently-inactive slot
  previously *WAS* active. This could be unclear, so we changed the name to
  inactive_since.

  Reported-by: Robert Haas
  Author: Bharath Rupireddy
  Reviewed-by: Bertrand Drouvot, Shveta Malik, Amit Kapila
  Discussion: https://postgr.es/m/CA+Tgmob_Ta-t2ty8QrKHBGnNLrf4ZYcwhGHGFsuUoFrAEDw4sA@mail.gmail.com
  Discussion: https://postgr.es/m/CALj2ACUXS0SfbHzsX8bqo+7CZhocsV52Kiu7OWGb5HVPAmJqnA@mail.gmail.com
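  Inspecting the renamed column (a sketch):

      SELECT slot_name, active, inactive_since
      FROM pg_replication_slots
      WHERE NOT active;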
* Allow specifying initial and maximum segment sizes for DSA. (Masahiko Sawada, 2024-03-27)

  Previously, the DSA segment size always started with 1MB and grew up to
  DSA_MAX_SEGMENT_SIZE. It was inconvenient in certain scenarios, such as
  when the caller desired a soft constraint on the total DSA segment size,
  limiting it to less than 1MB.

  This commit introduces the capability to specify the initial and maximum
  DSA segment sizes when creating a DSA area, providing more flexibility and
  control over memory usage.

  Reviewed-by: John Naylor, Tomas Vondra
  Discussion: https://postgr.es/m/CAD21AoAYGGC1ePjVX0H%2Bpp9rH%3D9vuPK19nNOiu12NprdV5TVJA%40mail.gmail.com
* Fix compiler warning for pg_lfind32(). (Nathan Bossart, 2024-03-26)

  The newly-introduced "one_by_one" label produces -Wunused-label warnings
  when building without SIMD support. To fix, move the label into the SIMD
  section of this function.

  Oversight in commit 7644a7340c.

  Reported-by: Tom Lane
  Discussion: https://postgr.es/m/3189995.1711495704%40sss.pgh.pa.us
* Remove some redundant set_cheapest() calls. (Tom Lane, 2024-03-26)

  Commit e2fa76d80 centralized the responsibility for doing set_cheapest()
  for a baserel, but these functions added later seemingly didn't get the
  memo. There's no apparent reason why we need the cheapest path for these
  relation types to be available any sooner than it is for other base
  relation types, so delete the duplicate calls. Doesn't save much since
  there's only one path in these cases, but it might improve clarity.

  Richard Guo

  Discussion: https://postgr.es/m/CAMbWs4-KFEU_fDuJPNCOkUu3rwvZvKBEytkd9VrM4kH4-2h1CQ@mail.gmail.com
* Optimize roles_is_member_of() with a Bloom filter. (Nathan Bossart, 2024-03-26)

  When the list of roles gathered by roles_is_member_of() grows very large,
  a Bloom filter is created to help avoid some linear searches through the
  list. The threshold for creating the Bloom filter is set arbitrarily high
  and may require future adjustment.

  Suggested-by: Tom Lane
  Reviewed-by: Tom Lane
  Discussion: https://postgr.es/m/CAGvXd3OSMbJQwOSc-Tq-Ro1CAz%3DvggErdSG7pv2s6vmmTOLJSg%40mail.gmail.com
* Fix failure of ALTER FOREIGN TABLE SET SCHEMA to move sequences. (Tom Lane, 2024-03-26)

  Ordinary ALTER TABLE SET SCHEMA will also move any owned sequences into
  the new schema. We failed to do likewise for foreign tables, because
  AlterTableNamespaceInternal believed that only certain relkinds could have
  indexes, owned sequences, or constraints.

  We could simply add foreign tables to that relkind list, but it seems
  likely that the same oversight could be made again in future. Instead
  let's remove the relkind filter altogether. These functions shouldn't cost
  much when there are no objects that they need to process, and surely this
  isn't an especially performance-critical case anyway.

  Per bug #18407 from Vidushi Gupta. Back-patch to all supported branches.

  Discussion: https://postgr.es/m/18407-4fd07373d252c6a0@postgresql.org
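  A sketch of the now-handled case (assumes an existing foreign server "s1"
  created with some FDW; the schema, table, and sequence names are
  hypothetical):

      CREATE SCHEMA app;
      CREATE SCHEMA app2;
      CREATE FOREIGN TABLE app.orders (id int) SERVER s1;
      CREATE SEQUENCE app.orders_id_seq OWNED BY app.orders.id;
      ALTER FOREIGN TABLE app.orders SET SCHEMA app2;
      -- the owned sequence now follows the table into app2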
* Micro-optimize pg_lfind32(). (Nathan Bossart, 2024-03-26)

  This commit improves the performance of pg_lfind32() in many cases by
  modifying it to process the remaining "tail" of elements with SIMD
  instructions instead of processing them one-by-one. Since the SIMD code
  processes a large block of elements, this means that we will process a
  subset of elements more than once, but that won't affect the correctness
  of the result, and testing has shown that this helps more cases than it
  regresses. With this change, the standard one-by-one linear search code is
  only used for small arrays and for platforms without SIMD support.

  Suggested-by: John Naylor
  Reviewed-by: John Naylor
  Discussion: https://postgr.es/m/20231129171526.GA857928%40nathanxps13
* Propagate pathkeys from CTEs up to the outer query. (Tom Lane, 2024-03-26)

  If we know the sort order of a CTE's output, and it is relevant to the
  outer query, label the CTE's outer-query access path using those pathkeys.
  This may enable optimizations such as avoiding a sort in the outer query.

  The code for hoisting pathkeys into the outer query already exists for
  regular RTE_SUBQUERY subqueries, but it wasn't getting used for CTEs,
  possibly out of concern for maintaining an optimization fence between the
  CTE and the outer query. However, on the same arguments used for commit
  f7816aec2, there seems no harm in letting the outer query know what the
  inner query decided to do.

  In support of this, we now remember the best Path as well as Plan for each
  subquery for the rest of the planner run. There may be future applications
  for having that at hand, and it surely costs little to build one more
  List.

  Richard Guo (minor mods by me)

  Discussion: https://postgr.es/m/CAMbWs49xYd3f8CrE8-WW3--dV1zH_sDSDn-vs2DzHj81Wcnsew@mail.gmail.com
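  A sketch of a query shape that benefits (the table name is hypothetical);
  the outer ORDER BY may now be satisfied by the CTE's known sort order
  instead of an extra sort:

      EXPLAIN (COSTS OFF)
      WITH ordered AS MATERIALIZED (
          SELECT a, b FROM t ORDER BY a
      )
      SELECT * FROM ordered ORDER BY a;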
* C comment: mention no doc for negative start of substring(text) (Bruce Momjian, 2024-03-26)

  Also add URL to hackers discussion.

  Backpatch-through: master
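  The undocumented behavior in question, sketched: a negative start is
  accepted, and character positions before 1 count against the length:

      SELECT substring('hello' FROM -2 FOR 4);  -- returns 'h'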
* Allow "make check"-style testing to work with musl C library.Tom Lane2024-03-26
| | | | | | | | | | | | | | | | | | | | | The musl dynamic linker saves a pointer to the process' environment value of LD_LIBRARY_PATH very early in startup. When we move/clobber the environment to make more room for ps status strings, we clobber that value and thereby prevent libraries from being found via LD_LIBRARY_PATH, which breaks the use of a temporary installation for testing purposes. To fix, stop collecting usable space for ps status if we notice that the variable we are about to clobber is LD_LIBRARY_PATH. This will result in some reduction in how long the ps status can be, but it's only likely to occur in temporary test contexts, so it doesn't seem like a big problem. In any case, we don't have to do it if we see we are on glibc, which surely is where the majority of our Linux testing is done. Thomas Munro, Bruce Momjian, and Tom Lane, per report from Wolfgang Walther. Back-patch to all supported branches, with the hope that we'll set up a buildfarm animal to test on this platform. Discussion: https://postgr.es/m/fddd1cd6-dc16-40a2-9eb5-d7fef2101488@technowledgy.de
* Remove ObjectClass type (Peter Eisentraut, 2024-03-26)

  ObjectClass is an enum whose values correspond to catalog OIDs. But the
  extra layer of redirection, which is used only in small parts of the code,
  and the similarity to ObjectType, are confusing and cumbersome.

  One advantage has been that some switches processing the OCLASS enum don't
  have "default:" cases. This is so that the compiler tells us when we fail
  to add support for some new object class. But you can also handle that
  with some assertions and proper test coverage. It's not even clear how
  strong this benefit is. For example, in AlterObjectNamespace_oid(), you
  could still put a new OCLASS into the "ignore object types that don't have
  schema-qualified names" case, and it might or might not be wrong. Also,
  there are already various OCLASS switches that do have a default case, so
  it's not even clear what the preferred coding style should be.

  Reviewed-by: jian he <jian.universality@gmail.com>
  Reviewed-by: Michael Paquier <michael@paquier.xyz>
  Discussion: https://www.postgresql.org/message-id/flat/CAGECzQT3caUbcCcszNewCCmMbCuyP7XNAm60J3ybd6PN5kH2Dw%40mail.gmail.com
* Message fixes for pg_createsubscriber (Peter Eisentraut, 2024-03-26)

  Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
  Discussion: https://www.postgresql.org/message-id/20240326.140116.1116279856046587865.horikyota.ntt@gmail.com
* Fix inconsistent function prototypes with function definitions. (Masahiko Sawada, 2024-03-26)

  Introduced by 30e144287a.

  Reviewed-by: John Naylor
  Discussion: https://postgr.es/m/CAD21AoCaDT%2B-ZaVjbtvumms0tyyHPNLELK2UX-MLG9XCgioaNw%40mail.gmail.com
* Fix a calculation in TidStoreCreate(). (Masahiko Sawada, 2024-03-26)

  Since max_bytes is expected to be in bytes, not kilobytes, it should not
  be multiplied by 1024. Introduced by 30e144287a.

  Reported-by: John Naylor, David Rowley
  Reviewed-by: John Naylor
  Discussion: https://postgr.es/m/CANWCAZZTE-14ofsucofTuhFsfuDGBNf%3DNZb22TMYT8bxA41oQQ%40mail.gmail.com
  Discussion: https://postgr.es/m/CAApHDvojg82NDaDEpj1WEZSbVTafj%3DDRmW%2BFrkBdW8ScL4OFxA%40mail.gmail.com