postgresql - postgresql mirror

	Commit message (Collapse)	Author	Age
...
*	Add bit_count SQL function	Peter Eisentraut	2021-03-23
\| \| \| \| \| \| \| \| \| \| \|	This function for bit and bytea counts the set bits in the bit or byte string. Internally, we use the existing popcount functionality. For the name, after some discussion, we settled on bit_count, which also exists with this meaning in MySQL, Java, and Python. Author: David Fetter <david@fetter.org> Discussion: https://www.postgresql.org/message-id/flat/20201230105535.GJ13234@fetter.org
*	Add per-index stats information in verbose logs of autovacuum	Michael Paquier	2021-03-23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Once a relation's autovacuum is completed, the logs include more information about this relation state if the threshold of log_autovacuum_min_duration (or its relation option) is reached, with for example contents about the statistics of the VACUUM operation for the relation, WAL and system usage. This commit adds more information about the statistics of the relation's indexes, with one line of logs generated for each index. The index stats were already calculated, but not printed in the context of autovacuum yet. While on it, some refactoring is done to keep track of the index statistics directly within LVRelStats, simplifying some routines related to parallel VACUUMs. Author: Masahiko Sawada Reviewed-by: Michael Paquier, Euler Taveira Discussion: https://postgr.es/m/CAD21AoAy6SxHiTivh5yAPJSUE4S=QRPpSZUdafOSz0R+fRcM6Q@mail.gmail.com
*	Fix dangling pointer reference in stream_cleanup_files.	Amit Kapila	2021-03-23
\| \| \| \| \| \| \|	We can't access the entry after it is removed from dynahash. Author: Peter Smith Discussion: https://postgr.es/m/CAHut+Ps-pL++f6CJwPx2+vUqXuew=Xt-9Bi-6kCyxn+Fwi2M7w@mail.gmail.com
*	Use correct spelling of statistics kind	Tomas Vondra	2021-03-23
\| \| \| \| \| \| \| \|	A couple error messages and comments used 'statistic kind', not the correct 'statistics kind'. Fix and backpatch all the way back to 10, where extended statistics were introduced. Backpatch-through: 10
*	Change the type of WalReceiverWaitStart wait event from Client to IPC.	Fujii Masao	2021-03-23
\| \| \| \| \| \| \| \| \| \| \| \| \|	Previously the type of this wait event was Client. But while this wait event is being reported, walreceiver process is waiting for the startup process to set initial data for streaming replication. It's not waiting for any activity on a socket connected to a user application or walsender. So this commit changes the type for WalReceiverWaitStart wait event to IPC. Author: Fujii Masao Reviewed-by: Kyotaro Horiguchi Discussion: https://postgr.es/m/cdacc27c-37ff-f1a4-20e2-ce19933abfcc@oss.nttdata.com
*	Add macro RelationIsPermanent() to report relation permanence	Bruce Momjian	2021-03-22
\| \| \| \| \| \| \| \| \| \| \| \| \|	Previously, to check relation permanence, the Relation's Form_pg_class structure member relpersistence was compared to the value RELPERSISTENCE_PERMANENT ("p"). This commit adds the macro RelationIsPermanent() and is used in appropirate places to simplify the code. This matches other RelationIs* macros. This macro will be used in more places in future cluster file encryption patches. Discussion: https://postgr.es/m/20210318153134.GH20766@tamriel.snowman.net
*	Optimize allocations in bringetbitmap	Tomas Vondra	2021-03-23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The bringetbitmap function allocates memory for various purposes, which may be quite expensive, depending on the number of scan keys. Instead of allocating them separately, allocate one bit chunk of memory an carve it into smaller pieces as needed - all the pieces have the same lifespan, and it saves quite a bit of CPU and memory overhead. Author: Tomas Vondra <tomas.vondra@postgresql.org> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Mark Dilger <hornschnorter@gmail.com> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com> Reviewed-by: Masahiko Sawada <masahiko.sawada@enterprisedb.com> Reviewed-by: John Naylor <john.naylor@enterprisedb.com> Discussion: https://postgr.es/m/c1138ead-7668-f0e1-0638-c3be3237e812@2ndquadrant.com
*	Move IS [NOT] NULL handling from BRIN support functions	Tomas Vondra	2021-03-23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The handling of IS [NOT] NULL clauses is independent of an opclass, and most of the code was exactly the same in both minmax and inclusion. So instead move the code from support procedures to the AM. This simplifies the code - especially the support procedures - quite a bit, as they don't need to care about NULL values and flags at all. It also means the IS [NOT] NULL clauses can be evaluated without invoking the support procedure. Author: Tomas Vondra <tomas.vondra@postgresql.org> Author: Nikita Glukhov <n.gluhov@postgrespro.ru> Reviewed-by: Nikita Glukhov <n.gluhov@postgrespro.ru> Reviewed-by: Mark Dilger <hornschnorter@gmail.com> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com> Reviewed-by: Masahiko Sawada <masahiko.sawada@enterprisedb.com> Reviewed-by: John Naylor <john.naylor@enterprisedb.com> Discussion: https://postgr.es/m/c1138ead-7668-f0e1-0638-c3be3237e812@2ndquadrant.com
*	Pass all scan keys to BRIN consistent function at once	Tomas Vondra	2021-03-23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit changes how we pass scan keys to BRIN consistent function. Instead of passing them one by one, we now pass all scan keys for a given attribute at once. That makes the consistent function a bit more complex, as it has to loop through the keys, but it does allow more elaborate opclasses that can use multiple keys to eliminate ranges much more effectively. The existing BRIN opclasses (minmax, inclusion) don't really benefit from this change. The primary purpose is to allow future opclases to benefit from seeing all keys at once. This does change the BRIN API, because the signature of the consistent function changes (a new parameter with number of scan keys). So this breaks existing opclasses, and will require supporting two variants of the code for different PostgreSQL versions. We've considered supporting two variants of the consistent, but we've decided not to do that. Firstly, there's another patch that moves handling of NULL values from the opclass, which means the opclasses need to be updated anyway. Secondly, we're not aware of any out-of-core BRIN opclasses, so it does not seem worth the extra complexity. Bump catversion, because of pg_proc changes. Author: Tomas Vondra <tomas.vondra@postgresql.org> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Mark Dilger <hornschnorter@gmail.com> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com> Reviewed-by: John Naylor <john.naylor@enterprisedb.com> Reviewed-by: Nikita Glukhov <n.gluhov@postgrespro.ru> Discussion: https://postgr.es/m/c1138ead-7668-f0e1-0638-c3be3237e812@2ndquadrant.com
*	Move bsearch_arg to src/port	Tomas Vondra	2021-03-23
\| \| \| \| \| \|	Until now the bsearch_arg function was used only in extended statistics code, so it was defined in that code. But we already have qsort_arg in src/port, so let's move it next to it.
*	Short-circuit slice requests that are for more than the object's size.	Tom Lane	2021-03-22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	substring(), and perhaps other callers, isn't careful to pass a slice length that is no more than the datum's true size. Since toast_decompress_datum_slice's children will palloc the requested slice length, this can waste memory. Also, close study of the liblz4 documentation suggests that it is dependent on the caller to not ask for more than the correct amount of decompressed data; this squares with observed misbehavior with liblz4 1.8.3. Avoid these problems by switching to the normal full-decompression code path if the slice request is >= datum's decompressed size. Tom Lane and Dilip Kumar Discussion: https://postgr.es/m/507597.1616370729@sss.pgh.pa.us
*	Mostly-cosmetic adjustments of TOAST-related macros.	Tom Lane	2021-03-22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The authors of bbe0a81db hadn't quite got the idea that macros named like SOMETHING_4B_C were only meant for internal endianness-related details in postgres.h. Choose more legible names for macros that are intended to be used elsewhere. Rearrange postgres.h a bit to clarify the separation between those internal macros and ones intended for wider use. Also, avoid using the term "rawsize" for true decompressed size; we've used "extsize" for that, because "rawsize" generally denotes total Datum size including header. This choice seemed particularly unfortunate in tests that were comparing one of these meanings to the other. This patch includes a couple of not-purely-cosmetic changes: be sure that the shifts aligning compression methods are unsigned (not critical today, but will be when compression method 2 exists), and fix broken definition of VARATT_EXTERNAL_GET_COMPRESSION (now VARATT_EXTERNAL_GET_COMPRESS_METHOD), whose callers worked only accidentally. Discussion: https://postgr.es/m/574197.1616428079@sss.pgh.pa.us
*	Error on invalid TOAST compression in CREATE or ALTER TABLE.	Robert Haas	2021-03-22
\| \| \| \| \| \| \| \| \|	The previous coding treated an invalid compression method name as equivalent to the default, which is certainly not right. Justin Pryzby Discussion: http://postgr.es/m/20210321235544.GD4203@telsasoft.com
*	More code cleanup for configurable TOAST compression.	Robert Haas	2021-03-22
\| \| \| \| \| \| \| \| \|	Remove unused macro. Fix confusion about whether a TOAST compression method is identified by an OID or a char. Justin Pryzby Discussion: http://postgr.es/m/20210321235544.GD4203@telsasoft.com
*	Fix concurrency issues with WAL segment recycling on Windows	Michael Paquier	2021-03-22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit is mostly a revert of aaa3aed, that switched the routine doing the internal renaming of recycled WAL segments to use on Windows a combination of CreateHardLinkA() plus unlink() instead of rename(). As reported by several users of Postgres 13, this is causing concurrency issues when manipulating WAL segments, mostly in the shape of the following error: LOG: could not rename file "pg_wal/000000XX000000YY000000ZZ": Permission denied This moves back to a logic where a single rename() (well, pgrename() for Windows) is used. This issue has proved to be hard to hit when I tested it, facing it only once with an archive_command that was not able to do its work, so it is environment-sensitive. The reporters of this issue have been able to confirm that the situation improved once we switched back to a single rename(). In order to check things, I have provided to the reporters a patched build based on 13.2 with aaa3aed reverted, to test if the error goes away, and an unpatched build of 13.2 to test if the error still showed up (just to make sure that I did not mess up my build process). Extra thanks to Fujii Masao for pointing out what looked like the culprit commit, and to all the reporters for taking the time to test what I have sent them. Reported-by: Andrus, Guy Burgess, Yaroslav Pashinsky, Thomas Trenz Reviewed-by: Tom Lane, Andres Freund Discussion: https://postgr.es/m/3861ff1e-0923-7838-e826-094cc9bef737@hot.ee Discussion: https://postgr.es/m/16874-c3eecd319e36a2bf@postgresql.org Discussion: https://postgr.es/m/095ccf8d-7f58-d928-427c-b17ace23cae6@burgess.co.nz Discussion: https://postgr.es/m/16927-67c570d968c99567%40postgresql.org Discussion: https://postgr.es/m/YFBcRbnBiPdGZvfW@paquier.xyz Backpatch-through: 13
*	Fix timeline assignment in checkpoints with 2PC transactions	Michael Paquier	2021-03-22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Any transactions found as still prepared by a checkpoint have their state data read from the WAL records generated by PREPARE TRANSACTION before being moved into their new location within pg_twophase/. While reading such records, the WAL reader uses the callback read_local_xlog_page() to read a page, that is shared across various parts of the system. This callback, since 1148e22a, has introduced an update of ThisTimeLineID when reading a record while in recovery, which is potentially helpful in the context of cascading WAL senders. This update of ThisTimeLineID interacts badly with the checkpointer if a promotion happens while some 2PC data is read from its record, as, by changing ThisTimeLineID, any follow-up WAL records would be written to an timeline older than the promoted one. This results in consistency issues. For instance, a subsequent server restart would cause a failure in finding a valid checkpoint record, resulting in a PANIC, for instance. This commit changes the code reading the 2PC data to reset the timeline once the 2PC record has been read, to prevent messing up with the static state of the checkpointer. It would be tempting to do the same thing directly in read_local_xlog_page(). However, based on the discussion that has led to 1148e22a, users may rely on the updates of ThisTimeLineID when a WAL record page is read in recovery, so changing this callback could break some cases that are working currently. A TAP test reproducing the issue is added, relying on a PITR to precisely trigger a promotion with a prepared transaction still tracked. Per discussion with Heikki Linnakangas, Kyotaro Horiguchi, Fujii Masao and myself. Author: Soumyadeep Chakraborty, Jimmy Yih, Kevin Yeap Discussion: https://postgr.es/m/CAE-ML+_EjH_fzfq1F3RJ1=XaaNG=-Jz-i3JqkNhXiLAsM3z-Ew@mail.gmail.com Backpatch-through: 10
*	Fix assorted silliness in ATExecSetCompression().	Tom Lane	2021-03-21
\| \| \| \| \| \| \| \| \|	It's not okay to scribble directly on a syscache entry. Nor to continue accessing said entry after releasing it. Also get rid of not-used local variables. Per valgrind testing.
*	Recycle nbtree pages deleted during same VACUUM.	Peter Geoghegan	2021-03-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Maintain a simple array of metadata about pages that were deleted during nbtree VACUUM's current btvacuumscan() call. Use this metadata at the end of btvacuumscan() to attempt to place newly deleted pages in the FSM without further delay. It might not yet be safe to place any of the pages in the FSM by then (they may not be deemed recyclable), but we have little to lose and plenty to gain by trying. In practice there is a very good chance that this will work out when vacuuming larger indexes, where scanning the index naturally takes quite a while. This commit doesn't change the page recycling invariants; it merely improves the efficiency of page recycling within the confines of the existing design. Recycle safety is a part of nbtree's implementation of what Lanin & Shasha call "the drain technique". The design happens to use transaction IDs (they're stored in deleted pages), but that in itself doesn't align the cutoff for recycle safety to any of the XID-based cutoffs used by VACUUM (e.g., OldestXmin). All that matters is whether or not _other_ backends might be able to observe various inconsistencies in the tree structure (that they cannot just detect and recover from by moving right). Recycle safety is purely a question of maintaining the consistency (or the apparent consistency) of a physical data structure. Note that running a simple serial test case involving a large range DELETE followed by a VACUUM VERBOSE will probably show that any newly deleted nbtree pages are not yet reusable/recyclable. This is expected in the absence of even one concurrent XID assignment. It is an old implementation restriction. In practice it's unlikely to be the thing that makes recycling remain unsafe, at least with larger indexes, where recycling newly deleted pages during the same VACUUM actually matters. An important high-level goal of this commit (as well as related recent commits e5d8a999 and 9f3665fb) is to make expensive deferred cleanup operations in index AMs rare in general. If index vacuuming frequently depends on the next VACUUM operation finishing off work that the current operation started, then the general behavior of index vacuuming is hard to predict. This is relevant to ongoing work that adds a vacuumlazy.c mechanism to skip index vacuuming in certain cases. Anything that makes the real world behavior of index vacuuming simpler and more linear will also make top-down modeling in vacuumlazy.c more robust. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-Wzk76_P=67iUscb1UN44-gyZL-KgpsXbSxq_bdcMa7Q+wQ@mail.gmail.com
*	Suppress various new compiler warnings.	Tom Lane	2021-03-21
\| \| \| \| \| \| \| \|	Compilers that don't understand that elog(ERROR) doesn't return issued warnings here. In the cases in libpq_pipeline.c, we were not exactly helping things by failing to mark pg_fatal() as noreturn. Per buildfarm.
*	Move lwlock-release probe back where it belongs	Peter Eisentraut	2021-03-21
\| \| \| \| \| \| \| \| \| \|	The documentation specifically states that lwlock-release fires before any released waiters have been awakened. It worked that way until ab5194e6f617a9a9e7aadb3dd1cee948a42d0755, where is seems to have been misplaced accidentally. Move it back where it belongs. Author: Craig Ringer <craig.ringer@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/CAGRY4nwxKUS_RvXFW-ugrZBYxPFFM5kjwKT5O+0+Stuga5b4+Q@mail.gmail.com
*	Use valid compression method in brin_form_tuple	Tomas Vondra	2021-03-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When compressing the BRIN summary, we can't simply use the compression method from the indexed attribute. The summary may use a different data type, e.g. fixed-length attribute may have varlena summary, leading to compression failures. For the built-in BRIN opclasses this happens to work, because the summary uses the same data type as the attribute. When the data types match, we can inherit use the compression method specified for the attribute (it's copied into the index descriptor). Otherwise we don't have much choice and have to use the default one. Author: Tomas Vondra Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> Discussion: https://postgr.es/m/e0367f27-392c-321a-7411-a58e1a7e4817%40enterprisedb.com
*	Fix memory leak when rejecting bogus DH parameters.	Tom Lane	2021-03-20
\| \| \| \| \| \| \| \| \| \| \| \| \|	While back-patching e0e569e1d, I noted that there were some other places where we ought to be applying DH_free(); namely, where we load some DH parameters from a file and then reject them as not being sufficiently secure. While it seems really unlikely that anybody would hit these code paths in production, let alone do so repeatedly, let's fix it for consistency. Back-patch to v10 where this code was introduced. Discussion: https://postgr.es/m/16160-18367e56e9a28264@postgresql.org
*	Avoid leaking memory in RestoreGUCState(), and improve comments.	Tom Lane	2021-03-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	RestoreGUCState applied InitializeOneGUCOption to already-live GUC entries, causing any malloc'd subsidiary data to be forgotten. We do want the effect of resetting the GUC to its compiled-in default, and InitializeOneGUCOption seems like the best way to do that, so add code to free any existing subsidiary data beforehand. The interaction between can_skip_gucvar, SerializeGUCState, and RestoreGUCState is way more subtle than their opaque comments would suggest to an unwary reader. Rewrite and enlarge the comments to try to make it clearer what's happening. Remove a long-obsolete assertion in read_nondefault_variables: the behavior of set_config_option hasn't depended on IsInitProcessingMode since f5d9698a8 installed a better way of controlling it. Although this is fixing a clear memory leak, the leak is quite unlikely to involve any large amount of data, and it can only happen once in the lifetime of a worker process. So it seems unnecessary to take any risk of back-patching. Discussion: https://postgr.es/m/4105247.1616174862@sss.pgh.pa.us
*	Provide recovery_init_sync_method=syncfs.	Thomas Munro	2021-03-20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since commit 2ce439f3 we have opened every file in the data directory and called fsync() at the start of crash recovery. This can be very slow if there are many files, leading to field complaints of systems taking minutes or even hours to begin crash recovery. Provide an alternative method, for Linux only, where we call syncfs() on every possibly different filesystem under the data directory. This is equivalent, but avoids faulting in potentially many inodes from potentially slow storage. The new mode comes with some caveats, described in the documentation, so the default value for the new setting is "fsync", preserving the older behavior. Reported-by: Michael Brown <michael.brown@discourse.org> Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com> Reviewed-by: Paul Guo <guopa@vmware.com> Reviewed-by: Bruce Momjian <bruce@momjian.us> Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> Reviewed-by: David Steele <david@pgmasters.net> Discussion: https://postgr.es/m/11bc2bb7-ecb5-3ad0-b39f-df632734cd81%40discourse.org Discussion: https://postgr.es/m/CAEET0ZHGnbXmi8yF3ywsDZvb3m9CbdsGZgfTXscQ6agcbzcZAw%40mail.gmail.com
*	Use lfirst_int in cmp_list_len_contents_asc	Tomas Vondra	2021-03-20
\| \| \| \| \| \| \| \| \| \| \| \| \|	The function added in be45be9c33 is comparing integer lists (IntList) by length and contents, but there were two bugs. Firstly, it used intVal() to extract the value, but that's for Value nodes, not for extracting int values from IntList. Secondly, it called it directly on the ListCell, without doing lfirst(). So just do lfirst_int() instead. Interestingly enough, this did not cause any crashes on the buildfarm, but valgrind rightfully complained about it. Discussion: https://postgr.es/m/bf3805a8-d7d1-ae61-fece-761b7ff41ecc@postgresfriends.org
*	Fix use-after-ReleaseSysCache problem in ATExecAlterColumnType.	Robert Haas	2021-03-19
\| \| \| \| \| \|	Introduced by commit bbe0a81db69bd10bd166907c3701492a29aca294. Per buildfarm member prion.
*	Allow configurable LZ4 TOAST compression.	Robert Haas	2021-03-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There is now a per-column COMPRESSION option which can be set to pglz (the default, and the only option in up until now) or lz4. Or, if you like, you can set the new default_toast_compression GUC to lz4, and then that will be the default for new table columns for which no value is specified. We don't have lz4 support in the PostgreSQL code, so to use lz4 compression, PostgreSQL must be built --with-lz4. In general, TOAST compression means compression of individual column values, not the whole tuple, and those values can either be compressed inline within the tuple or compressed and then stored externally in the TOAST table, so those properties also apply to this feature. Prior to this commit, a TOAST pointer has two unused bits as part of the va_extsize field, and a compessed datum has two unused bits as part of the va_rawsize field. These bits are unused because the length of a varlena is limited to 1GB; we now use them to indicate the compression type that was used. This means we only have bit space for 2 more built-in compresison types, but we could work around that problem, if necessary, by introducing a new vartag_external value for any further types we end up wanting to add. Hopefully, it won't be too important to offer a wide selection of algorithms here, since each one we add not only takes more coding but also adds a build dependency for every packager. Nevertheless, it seems worth doing at least this much, because LZ4 gets better compression than PGLZ with less CPU usage. It's possible for LZ4-compressed datums to leak into composite type values stored on disk, just as it is for PGLZ. It's also possible for LZ4-compressed attributes to be copied into a different table via SQL commands such as CREATE TABLE AS or INSERT .. SELECT. It would be expensive to force such values to be decompressed, so PostgreSQL has never done so. For the same reasons, we also don't force recompression of already-compressed values even if the target table prefers a different compression method than was used for the source data. These architectural decisions are perhaps arguable but revisiting them is well beyond the scope of what seemed possible to do as part of this project. However, it's relatively cheap to recompress as part of VACUUM FULL or CLUSTER, so this commit adjusts those commands to do so, if the configured compression method of the table happens not to match what was used for some column value stored therein. Dilip Kumar. The original patches on which this work was based were written by Ildus Kurbangaliev, and those were patches were based on even earlier work by Nikita Glukhov, but the design has since changed very substantially, since allow a potentially large number of compression methods that could be added and dropped on a running system proved too problematic given some of the architectural issues mentioned above; the choice of which specific compression method to add first is now different; and a lot of the code has been heavily refactored. More recently, Justin Przyby helped quite a bit with testing and reviewing and this version also includes some code contributions from him. Other design input and review from Tomas Vondra, Álvaro Herrera, Andres Freund, Oleg Bartunov, Alexander Korotkov, and me. Discussion: http://postgr.es/m/20170907194236.4cefce96%40wp.localdomain Discussion: http://postgr.es/m/CAFiTN-uUpX3ck%3DK0mLEk-G_kUQY%3DSNOTeqdaNRR9FMdQrHKebw%40mail.gmail.com
*	Fix comments in postmaster.c.	Fujii Masao	2021-03-19
\| \| \| \| \| \| \| \| \| \| \| \|	Commit 86c23a6eb2 changed the option to specify that postgres will stop all other server processes by sending the signal SIGSTOP, from -s to -T. But previously there were comments incorrectly explaining that SIGSTOP behavior is set by -s option. This commit fixes them. Author: Kyotaro Horiguchi Reviewed-by: Fujii Masao Discussion: https://postgr.es/m/20210316.165141.1400441966284654043.horikyota.ntt@gmail.com
*	Don't leak malloc'd error string in libpqrcv_check_conninfo().	Tom Lane	2021-03-18
\| \| \| \| \| \| \| \| \| \| \| \|	We leaked the error report from PQconninfoParse, when there was one. It seems unlikely that real usage patterns would repeat the failure often enough to create serious bloat, but let's back-patch anyway to keep the code similar in all branches. Found via valgrind testing. Back-patch to v10 where this code was added. Discussion: https://postgr.es/m/3816764.1616104288@sss.pgh.pa.us
*	Don't leak malloc'd strings when a GUC setting is rejected.	Tom Lane	2021-03-18
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Because guc.c prefers to keep all its string values in malloc'd not palloc'd storage, it has to be more careful than usual to avoid leaks. Error exits out of string GUC hook checks failed to clear the proposed value string, and error exits out of ProcessGUCArray() failed to clear the malloc'd results of ParseLongOption(). Found via valgrind testing. This problem is ancient, so back-patch to all supported branches. Discussion: https://postgr.es/m/3816764.1616104288@sss.pgh.pa.us
*	Don't leak compiled regex(es) when an ispell cache entry is dropped.	Tom Lane	2021-03-18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The text search cache mechanisms assume that we can clean up an invalidated dictionary cache entry simply by resetting the associated long-lived memory context. However, that does not work for ispell affixes that make use of regular expressions, because the regex library deals in plain old malloc. Hence, we leaked compiled regex(es) any time we dropped such a cache entry. That could quickly add up, since even a fairly trivial regex can use up tens of kB, and a large one can eat megabytes. Add a memory context callback to ensure that a regex gets freed when its owning cache entry is cleared. Found via valgrind testing. This problem is ancient, so back-patch to all supported branches. Discussion: https://postgr.es/m/3816764.1616104288@sss.pgh.pa.us
*	Don't run RelationInitTableAccessMethod in a long-lived context.	Tom Lane	2021-03-18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some code paths in this function perform syscache lookups, which can lead to table accesses and possibly leakage of cruft into the caller's context. If said context is CacheMemoryContext, we eventually will have visible bloat. But fixing this is no harder than moving one memory context switch step. (The other callers don't have a problem.) Andres Freund and I independently found this via valgrind testing. Back-patch to v12 where this code was added. Discussion: https://postgr.es/m/20210317023101.anvejcfotwka6gaa@alap3.anarazel.de Discussion: https://postgr.es/m/3816764.1616104288@sss.pgh.pa.us
*	Don't leak rd_statlist when a relcache entry is dropped.	Tom Lane	2021-03-18
\| \| \| \| \| \| \| \| \| \| \|	Although these lists are usually NIL, and even when not empty are unlikely to be large, constant relcache update traffic could eventually result in visible bloat of CacheMemoryContext. Found via valgrind testing. Back-patch to v10 where this field was added. Discussion: https://postgr.es/m/3816764.1616104288@sss.pgh.pa.us
*	Fix misuse of foreach_delete_current().	Tom Lane	2021-03-18
\| \| \| \| \| \| \| \| \|	Our coding convention requires this macro's result to be assigned back to the original List variable. In this usage, since the List could not become empty, there was no actual bug --- but some compilers warned about it. Oversight in be45be9c3. Discussion: https://postgr.es/m/35077b31-2d62-1e31-0e2e-ddb52d590b73@enterprisedb.com
*	Implement GROUP BY DISTINCT	Tomas Vondra	2021-03-18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With grouping sets, it's possible that some of the grouping sets are duplicate. This is especially common with CUBE and ROLLUP clauses. For example GROUP BY CUBE (a,b), CUBE (b,c) is equivalent to GROUP BY GROUPING SETS ( (a, b, c), (a, b, c), (a, b, c), (a, b), (a, b), (a, b), (a), (a), (a), (c, a), (c, a), (c, a), (c), (b, c), (b), () ) Some of the grouping sets are calculated multiple times, which is mostly unnecessary. This commit implements a new GROUP BY DISTINCT feature, as defined in the SQL standard, which eliminates the duplicate sets. Author: Vik Fearing Reviewed-by: Erik Rijkers, Georgios Kokolatos, Tomas Vondra Discussion: https://postgr.es/m/bf3805a8-d7d1-ae61-fece-761b7ff41ecc@postgresfriends.org
*	Remove temporary files after backend crash	Tomas Vondra	2021-03-18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	After a crash of a backend using temporary files, the files used to be left behind, on the basis that it might be useful for debugging. But we don't have any reports of anyone actually doing that, and it means the disk usage may grow over time due to repeated backend failures (possibly even hitting ENOSPC). So this behavior is a bit unfortunate, and fixing it required either manual cleanup (deleting files, which is error-prone) or restart of the instance (i.e. service disruption). This implements automatic cleanup of temporary files, controled by a new GUC remove_temp_files_after_crash. By default the files are removed, but it can be disabled to restore the old behavior if needed. Author: Euler Taveira Reviewed-by: Tomas Vondra, Michael Paquier, Anastasia Lubennikova, Thomas Munro Discussion: https://postgr.es/m/CAH503wDKdYzyq7U-QJqGn%3DGm6XmoK%2B6_6xTJ-Yn5WSvoHLY1Ww%40mail.gmail.com
*	Fix function name in error hint	Magnus Hagander	2021-03-18
\| \| \| \| \| \| \| \| \| \| \|	pg_read_file() is the function that's in core, pg_file_read() is in adminpack. But when using pg_file_read() in adminpack it calls the C level function pg_read_file() in core, which probably threw the original author off. But the error hint should be about the SQL function. Reported-By: Sergei Kornilov Backpatch-through: 11 Discussion: https://postgr.es/m/373021616060475@mail.yandex.ru
*	Add a new GUC and a reloption to enable inserts in parallel-mode.	Amit Kapila	2021-03-18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 05c8482f7f added the implementation of parallel SELECT for "INSERT INTO ... SELECT ..." which may incur non-negligible overhead in the additional parallel-safety checks that it performs, even when, in the end, those checks determine that parallelism can't be used. This is normally only ever a problem in the case of when the target table has a large number of partitions. A new GUC option "enable_parallel_insert" is added, to allow insert in parallel-mode. The default is on. In addition to the GUC option, the user may want a mechanism to allow inserts in parallel-mode with finer granularity at table level. The new table option "parallel_insert_enabled" allows this. The default is true. Author: "Hou, Zhijie" Reviewed-by: Greg Nancarrow, Amit Langote, Takayuki Tsunakawa, Amit Kapila Discussion: https://postgr.es/m/CAA4eK1K-cW7svLC2D7DHoGHxdAdg3P37BLgebqBOC2ZLc9a6QQ%40mail.gmail.com Discussion: https://postgr.es/m/CAJcOf-cXnB5cnMKqWEp2E2z7Mvcd04iLVmV=qpFJrR3AcrTS3g@mail.gmail.com
*	Fix memory lifetime issues of replication slot stats.	Andres Freund	2021-03-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When accessing replication slot stats, introduced in 98681675002d, pgstat_read_statsfiles() reads the data into newly allocated memory. Unfortunately the current memory context at that point is the callers, leading to leaks and use-after-free dangers. The fix is trivial, explicitly use pgStatLocalContext. There's some potential for further improvements, but that's outside of the scope of this bugfix. No backpatch necessary, feature is only in HEAD. Author: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/20210317230447.c7uc4g3vbs4wi32i@alap3.anarazel.de
*	Code review for server's handling of "tablespace map" files.	Tom Lane	2021-03-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	While looking at Robert Foggia's report, I noticed a passel of other issues in the same area: * The scheme for backslash-quoting newlines in pathnames is just wrong; it will misbehave if the last ordinary character in a pathname is a backslash. I'm not sure why we're bothering to allow newlines in tablespace paths, but if we're going to do it we should do it without introducing other problems. Hence, backslashes themselves have to be backslashed too. * The author hadn't read the sscanf man page very carefully, because this code would drop any leading whitespace from the path. (I doubt that a tablespace path with leading whitespace could happen in practice; but if we're bothering to allow newlines in the path, it sure seems like leading whitespace is little less implausible.) Using sscanf for the task of finding the first space is overkill anyway. * While I'm not 100% sure what the rationale for escaping both \r and \n is, if the idea is to allow Windows newlines in the file then this code failed, because it'd throw an error if it saw \r followed by \n. * There's no cross-check for an incomplete final line in the map file, which would be a likely apparent symptom of the improper-escaping bug. On the generation end, aside from the escaping issue we have: * If needtblspcmapfile is true then do_pg_start_backup will pass back escaped strings in tablespaceinfo->path values, which no caller wants or is prepared to deal with. I'm not sure if there's a live bug from that, but it looks like there might be (given the dubious assumption that anyone actually has newlines in their tablespace paths). * It's not being very paranoid about the possibility of random stuff in the pg_tblspc directory. IMO we should ignore anything without an OID-like name. The escaping rule change doesn't seem back-patchable: it'll require doubling of backslashes in the tablespace_map file, which is basically a basebackup format change. The odds of that causing trouble are considerably more than the odds of the existing bug causing trouble. The rest of this seems somewhat unlikely to cause problems too, so no back-patch.
*	Prevent buffer overrun in read_tablespace_map().	Tom Lane	2021-03-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Robert Foggia of Trustwave reported that read_tablespace_map() fails to prevent an overrun of its on-stack input buffer. Since the tablespace map file is presumed trustworthy, this does not seem like an interesting security vulnerability, but still we should fix it just in the name of robustness. While here, document that pg_basebackup's --tablespace-mapping option doesn't work with tar-format output, because it doesn't. To make it work, we'd have to modify the tablespace_map file within the tarball sent by the server, which might be possible but I'm not volunteering. (Less-painful solutions would require changing the basebackup protocol so that the source server could adjust the map. That's not very appetizing either.)
*	Revert "Fix race in Parallel Hash Join batch cleanup."	Thomas Munro	2021-03-18
\| \| \| \| \| \| \|	This reverts commit 378802e3713c6c0fce31d2390c134cd5d7c30157. This reverts commit 3b8981b6e1a2aea0f18384c803e21e9391de669a. Discussion: https://postgr.es/m/CA%2BhUKGJmcqAE3MZeDCLLXa62cWM0AJbKmp2JrJYaJ86bz36LFA%40mail.gmail.com
*	Fix comment in indexing.c	Michael Paquier	2021-03-17
\| \| \| \| \| \| \| \| \|	578b229, that removed support for WITH OIDS, has changed CatalogTupleInsert() to not return an Oid, but one comment was still mentioning that. Author: Vik Fearing Discussion: https://postgr.es/m/fef01975-ed10-3601-7b9e-80ecef72d00b@postgresfriends.org
*	Small error message improvement	Peter Eisentraut	2021-03-17
\|
*	Update the names of Parallel Hash Join phases.	Thomas Munro	2021-03-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 3048898e dropped -ING from some wait event names that correspond to barrier phases. Update the phases' names to match. While we're here making cosmetic changes, also rename "DONE" to "FREE". That pairs better with "ALLOCATE", and describes the activity that actually happens in that phase (as we do for the other phases) rather than describing a state. The distinction is clearer after bugfix commit 3b8981b6 split the phase into two. As for the growth barriers, rename their "ALLOCATE" phase to "REALLOCATE", which is probably a better description of what happens then. Also improve the comments about the phases a bit. Discussion: https://postgr.es/m/CA%2BhUKG%2BMDpwF2Eo2LAvzd%3DpOh81wUTsrwU1uAwR-v6OGBB6%2B7g%40mail.gmail.com
*	Fix race in Parallel Hash Join batch cleanup.	Thomas Munro	2021-03-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With very unlucky timing and parallel_leader_participation off, PHJ could attempt to access per-batch state just as it was being freed. There was code intended to prevent that by checking for a cleared pointer, but it was buggy. Fix, by introducing an extra barrier phase. The new phase PHJ_BUILD_RUNNING means that it's safe to access the per-batch state to find a batch to help with, and PHJ_BUILD_DONE means that it is too late. The last to detach will free the array of per-batch state as before, but now it will also atomically advance the phase at the same time, so that late attachers can avoid the hazard, without the data race. This mirrors the way per-batch hash tables are freed (see phases PHJ_BATCH_PROBING and PHJ_BATCH_DONE). Revealed by a one-off build farm failure, where BarrierAttach() failed a sanity check assertion, because the memory had been clobbered by dsa_free(). Back-patch to 11, where the code arrived. Reported-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/20200929061142.GA29096%40paquier.xyz
*	Fix race condition in drop subscription's handling of tablesync slots.	Amit Kapila	2021-03-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit ce0fdbfe97 made tablesync slots permanent and allow Drop Subscription to drop such slots. However, it is possible that before tablesync worker could get the acknowledgment of slot creation, drop subscription stops it and that can lead to a dangling slot on the publisher. Prevent cancel/die interrupts while creating a slot in the tablesync worker. Reported-by: Thomas Munro as per buildfarm Author: Amit Kapila Reviewed-by: Vignesh C, Takamichi Osumi Discussion: https://postgr.es/m/CA+hUKGJG9dWpw1cOQ2nzWU8PHjm=PTraB+KgE5648K9nTfwvxg@mail.gmail.com
*	Enable parallelism in REFRESH MATERIALIZED VIEW.	Thomas Munro	2021-03-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pass CURSOR_OPT_PARALLEL_OK to pg_plan_query() so that parallel plans are considered when running the underlying SELECT query. This wasn't done in commit e9baa5e9, which did this for CREATE MATERIALIZED VIEW, because it wasn't yet known to be safe. Since REFRESH always inserts into a freshly created table before later merging or swapping the data into place with separate operations, we can enable such plans here too. Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Reviewed-by: Hou, Zhijie <houzj.fnst@cn.fujitsu.com> Reviewed-by: Luc Vlaming <luc@swarm64.com> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://postgr.es/m/CALj2ACXg-4hNKJC6nFnepRHYT4t5jJVstYvri%2BtKQHy7ydcr8A%40mail.gmail.com
*	Fix comment about promising tuples.	Peter Geoghegan	2021-03-16
\| \| \| \|	Oversight in commit d168b666823, which added bottom-up index deletion.
*	Avoid corner-case memory leak in SSL parameter processing.	Tom Lane	2021-03-16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	After reading the root cert list from the ssl_ca_file, immediately install it as client CA list of the new SSL context. That gives the SSL context ownership of the list, so that SSL_CTX_free will free it. This avoids a permanent memory leak if we fail further down in be_tls_init(), which could happen if bogus CRL data is offered. The leak could only amount to something if the CRL parameters get broken after server start (else we'd just quit) and then the server is SIGHUP'd many times without fixing the CRL data. That's rather unlikely perhaps, but it seems worth fixing, if only because the code is clearer this way. While we're here, add some comments about the memory management aspects of this logic. Noted by Jelte Fennema and independently by Andres Freund. Back-patch to v10; before commit de41869b6 it doesn't matter, since we'd not re-execute this code during SIGHUP. Discussion: https://postgr.es/m/16160-18367e56e9a28264@postgresql.org