postgresql - postgresql mirror

	Commit message	Author	Age
*	Remove unused wait events.	Amit Kapila	2021-10-21
	Commit 464824323e introduced the wait events which were neither used by that commit nor by follow-up commits for that work. Author: Masahiro Ikeda Backpatch-through: 14, where it was introduced Discussion: https://postgr.es/m/ff077840-3ab2-04dd-bbe4-4f5dfd2ad481@oss.nttdata.com
*	Fix corruption of pg_shdepend when copying deps from template database	Michael Paquier	2021-10-21
	Creating a new database from a template database with shared dependencies that need to be copied over corrupted pg_shdepend, because of an off-by-one error in the computation of the index number used for the values inserted with a slot. Issue introduced by e3931d0. A review of the rest of the code found no similar mistakes. Reported-by: Sven Klemm Author: Aleksander Alekseev Reviewed-by: Daniel Gustafsson, Michael Paquier Discussion: https://postgr.es/m/CAJ7c6TP0AowkUgNL6zcAK-s5HYsVHVBRWfu69FRubPpfwZGM9A@mail.gmail.com Backpatch-through: 14
*	Ensure correct lock level is used in ALTER ... RENAME	Alvaro Herrera	2021-10-19
	Commit 1b5d797cd4f7 intended to relax the lock level used to rename indexes, but inadvertently allowed any relation to be renamed with a lowered lock level, as long as the command is spelled ALTER INDEX. That's undesirable for other relation types, so retry the operation with the higher lock if the relation turns out not to be an index. After this fix, ALTER INDEX <sometable> RENAME will require access exclusive lock, which it didn't before. Author: Nathan Bossart <bossartn@amazon.com> Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Reported-by: Onder Kalaci <onderk@microsoft.com> Discussion: https://postgr.es/m/PH0PR21MB1328189E2821CDEC646F8178D8AE9@PH0PR21MB1328.namprd21.prod.outlook.com
*	Fix assignment to array of domain over composite.	Tom Lane	2021-10-19
	An update such as "UPDATE ... SET fld[n].subfld = whatever" failed if the array elements were domains rather than plain composites. That's because isAssignmentIndirectionExpr() failed to cope with the CoerceToDomain node that would appear in the expression tree in this case. The result would typically be a crash, and even if we accidentally didn't crash, we'd not correctly preserve other fields of the same array element. Per report from Onder Kalaci. Back-patch to v11 where arrays of domains came in. Discussion: https://postgr.es/m/PH0PR21MB132823A46AA36F0685B7A29AD8BD9@PH0PR21MB1328.namprd21.prod.outlook.com
*	Remove bogus assertion in transformExpressionList().	Tom Lane	2021-10-19
	I think when I added this assertion (in commit 8f889b108), I was only thinking of the use of transformExpressionList at top level of INSERT and VALUES. But it's also called by transformRowExpr(), which can certainly occur in an UPDATE targetlist, so it's inappropriate to suppose that p_multiassign_exprs must be empty. Besides, since the input is not expected to contain ResTargets, there's no reason it should contain MultiAssignRefs either. Hence this code need not be concerned about the state of p_multiassign_exprs, and we should just drop the assertion. Per bug #17236 from ocean_li_996. It's been wrong for years, so back-patch to all supported branches. Discussion: https://postgr.es/m/17236-3210de9bcba1d7ca@postgresql.org
*	Block ALTER INDEX/TABLE index_name ALTER COLUMN colname SET (options)	Michael Paquier	2021-10-19
	The grammar of this command run on indexes with column names has always been authorized by the parser, and it has never been documented. Since 911e702, it is possible to define opclass parameters as of CREATE INDEX, which actually broke the old case of ALTER INDEX/TABLE where relation-level parameters n_distinct and n_distinct_inherited could be defined for an index (see 76a47c0 and its thread, where this point was raised, though the feature remained unused). Attempting to do that in v13~ would cause the index to become unusable, as there is a new dedicated code path to load opclass parameters instead of the relation-level ones previously available. Note that it is possible to fix things with a manual catalog update to bring the relation back online. This commit disables this command for now as the use of column names for indexes does not make sense anyway, particularly when it comes to index expressions where names are automatically computed. One way to properly support this case in the future would be to use column numbers when it comes to indexes, in the same way as ALTER INDEX .. ALTER COLUMN .. SET STATISTICS. Partitioned indexes were already blocked, but not indexes. Some tests are added for both cases. There was some code in ANALYZE to enforce n_distinct to be used for an index expression if the parameter was defined, but just remove it for now until/if there is support for this (note that index-level parameters never had support in pg_dump either, previously), so this was just dead code. Reported-by: Matthijs van der Vleuten Author: Nathan Bossart, Michael Paquier Reviewed-by: Vik Fearing, Dilip Kumar Discussion: https://postgr.es/m/17220-15d684c6c2171a83@postgresql.org Backpatch-through: 13
*	Invalidate partitions of table being attached/detached	Alvaro Herrera	2021-10-18
	Failing to do that, any direct inserts/updates of those partitions would fail to enforce the correct constraint, that is, one that considers the new partition constraint of their parent table. Backpatch to 10. Reported by: Hou Zhijie <houzj.fnst@fujitsu.com> Author: Amit Langote <amitlangote09@gmail.com> Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Nitin Jadhav <nitinjadhavpostgres@gmail.com> Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com> Discussion: https://postgr.es/m/OS3PR01MB5718DA1C4609A25186D1FBF194089%40OS3PR01MB5718.jpnprd01.prod.outlook.com
*	Fix parallel sort, broken by the balanced merge patch.	Heikki Linnakangas	2021-10-18
	The code for initializing the tapes on each merge iteration was skipped in a parallel worker. I put the !WORKER(state) check in the wrong place while rebasing the patch. That caused failures in the index build in the 'multiple-row-versions' isolation test, in multiple buildfarm members. On my laptop it was easier to reproduce by building an index on a larger table, so that you got a parallel sort more reliably.
*	Fix duplicate typedef LogicalTape.	Heikki Linnakangas	2021-10-18
	To make buildfarm member locust happy.
*	Fix format modifier used in elog.	Heikki Linnakangas	2021-10-18
	The previous commit 65014000b3 changed the variable passed to elog from an int64 to a size_t variable, but neglected to change the modifier in the format string accordingly. Per failure on buildfarm member lapwing.
*	Replace polyphase merge algorithm with a simple balanced k-way merge.	Heikki Linnakangas	2021-10-18
	The advantage of polyphase merge is that it can reuse the input tapes as output tapes efficiently, but that is irrelevant on modern hardware, when we can easily emulate any number of tape drives. The number of input tapes we can/should use during merging is limited by work_mem, but output tapes that we are not currently writing to only cost a little bit of memory, so there is no need to skimp on them. This makes sorts that need multiple merge passes faster. Discussion: https://www.postgresql.org/message-id/420a0ec7-602c-d406-1e75-1ef7ddc58d83%40iki.fi Reviewed-by: Peter Geoghegan, Zhihong Yu, John Naylor
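The idea of a balanced k-way merge can be illustrated with a minimal Python sketch. This is not the tuplesort.c implementation: the function name `balanced_kway_merge` and the `max_inputs` parameter (standing in for the number of input tapes that work_mem allows) are assumptions made for illustration. Each pass merges groups of up to `max_inputs` sorted runs into fresh output runs, and passes repeat until one run remains.

```python
import heapq
from typing import List


def balanced_kway_merge(runs: List[List[int]], max_inputs: int) -> List[int]:
    """Merge sorted runs, reading at most max_inputs runs per merge step.

    Hypothetical illustration only; max_inputs plays the role of the
    input-tape limit imposed by work_mem.
    """
    if max_inputs < 2:
        raise ValueError("max_inputs must be at least 2")
    if not runs:
        return []
    while len(runs) > 1:
        next_pass = []
        # Each group of up to max_inputs runs is merged into one new
        # output run; output runs are never reused as inputs within a pass.
        for i in range(0, len(runs), max_inputs):
            group = runs[i:i + max_inputs]
            next_pass.append(list(heapq.merge(*group)))
        runs = next_pass
    return runs[0]
```

With four initial runs and `max_inputs=2`, the sketch needs two passes; with `max_inputs=4`, a single pass suffices.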
*	Refactor LogicalTapeSet/LogicalTape interface.	Heikki Linnakangas	2021-10-18
	All the tape functions, like LogicalTapeRead and LogicalTapeWrite, now take a LogicalTape as argument, instead of LogicalTapeSet+tape number. You can create any number of LogicalTapes in a single LogicalTapeSet, and you don't need to decide the number upfront, when you create the tape set. This makes the tape management in hash agg spilling in nodeAgg.c simpler. Discussion: https://www.postgresql.org/message-id/420a0ec7-602c-d406-1e75-1ef7ddc58d83%40iki.fi Reviewed-by: Peter Geoghegan, Zhihong Yu, John Naylor
*	Reset properly snapshot export state during transaction abort	Michael Paquier	2021-10-18
	During a replication slot creation, an ERROR generated in the same transaction as the one creating a to-be-exported snapshot would have left the backend in an inconsistent state, as the associated static export snapshot state was not being reset on transaction abort, but only on the follow-up command received by the WAL sender that created this snapshot on replication slot creation. This would trigger inconsistency failures if this session tried to export a snapshot again, like during the creation of a replication slot. Note that a snapshot export cannot happen in a transaction block, so there is no need to worry about resetting this state for subtransaction aborts. Also, this inconsistent state is very unlikely to be visible to users. For example, one case where this could happen is an out-of-memory error when building the initial snapshot to-be-exported. Dilip found this problem while poking at a different patch, that caused an error in this code path for reasons unrelated to HEAD. Author: Dilip Kumar Reviewed-by: Michael Paquier, Zhihong Yu Discussion: https://postgr.es/m/CAFiTN-s0zA1Kj0ozGHwkYkHwa5U0zUE94RSc_g81WrpcETB5=w@mail.gmail.com Backpatch-through: 9.6
*	Remove obsolete nbtree deduplication comments.	Peter Geoghegan	2021-10-15
	Follow up to commit 2903f140.
*	shm_mq: Update mq_bytes_written less often.	Robert Haas	2021-10-14
	Do not update shm_mq's mq_bytes_written until we have written an amount of data greater than 1/4th of the ring size, unless the caller of shm_mq_send(v) requests a flush at the end of the message. This reduces the number of calls to SetLatch(), and also the number of CPU cache misses, considerably, and thus makes shm_mq significantly faster. Dilip Kumar, reviewed by Zhihong Yu and Tomas Vondra. Some minor cosmetic changes by me. Discussion: http://postgr.es/m/CAFiTN-tVXqn_OG7tHNeSkBbN+iiCZTiQ83uakax43y1sQb2OBA@mail.gmail.com
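The batching rule described here can be sketched in Python. This is a simplified model, not shm_mq.c: the class `BatchedWriter`, its fields, and the `force_flush` flag are hypothetical names, and the sketch ignores ring wraparound and blocking when the ring is full. It only shows how deferring the shared-counter update until more than a quarter of the ring has been written reduces the number of wakeups.

```python
class BatchedWriter:
    """Toy model of deferred publication of a bytes-written counter."""

    def __init__(self, ring_size: int) -> None:
        self.ring_size = ring_size
        self.published = 0      # counter visible to the reader (assumed shared)
        self.pending = 0        # bytes written locally but not yet published
        self.notifications = 0  # stand-in for the number of SetLatch() calls

    def send(self, nbytes: int, force_flush: bool = False) -> None:
        self.pending += nbytes
        # Publish only when more than 1/4 of the ring is pending,
        # or when the caller explicitly asks for a flush.
        if force_flush or self.pending > self.ring_size // 4:
            self.published += self.pending
            self.pending = 0
            self.notifications += 1
```

Sending ten 100-byte messages through a 1024-byte ring triggers a publication on every third message in this model, rather than on every message.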
*	Check criticalSharedRelcachesBuilt in GetSharedSecurityLabel().	Jeff Davis	2021-10-14
	An extension may want to call GetSecurityLabel() on a shared object before the shared relcaches are fully initialized. For instance, a ClientAuthentication_hook might want to retrieve the security label on a role. Discussion: https://postgr.es/m/ecb7af0b26e3be1d96d291c8453a86f1f82d9061.camel@j-davis.com Backpatch-through: 9.6
*	Fix planner error with pulling up subquery expressions into function RTEs.	Tom Lane	2021-10-14
	If a function-in-FROM laterally references the output of some sub-SELECT earlier in the FROM clause, and we are able to flatten that sub-SELECT into the outer query, the expression(s) copied into the function RTE missed being processed by eval_const_expressions. This'd lead to trouble and probable crashes at execution if such expressions contained named-argument function call syntax or functions with defaulted arguments. The bug is masked if the query contains any explicit JOIN syntax, which may help explain why we'd not noticed. Per bug #17227 from Bernd Dorn. This is an oversight in commit 7266d0997, so back-patch to v13 where that came in. Discussion: https://postgr.es/m/17227-5a28ed1512189fa4@postgresql.org
*	Postpone some end-of-recovery operations related to allowing WAL.	Robert Haas	2021-10-14
	CreateOverwriteContrecordRecord(), UpdateFullPageWrites(), PerformRecoveryXLogAction(), and CleanupAfterArchiveRecovery() are moved somewhat later in StartupXLOG(). This is preparatory work for a future patch that wants to allow recovery to end at one time and only later start to allow WAL writes. To do that, it's necessary to separate code that has to do with allowing WAL writes from other things that need to happen simply because recovery is ending, such as initializing shared memory data structures that depend on information that might not be accurate before redo is complete. This commit does not achieve that goal, but it is a step in that direction. For example, there are a few different bits of code that write things into WAL once we have finished recovery, and with this change, those bits of code are closer to each other than previously, with fewer unrelated bits of code interspersed. Robert Haas and Amul Sul Discussion: http://postgr.es/m/CAAJ_b97abMuq=470Wahun=aS1PHTSbStHtrjjPaD-C0YQ1AqVw@mail.gmail.com
*	Refactor some end-of-recovery code out of StartupXLOG().	Robert Haas	2021-10-13
	Create a new function PerformRecoveryXLogAction() and move the code which either writes an end-of-recovery record or requests a checkpoint there. Also create a new function CleanupAfterArchiveRecovery() to perform a few tasks that we want to do after we've actually exited archive recovery but before we start accepting new WAL writes. More refactoring of this file is planned, but this commit is just straightforward code movement to make StartupXLOG() a little bit shorter and a little bit easier to understand. Robert Haas and Amul Sul Discussion: http://postgr.es/m/CAAJ_b97abMuq=470Wahun=aS1PHTSbStHtrjjPaD-C0YQ1AqVw@mail.gmail.com
*	Fix use-after-free with multirange types in CREATE TYPE	Michael Paquier	2021-10-13
	The code was freeing the name of the multirange type function stored in the parse tree but it should not do that. Event triggers could for example look at such a corrupted parse tree with a ddl_command_end event. Author: Alex Kozhemyakin, Sergey Shinderuk Reviewed-by: Peter Eisentraut, Michael Paquier Discussion: https://postgr.es/m/d5042d46-b9cd-6efb-219a-71ed0cf45bc8@postgrespro.ru Backpatch-through: 14
*	Refactor basebackup.c's _tarWriteDir() function.	Robert Haas	2021-10-12
	Sometimes, we replace a symbolic link that we find in the data directory with an actual directory within the tarfile that we create. _tarWriteDir was responsible both for making this substitution and also for writing the tar header for the resulting directory into the tar file. Make it do only the first of those things, and rename to convert_link_to_directory. Substantially larger refactoring of this source file is planned, but this little bit seemed to make sense to commit independently. Discussion: http://postgr.es/m/CA+Tgmobz6tuv5tr-WxURe5JA1vVcGz85k4kkvoWxcyHvDpEqFA@mail.gmail.com
*	Make autovacuum launcher more responsive to pg_log_backend_memory_contexts().	Fujii Masao	2021-10-12
	Previously when pg_log_backend_memory_contexts() sent the request to the autovacuum launcher, it could take more than several seconds to log its memory contexts, because the function (HandleAutoVacLauncherInterrupts) that processes any new interrupts received by the autovacuum launcher didn't handle the request for logging of memory contexts. This commit changes the function so that it handles the request, to make autovacuum launcher more responsive to pg_log_backend_memory_contexts(). Back-patch to v14 where pg_log_backend_memory_contexts() was added. Author: Koyu Tanigawa Reviewed-by: Bharath Rupireddy, Atsushi Torikoshi Discussion: https://postgr.es/m/0aae3e074face409b35153451be5cc11@oss.nttdata.com
*	Fix EXPLAIN of SEARCH BREADTH FIRST queries some more.	Tom Lane	2021-10-11
	Commit 3f50b8263 had an oversight: formerly, to deparse expressions attached to a plan node, it was only necessary to update the deparse_namespace ancestors list alongside calling set_deparse_plan. Now it's necessary to update the ancestors list first, because set_deparse_plan consults it, and one call site got that wrong. This error was masked in most cases because explain.c uses just one List object for the ancestors list, updating it in-place as the plan is scanned, so that we accidentally had the right List assigned to dpns->ancestors before it was needed. It would fail only if a WorkTableScan node were the first one that we tried to deparse a subexpression of. Per report from Markus Winand. Like the previous patch, back-patch to v14. Discussion: https://postgr.es/m/648B0505-AA57-42C2-A2DA-E551DE46FA15@winand.at
*	Clean up more code using "(expr) ? true : false"	Michael Paquier	2021-10-11
	This is similar to fd0625c, taking care of any remaining code paths that are worth the cleanup. This also changes some cases using opposite expression patterns. Author: Justin Pryzby, Masahiko Sawada Discussion: https://postgr.es/m/CAD21AoCdF8dnUvr-BUWWGvA_XhKSoANacBMZb6jKyCk4TYfQ2Q@mail.gmail.com
*	Refactor fallback to stderr for csvlog to handle better WIN32 service case	Michael Paquier	2021-10-08
	send_message_to_server_log() would force a redirection of a log entry to stderr in some cases for csvlog, like the syslogger not being available yet. If this happens, csvlog would fall back to stderr to log some information rather than nothing. The code was organized so that stderr was handled before csvlog, with csvlog checking that stderr did not happen yet with a reversed condition. With this code organization, it could be possible to lose some messages if running Postgres as a service on WIN32, as there is no usable stderr, and the handling of the StringInfoData holding the message for stderr was rather confusing because of that. This commit moves the csvlog handling to be before stderr, as we are then able to track whether it is necessary to log something to stderr. This reduces the handling of stderr to a single code path, adding a fallback to event logs for a WIN32 service. This also simplifies the way we handle the StringInfoData for stderr, making the integration of new file-based log destinations easier. I got to play with services and event logs on Windows while checking this change. Reviewed-by: Chris Bandy Discussion: https://postgr.es/m/YV0vwBovEKf1WXkl@paquier.xyz
*	Add missing word to comment in joinrels.c.	Etsuro Fujita	2021-10-07
	Author: Amit Langote Backpatch-through: 13 Discussion: https://postgr.es/m/CA%2BHiwqGQNbtamQ_9DU3osR1XiWR4wxWFZurPmN6zgbdSZDeWmw%40mail.gmail.com
*	Fix compilation warning in syslogger.c	Michael Paquier	2021-10-07
	Oversight in 5c6e33f. Author: Nathan Bossart Discussion: https://postgr.es/m/DD8AD4CE-63B7-44BE-A3D2-14A4E4B19C26@amazon.com
*	Improve order in file	Peter Eisentraut	2021-10-07
	Move support functions for new PublicationTable node to more sensible locations in the files.
*	Refactor per-destination file rotation in logging collector	Michael Paquier	2021-10-07
	stderr and csvlog have been using duplicated code when it came to the rotation of their file by size, age or if forced by a user request (pg_ctl logrotate or the SQL function pg_rotate_logfile). The main difference between both is that stderr requires its file to always be opened, so that it is possible to have a redirection route if the logging collector is not ready yet to do its work if alternate destinations are enabled. Also, if csvlog gets disabled, we need to properly close its metadata stored in the logging collector (last file name for current_logfiles and fd currently open for business). Except for those points, the code is the same in terms of error handling and if a file should be created or just continued. This change makes the code simpler overall, and it will help in the introduction of more file-based log destinations. This refactoring is similar to the work done in 5b0b699. Most of the duplication originates from fd801f4. Some of the TAP tests of pg_ctl check the case of a forced log rotation, but this is somewhat limited as there is no coverage for log_rotation_age or log_rotation_size (these may not be worth the extra resources to run either), and no coverage for reload of log_destination with different combinations of stderr and csvlog. I have tested all those cases separately for this refactoring. Author: Michael Paquier Discussion: https://postgr.es/m/CAH7T-aqswBM6JWe4pDehi1uOiufqe06DJWaU5=X7dDLyqUExHg@mail.gmail.com
*	Fix corner-case loss of precision in numeric_power().	Dean Rasheed	2021-10-06
	This fixes a loss of precision that occurs when the first input is very close to 1, so that its logarithm is very small. Formerly, during the initial low-precision calculation to estimate the result weight, the logarithm was computed to a local rscale that was capped to NUMERIC_MAX_DISPLAY_SCALE (1000). However, the base may be as close as 1e-16383 to 1, hence its logarithm may be as small as 1e-16383, and so the local rscale needs to be allowed to exceed 16383, otherwise all precision is lost, leading to a poor choice of rscale for the full-precision calculation. Fix this by removing the cap on the local rscale during the initial low-precision calculation, as we already do in the full-precision calculation. This doesn't change the fact that the initial calculation is a low-precision approximation, computing the logarithm to around 8 significant digits, which is very fast, especially when the base is very close to 1. Patch by me, reviewed by Alvaro Herrera. Discussion: https://postgr.es/m/CAEZATCV-Ceu%2BHpRMf416yUe4KKFv%3DtdgXQAe5-7S9tD%3D5E-T1g%40mail.gmail.com
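The effect of the cap can be reproduced with Python's decimal module. This is an illustrative sketch under stated assumptions, not the numeric.c code: the helper `estimate_ln_near_one` is hypothetical, it uses the approximation ln(1 + x) ≈ x (valid for very small x), and the offset 1e-1500 is an arbitrary example value. Only the cap of 1000 comes from the commit message.

```python
from decimal import Decimal, getcontext

getcontext().prec = 50  # ample working precision for this illustration

NUMERIC_MAX_DISPLAY_SCALE = 1000  # the cap named in the commit message


def estimate_ln_near_one(offset_exp: int, rscale: int) -> Decimal:
    """Approximate ln(1 + 10**-offset_exp), rounded to rscale decimal places.

    Assumes ln(1 + x) ~ x, which holds to many digits when x is tiny.
    """
    x = Decimal(1).scaleb(-offset_exp)
    return x.quantize(Decimal(1).scaleb(-rscale))


# With the scale capped at 1000 digits, a logarithm near 1e-1500 rounds to zero.
capped = estimate_ln_near_one(1500, NUMERIC_MAX_DISPLAY_SCALE)

# Letting the scale grow past the magnitude of the logarithm keeps
# roughly 8 significant digits, as in a low-precision estimate.
uncapped = estimate_ln_near_one(1500, 1500 + 8)
```

Here `capped` is zero, so any weight estimate derived from it carries no information, while `uncapped` retains the value.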
*	Flexible options for CREATE_REPLICATION_SLOT.	Robert Haas	2021-10-05
	Like BASE_BACKUP, CREATE_REPLICATION_SLOT has historically used a hard-coded syntax. To improve future extensibility, adopt a flexible options syntax here, too. In the new syntax, instead of three mutually exclusive options EXPORT_SNAPSHOT, USE_SNAPSHOT, and NOEXPORT_SNAPSHOT, there is now a single SNAPSHOT option with three possible values: 'export', 'use', and 'nothing'. This commit does not remove support for the old syntax. It just adds the new one as an additional option, and makes pg_receivewal, pg_recvlogical, and walreceiver processes use it. Patch by me, reviewed by Fabien Coelho, Sergei Kornilov, and Fujii Masao. Discussion: http://postgr.es/m/CA+TgmobAczXDRO_Gr2euo_TxgzaH1JxbNxvFx=HYvBinefNH8Q@mail.gmail.com Discussion: http://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
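The mapping from the old keywords to the new SNAPSHOT values can be written down as a small Python helper. The helper name `new_style_snapshot_option` is hypothetical, and the parenthesized form of the option text is an assumption based on the option-list style of commands such as VACUUM; only the three keywords and the three values are taken from the commit message.

```python
# Old mutually exclusive keywords mapped to values of the single SNAPSHOT option.
OLD_TO_NEW_SNAPSHOT = {
    "EXPORT_SNAPSHOT": "export",
    "USE_SNAPSHOT": "use",
    "NOEXPORT_SNAPSHOT": "nothing",
}


def new_style_snapshot_option(old_keyword: str) -> str:
    """Return the new-style option text for an old-style snapshot keyword.

    The parenthesized rendering is an assumption made for illustration.
    """
    value = OLD_TO_NEW_SNAPSHOT[old_keyword]
    return f"(SNAPSHOT '{value}')"
```

A client that must talk to both old and new servers could keep the old keyword as its internal representation and translate only when the server supports the new syntax.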
*	Flexible options for BASE_BACKUP.	Robert Haas	2021-10-05
	Previously, BASE_BACKUP used an entirely hard-coded syntax, but that's hard to extend. Instead, adopt the same kind of syntax we've used for SQL commands such as VACUUM, ANALYZE, COPY, and EXPLAIN, where it's not necessary for all of the option names to be parser keywords. In the new syntax, most of the options now take an optional Boolean argument. To match our practice in other places, the options which the old syntax called NOWAIT and NOVERIFY_CHECKSUMS are in the new syntax called WAIT and VERIFY_CHECKSUMS, and the default value is false. In the new syntax, the FAST option has been replaced by a CHECKPOINT option whose value may be 'fast' or 'spread'. This commit does not remove support for the old syntax. It just adds the new one as an additional option, and makes pg_basebackup prefer the new syntax when the server is new enough to support it. Patch by me, reviewed and tested by Fabien Coelho, Sergei Kornilov, Fujii Masao, and Tushar Ahuja. Discussion: http://postgr.es/m/CA+TgmobAczXDRO_Gr2euo_TxgzaH1JxbNxvFx=HYvBinefNH8Q@mail.gmail.com Discussion: http://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
*	Make recovery report error message when invalid page header is found.	Fujii Masao	2021-10-06
	Commit 0668719801 changed XLogPageRead() so that it validated the page header and, if an invalid page header was found, reset the error message and retried reading the page, to fix the scenario where a streaming standby got stuck at a continuation record. This change hid the error message about the invalid page header, which would make it harder for users to investigate what the actual issue found in WAL was. To fix the issue, this commit makes XLogPageRead() report the error message when an invalid page header is found. When not in standby mode, an invalid page header should cause recovery to end, not retry reading the page, so XLogPageRead() doesn't need to validate the page header for the retry. Instead, ReadPageInternal() should be responsible for the validation in that case. Therefore this commit changes XLogPageRead() so that if not in standby mode it doesn't validate the page header for the retry. Reported-by: Yugo Nagata Author: Yugo Nagata, Kyotaro Horiguchi Reviewed-by: Ranier Vilela, Fujii Masao Discussion: https://postgr.es/m/20210718045505.32f463ed6c227111038d8ae4@sraoss.co.jp
*	Remove obsolete comment in snapbuild.c.	Amit Kapila	2021-10-05
	Commits 955a684e04 and a975ff4980 removed the usage of running xacts information from serialized snapshots but forgot to remove the corresponding comment. Author: Masahiko Sawada Discussion: https://postgr.es/m/CAD21AoBifOr7RS=jRe7YCavc646y9omChv6zkWXvJeZcjS9mXA@mail.gmail.com
*	Make Unicode makefile parallel-safe	Peter Eisentraut	2021-10-04
	Fix the rules so that each rule is parallel safe, using the same trickery that we use elsewhere in the tree for rules that produce more than one output file. Refactor the whole makefile so that there is less repetition. Discussion: https://www.postgresql.org/message-id/18e34084-aab1-1b4c-edd1-c4f9fb04f714%40enterprisedb.com
*	Fix duplicate words in comments	Daniel Gustafsson	2021-10-04
	Remove accidentally duplicated words in code comments. Author: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> Discussion: https://postgr.es/m/87bl45t0co.fsf@wibble.ilmari.org
*	Update Unicode map text files	Peter Eisentraut	2021-10-04
	A couple of newer ones are available. There are no functional differences, but let's get them in anyway, so that there is no surprise diff next time someone wants to do some actual work in this area.
*	Replace occurrences of InvalidXid with InvalidTransactionId	Daniel Gustafsson	2021-10-04
	While Xid is a known shortening of TransactionId, InvalidXid is not defined in the code. Fix comments which mistakenly were using the shorter version. Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Discussion: https://postgr.es/m/CALj2ACUQzdigML868nV4cojfELPkEzNLNOk7b91Pho4JB90fng@mail.gmail.com
*	Fix snapshot builds during promotion of hot standby node with 2PC	Michael Paquier	2021-10-04
	Some specific logic is done at the end of recovery when involving 2PC transactions: 1) Call RecoverPreparedTransactions(), to recover the state of 2PC transactions into memory (re-acquire locks, etc.). 2) ShutdownRecoveryTransactionEnvironment(), to move back to normal operations, mainly cleaning up recovery locks and KnownAssignedXids (including any 2PC transaction tracked previously). 3) Switch XLogCtl->SharedRecoveryState to RECOVERY_STATE_DONE, which is the tipping point for any process calling RecoveryInProgress() to check if the cluster is still in recovery or not. Any snapshot taken between steps 2) and 3) would be empty, causing any transaction relying on a snapshot at this point to potentially corrupt data as there could still be some 2PC transactions to track, with RecentXmin moving backwards on successive calls to GetSnapshotData() in the same transaction. As SharedRecoveryState is the point to take into account to know if it is safe to discard KnownAssignedXids, this commit moves step 2) after step 3), so that we can never finish with empty snapshots. This has existed since the introduction of hot standby, so backpatch all the way down. The window with incorrect snapshots is extremely small, but I have seen it when running 023_pitr_prepared_xact.pl, as did buildfarm member fairywren. Thomas Munro also found it independently. Special thanks to Andres Freund for taking the time to analyze this issue. Reported-by: Thomas Munro, Michael Paquier Analyzed-by: Andres Freund Discussion: https://postgr.es/m/20210422203603.fdnh3fu2mmfp2iov@alap3.anarazel.de Backpatch-through: 9.6
*	Fix checking of query type in plpgsql's RETURN QUERY command.	Tom Lane	2021-10-03
	Prior to v14, we insisted that the query in RETURN QUERY be of a type that returns tuples. (For instance, INSERT RETURNING was allowed, but not plain INSERT.) That happened indirectly because we opened a cursor for the query, so spi.c checked SPI_is_cursor_plan(). As a consequence, the error message wasn't terribly on-point, but at least it was there. Commit 2f48ede08 lost this detail. Instead, plain RETURN QUERY insisted that the query be a SELECT (by checking for SPI_OK_SELECT) while RETURN QUERY EXECUTE failed to check the query type at all. Neither of these changes was intended. The only convenient place to check this in the EXECUTE case is inside _SPI_execute_plan, because we haven't done parse analysis until then. So we need to pass down a flag saying whether to enforce that the query returns tuples. Fortunately, we can squeeze another boolean into struct SPIExecuteOptions without an ABI break, since there's padding space there. (It's unlikely that any extensions would already be using this new struct, but preserving ABI in v14 seems like a smart idea anyway.) Within spi.c, it seemed like _SPI_execute_plan's parameter list was already ridiculously long, and I didn't want to make it longer. So I thought of passing SPIExecuteOptions down as-is, allowing that parameter list to become much shorter. This makes the patch a bit more invasive than it might otherwise be, but it's all internal to spi.c, so that seems fine. Per report from Marc Bachmann. Back-patch to v14 where the faulty code came in. Discussion: https://postgr.es/m/1F2F75F0-27DF-406F-848D-8B50C7EEF06A@gmail.com
*	Enable deduplication in system catalog indexes.	Peter Geoghegan	2021-10-02
	The "equality implies image equality" opclass infrastructure disallowed deduplication in system catalog indexes and TOAST indexes before now. That seemed like the right approach back when the infrastructure was added by commit 612a1ab7, since ALTER INDEX cannot set deduplicate_items to 'off' (due to an old implementation restriction). But that decision now seems arbitrary at best. Remove special case handling implementing this policy. No catversion bump, since existing catalog indexes will still work. Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-Wz=rYQHFaJ3WYBdK=xgwxKzaiGMSSrh-ZCREa-pS-7Zjew@mail.gmail.com
*	Error out if SKIP LOCKED and WITH TIES are both specified	Alvaro Herrera	2021-10-01
	Both bugs #16676[1] and #17141[2] illustrate that the combination of SKIP LOCKED and FETCH FIRST WITH TIES breaks expectations when it comes to rows returned to other sessions accessing the same row. Since this situation is detectable from the syntax and hard to fix otherwise, forbid it for now, with the potential to fix it in the future. [1] https://postgr.es/m/16676-fd62c3c835880da6@postgresql.org [2] https://postgr.es/m/17141-913d78b9675aac8e@postgresql.org Backpatch-through: 13, where WITH TIES was introduced Author: David Christensen <david.christensen@crunchydata.com> Discussion: https://postgr.es/m/CAOxo6XLPccCKru3xPMaYDpa+AXyPeWFs+SskrrL+HKwDjJnLhg@mail.gmail.com
*	Remove unstable, unnecessary test; fix typo	Alvaro Herrera	2021-10-01
\| \| \| \| \| \| \| \| \| \| \| \| \|	Commit ff9f111bce24 added some test code that's unportable and doesn't add meaningful coverage. Remove it rather than try and get it to work everywhere. While at it, fix a typo in a log message added by the aforementioned commit. Backpatch to 14. Discussion: https://postgr.es/m/3000074.1632947632@sss.pgh.pa.us
*	Avoid believing incomplete MCV-only stats in get_variable_range().	Tom Lane	2021-10-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	get_variable_range() would incautiously believe that statistics containing only an MCV list are sufficient to derive a range estimate. That's okay for an enum-like column that contains only MCVs, but otherwise the estimate could be pretty bad. Make it report that the range is indeterminate unless the MCVs plus nullfrac account for the whole table. I don't think this needs a dedicated test case, since a quick code coverage check verifies that the existing regression tests traverse all the alternatives. There is room to doubt that a future-proof test case could be built anyway, given that the submitted example accidentally doesn't fail before v11. Per bug #17207 from Simon Perepelitsa. Back-patch to v10. In principle this has been broken all along, but I'm hesitant to make such changes in 9.6, since if anyone is unhappy with 9.6.24's behavior there will be no second chance to fix it. Discussion: https://postgr.es/m/17207-5265aefa79e333b4@postgresql.org
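The rule described in the commit message above can be sketched in Python. This is only an illustration of the idea, not PostgreSQL's C code: the function name, the argument layout, and the tolerance value are all assumptions made for the example. It trusts a range derived from an MCV list only when the MCV frequencies plus the null fraction cover essentially the whole table.

```python
def range_from_mcvs(mcv_values, mcv_freqs, null_frac, tolerance=1e-5):
    """Return (min, max) derived from an MCV list, or None if indeterminate.

    Hypothetical helper: the name, arguments, and tolerance are illustrative
    assumptions, not taken from the PostgreSQL source.
    """
    if not mcv_values:
        return None
    # Fraction of the table explained by the MCVs and by NULLs together.
    covered = sum(mcv_freqs) + null_frac
    if covered < 1.0 - tolerance:
        # Some rows hold values outside the MCV list, so the MCV extremes
        # say nothing reliable about the column's true range.
        return None
    return (min(mcv_values), max(mcv_values))


# Enum-like column: the three MCVs account for every non-null row.
print(range_from_mcvs(["a", "b", "c"], [0.5, 0.3, 0.1], 0.1))
# Ordinary column: the MCVs cover only 30% of the rows.
print(range_from_mcvs([10, 20], [0.2, 0.1], 0.0))
```

The second call reports no range because 70% of the rows could hold values anywhere outside the two listed MCVs.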
*	Fix Portal snapshot tracking to handle subtransactions properly.	Tom Lane	2021-10-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 84f5c2908 forgot to consider the possibility that EnsurePortalSnapshotExists could run inside a subtransaction with lifespan shorter than the Portal's. In that case, the new active snapshot would be popped at the end of the subtransaction, leaving a dangling pointer in the Portal, with mayhem ensuing. To fix, make sure the ActiveSnapshot stack entry is marked with the same subtransaction nesting level as the associated Portal. It's certainly safe to do so since we won't be here at all unless the stack is empty; hence we can't create an out-of-order stack. Let's also apply this logic in the case where PortalRunUtility sets portalSnapshot, just to be sure that path can't cause similar problems. It's slightly less clear that that path can't create an out-of-order stack, so add an assertion guarding it. Report and patch by Bertrand Drouvot (with kibitzing by me). Back-patch to v11, like the previous commit. Discussion: https://postgr.es/m/ff82b8c5-77f4-3fe7-6028-fcf3303e82dd@amazon.com
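The hazard and the fix described above can be modelled with a small Python toy. It is a simplified sketch, not the server's implementation: the class, the method names, and the rule that ending a subtransaction discards every entry at or above its nesting level are assumptions chosen to mirror the commit message.

```python
class ActiveSnapshotStack:
    """Toy model of a snapshot stack whose entries carry a nesting level."""

    def __init__(self):
        self.entries = []  # list of (snapshot_name, nesting_level)

    def push(self, snapshot, level):
        # Guard against an out-of-order stack: levels must never decrease.
        # With an empty stack, any level is acceptable.
        assert not self.entries or self.entries[-1][1] <= level
        self.entries.append((snapshot, level))

    def end_subtransaction(self, level):
        # Discard every snapshot owned by the subtransaction that is ending.
        while self.entries and self.entries[-1][1] >= level:
            self.entries.pop()

    def contains(self, snapshot):
        return any(name == snapshot for name, _ in self.entries)


def portal_snapshot_survives(portal_level, label_with_portal_level):
    """Create a portal snapshot inside a nested subtransaction, then end it."""
    stack = ActiveSnapshotStack()
    subxact_level = portal_level + 1
    level = portal_level if label_with_portal_level else subxact_level
    stack.push("portal_snapshot", level)
    stack.end_subtransaction(subxact_level)
    return stack.contains("portal_snapshot")


# Labelled with the subtransaction's level: the entry is gone afterwards,
# which corresponds to the dangling pointer in the Portal.
print(portal_snapshot_survives(1, label_with_portal_level=False))
# Labelled with the Portal's own level: the entry outlives the subtransaction.
print(portal_snapshot_survives(1, label_with_portal_level=True))
```

Because the stack is empty when the snapshot is pushed, labelling it with the lower (Portal) level cannot violate the ordering check in `push`.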
*	Ensure interleaved_parts field is always initialized	David Rowley	2021-10-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This field was recently added in db632fbca, however that commit missed one place where it should have initialized the new field to NULL. The missed location is where the PartitionBoundInfo is created for partition-wise join relations. Technically there could be interleaved partitions in a partition-wise join relation, but currently the only optimization we use this field for only does so for base rels and other member rels. So just document that we don't populate this field for join rels. Reported-by: Amit Langote Author: Amit Langote, David Rowley Reviewed-by: Amit Langote, David Rowley Discussion: https://postgr.es/m/CA+HiwqE76Rps24kwHsd2Cr82Ua07tJC9t9reG0c7ScX9n_xrEA@mail.gmail.com
*	Treat ETIMEDOUT as indicating a non-recoverable connection failure.	Tom Lane	2021-09-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add ETIMEDOUT to ALL_CONNECTION_FAILURE_ERRNOS' list of "errnos that identify hard failure of a previously-established network connection". While one could imagine that this is sometimes recoverable, the same could be said of other entries such as ENETDOWN. In support of this, handle ETIMEDOUT on par with other socket errors in relevant infrastructure, such as TranslateSocketError(). (I made a couple of cosmetic adjustments in TranslateSocketError(), too.) The code now assumes that ETIMEDOUT is defined everywhere, which it should be given that POSIX has required it since SUSv2. Perhaps this should be back-patched, but I'm hesitant to do so given the lack of previous complaints, and the hazard that there's a small ABI break on Windows from redefining the symbol. Even if we decide to do that, it'd be prudent to let this bake awhile in HEAD first. Jelte Fennema Discussion: https://postgr.es/m/AM5PR83MB01782BFF2978505F6D6C559AF7AA9@AM5PR83MB0178.EURPRD83.prod.outlook.com
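The idea of treating a fixed set of errno values as hard connection failures can be sketched in Python. The set below is an illustrative subset chosen for the example, not the actual contents of ALL_CONNECTION_FAILURE_ERRNOS, and the helper names are hypothetical.

```python
import errno

# Illustrative subset only (an assumption for this sketch); the real macro
# in PostgreSQL defines its own list.
HARD_CONNECTION_FAILURES = frozenset({
    errno.EPIPE,
    errno.ECONNRESET,
    errno.ENETDOWN,
    errno.ETIMEDOUT,  # the addition this commit describes
})


def is_hard_connection_failure(err):
    """Return True if the errno value marks an established connection as lost."""
    return err in HARD_CONNECTION_FAILURES


def describe(exc):
    """Classify an exception as a lost connection or some other error."""
    if isinstance(exc, OSError) and is_hard_connection_failure(exc.errno):
        return "connection lost"
    return "other error"


print(describe(OSError(errno.ETIMEDOUT, "timed out")))
print(describe(OSError(errno.EINTR, "interrupted")))
```

Keeping the list in one place means every caller classifies a timed-out socket the same way as a reset or a broken pipe.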
*	Fix WAL replay in presence of an incomplete record	Alvaro Herrera	2021-09-29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Physical replication always ships WAL segment files to replicas once they are complete. This is a problem if one WAL record is split across a segment boundary and the primary server crashes before writing down the segment with the next portion of the WAL record: WAL writing after crash recovery would happily resume at the point where the broken record started, overwriting that record ... but any standby or backup may have already received a copy of that segment, and they are not rewinding. This causes standbys to stop following the primary after the latter crashes: LOG: invalid contrecord length 7262 at A8/D9FFFBC8 because the standby is still trying to read the continuation record (contrecord) for the original long WAL record, but it is not there and it will never be. A workaround is to stop the replica, delete the WAL file, and restart it -- at which point a fresh copy is brought over from the primary. But that's pretty labor intensive, and I bet many users would just give up and re-clone the standby instead. A fix for this problem was already attempted in commit 515e3d84a0b5, but it only addressed the case for the scenario of WAL archiving, so streaming replication would still be a problem (as well as other things such as taking a filesystem-level backup while the server is down after having crashed), and it had performance scalability problems too; so it had to be reverted. This commit fixes the problem using an approach suggested by Andres Freund, whereby the initial portion(s) of the split-up WAL record are kept, and a special type of WAL record is written where the contrecord was lost, so that WAL replay in the replica knows to skip the broken parts. 
With this approach, we can continue to stream/archive segment files as soon as they are complete, and replay of the broken records will proceed across the crash point without a hitch. Because a new type of WAL record is added, users should be careful to upgrade standbys first, primaries later. Otherwise they risk the standby being unable to start if the primary happens to write such a record. A new TAP test that exercises this is added, but the portability of it is yet to be seen. This has been wrong since the introduction of physical replication, so backpatch all the way back. In stable branches, keep the new XLogReaderState members at the end of the struct, to avoid an ABI break. Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Nathan Bossart <bossartn@amazon.com> Discussion: https://postgr.es/m/202108232252.dh7uxf6oxwcy@alvherre.pgsql
*	Clarify use of "statistics objects" in the code	Michael Paquier	2021-09-29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The code inconsistently used "statistic object" or "statistics" where the correct term, as discussed, is actually "statistics object". This improves the state of the code to be more consistent. While on it, fix an incorrect error message introduced in a4d75c8. This error should never happen, as the code states, but it would be misleading. Author: Justin Pryzby Reviewed-by: Álvaro Herrera, Michael Paquier Discussion: https://postgr.es/m/20210924215827.GS831@telsasoft.com Backpatch-through: 14
*	Refactor output file handling when forking syslogger under EXEC_BACKEND	Michael Paquier	2021-09-29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A forked logging collector in EXEC_BACKEND builds passes down file descriptors (or HANDLEs in WIN32) through a command for files to be reopened (for stderr and csvlog). Some of its logic was duplicated, and this commit refactors the code with some wrapper routines for file reopening after forking and fd grabbing when building the command for the fork. While on it, this simplifies a use of "long" in the code, introduced by ab0ba6e to take care of a warning related to MinGW-W64 when mapping an intptr_t to a printed value. "long" is 32-bit long on Windows, and interoperability of Win32 and Win64 ensures that handles are always 32-bit significant, so we can just use "int" for the same result. This also makes the new routines more symmetric. This change makes it easier to introduce new log destinations in the logging collector, and this is not the only piece of refactoring planned. I have tested this change with EXEC_BACKEND on Linux, macOS, and of course MSVC (both Win32 and Win64), but not MinGW so the buildfarm may have something to say here. Author: Sehrope Sarkuni, Michael Paquier Discussion: https://postgr.es/m/CAH7T-aqswBM6JWe4pDehi1uOiufqe06DJWaU5=X7dDLyqUExHg@mail.gmail.com