path: root/src
Commit message (Author, Date)
* Avoid crash with WHERE CURRENT OF and a custom scan plan. (Tom Lane, 2021-01-18)
  execCurrent.c's search_plan_tree() assumed that ForeignScanStates and CustomScanStates necessarily have a valid ss_currentRelation. This is demonstrably untrue for postgres_fdw's remote join and remote aggregation plans, and non-leaf custom scans might not have an identifiable scan relation either. Avoid crashing by ignoring such nodes when the field is null.

  This solution will lead to errors like 'cursor "foo" is not a simply updatable scan of table "bar"' in cases where maybe we could have allowed WHERE CURRENT OF to work. That's not an issue for postgres_fdw's usages, since joins or aggregations would render WHERE CURRENT OF invalid anyway. But an otherwise-transparent upper level custom scan node might find this annoying. When and if someone cares to expend work on such a scenario, we could invent a custom-scan-provider callback to determine what's safe.

  Report and patch by David Geier, commentary by me. It's been like this for a while, so back-patch to all supported branches.

  Discussion: https://postgr.es/m/0253344d-9bdd-11c4-7f0d-d88c02cd7991@swarm64.com
* Fix pg_dump for GRANT OPTION among initial privileges. (Noah Misch, 2021-01-16)
  The context is an object that no longer bears some aclitem that it bore initially. (A user issued REVOKE or GRANT statements upon the object.) pg_dump is forming SQL to reproduce the object ACL. Since initdb creates no ACL bearing GRANT OPTION, reaching this bug requires an extension where the creation script establishes such an ACL. No PGXN extension does that. If an installation did reach the bug, pg_dump would have omitted a semicolon, causing a REVOKE and the next SQL statement to fail. Separately, since the affected code exists to eliminate an entire aclitem, it wants plain REVOKE, not REVOKE GRANT OPTION FOR.

  Back-patch to 9.6, where commit 23f34fa4ba358671adab16773e79c17c92cbc870 first appeared.

  Discussion: https://postgr.es/m/20210109102423.GA160022@rfd.leadboat.com
* Prevent excess SimpleLruTruncate() deletion. (Noah Misch, 2021-01-16)
  Every core SLRU wraps around. With the exception of pg_notify, the wrap point can fall in the middle of a page. Account for this in the PagePrecedes callback specification and in SimpleLruTruncate()'s use of said callback. Update each callback implementation to fit the new specification. This changes SerialPagePrecedesLogically() from the style of asyncQueuePagePrecedes() to the style of CLOGPagePrecedes(). (Whereas pg_clog and pg_serial share a key space, pg_serial is nothing like pg_notify.) The bug fixed here has the same symptoms and user followup steps as 592a589a04bd456410b853d86bd05faa9432cbbb.

  Back-patch to 9.5 (all supported versions).

  Reviewed by Andrey Borodin and (in earlier versions) by Tom Lane.

  Discussion: https://postgr.es/m/20190202083822.GC32531@gust.leadboat.com
* Disallow CREATE STATISTICS on system catalogs (Tomas Vondra, 2021-01-15)
  Add a check that CREATE STATISTICS does not add extended statistics on system catalogs, similarly to indexes etc. It can be overridden using the allow_system_table_mods GUC.

  This bug has existed since 7b504eb282c, which added extended statistics, so backpatch all the way back to PostgreSQL 10.

  Author: Tomas Vondra
  Reported-by: Dean Rasheed
  Backpatch-through: 10
  Discussion: https://postgr.es/m/CAEZATCXAPrrOKwEsyZKQ4uzzJQWBCt6QAvOcgqRGdWwT1zb%2BrQ%40mail.gmail.com
* Improve our heuristic for selecting PG_SYSROOT on macOS. (Tom Lane, 2021-01-15)
  In cases where Xcode is newer than the underlying macOS version, asking xcodebuild for the SDK path will produce a pointer to the SDK shipped with Xcode, which may end up building code that does not work on the underlying macOS version. It appears that in such cases, xcodebuild's answer also fails to match the default behavior of Apple's compiler: assuming one has installed Xcode's "command line tools", there will be an SDK for the OS's own version in /Library/Developer/CommandLineTools, and the compiler will default to using that. This is all pretty poorly documented, but experimentation suggests that "xcrun --show-sdk-path" gives the sysroot path that the compiler is actually using, at least in some cases. Hence, try that first, but revert to xcodebuild if xcrun fails (in very old Xcode, it is missing or lacks the --show-sdk-path switch).

  Also, "xcrun --show-sdk-path" may give a path that is valid but lacks any OS version identifier. We don't really want that, since most of the motivation for wiring -isysroot into the build flags at all is to ensure that all parts of a PG installation are built against the same SDK, even when considering extensions built later and/or on a different machine. Insist on finding "N.N" in the directory name before accepting the result. (Adding "--sdk macosx" to the xcrun call seems to produce the same answer as xcodebuild, but usually more quickly because it's cached, so we also try that as a fallback.)

  The core reason why we don't want to use Xcode's default SDK in cases like this is that Apple's technology for introducing new syscalls does not play nice with Autoconf: for example, configure will think that preadv/pwritev exist when using a Big Sur SDK, even when building on an older macOS version where they don't exist. It'd be nice to have a better solution to that problem, but this patch doesn't attempt to fix that.

  Per report from Sergey Shinderuk. Back-patch to all supported versions.

  Discussion: https://postgr.es/m/ed3b8e5d-0da8-6ebd-fd1c-e0ac80a4b204@postgrespro.ru
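The acceptance rule in this commit message (take the first candidate sysroot whose directory name contains an "N.N" version) can be sketched in Python. This is an illustration under assumptions, not the project's build logic: `has_os_version`, `pick_sysroot`, the regular expression, and the example paths are all hypothetical, and the candidate list merely stands in for the outputs of the probes tried in order.

```python
import os
import re

# Assumption: "N.N" means digits, a dot, then digits somewhere in the
# final path component, e.g. "MacOSX11.1.sdk".
VERSION_IN_NAME = re.compile(r"\d+\.\d+")


def has_os_version(sdk_path):
    """Return True if the last path component carries an N.N version."""
    name = os.path.basename(sdk_path.rstrip("/"))
    return VERSION_IN_NAME.search(name) is not None


def pick_sysroot(candidates):
    """Return the first candidate that names an OS version, else None.

    `candidates` is a hypothetical list of probe results, in the order
    the probes would be tried; empty strings model a failed probe.
    """
    for path in candidates:
        if path and has_os_version(path):
            return path
    return None
```

With this sketch, a versionless path such as `.../SDKs/MacOSX.sdk` is skipped in favor of a later candidate such as `.../SDKs/MacOSX10.15.sdk`, which matches the stated goal of building every part of an installation against an identifiable SDK.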
* Fix calculation of how much shared memory is required to store a TOC. (Fujii Masao, 2021-01-15)
  Commit ac883ac453 refactored shm_toc_estimate() but, in doing so, made it miscalculate the shared memory size needed for a TOC. As a result, more memory than necessary could be allocated.

  Back-patch to v11 where the bug was introduced.

  Author: Takayuki Tsunakawa
  Discussion: https://postgr.es/m/TYAPR01MB2990BFB73170E2C4921E2C4DFEA80@TYAPR01MB2990.jpnprd01.prod.outlook.com
* pg_dump: label PUBLICATION TABLE ArchiveEntries with an owner. (Tom Lane, 2021-01-14)
  This is the same fix as commit 9eabfe300 applied to INDEX ATTACH entries, but for table-to-publication attachments. As in that case, even though the backend doesn't record "ownership" of the attachment, we still ought to label it in the dump archive with the role name that should run the ALTER PUBLICATION command. The existing behavior causes the ALTER to be done by the original role that started the restore; that will usually work fine, but there may be corner cases where it fails.

  The bulk of the patch is concerned with changing struct PublicationRelInfo to include a pointer to the associated PublicationInfo object, so that we can get the owner's name out of that when the time comes. While at it, I rewrote getPublicationTables() to do just one query of pg_publication_rel, not one per table.

  Back-patch to v10 where this code was introduced.

  Discussion: https://postgr.es/m/1165710.1610473242@sss.pgh.pa.us
* Prevent drop of tablespaces used by partitioned relations (Alvaro Herrera, 2021-01-14)
  When a tablespace is used in a partitioned relation (per commits ca4103025dfe in pg12 for tables and 33e6c34c3267 in pg11 for indexes), it is possible to drop the tablespace, potentially causing various problems. One such was reported in bug #16577, where a rewriting ALTER TABLE causes a server crash.

  Protect against this by using pg_shdepend to keep track of tablespaces when used for relations that don't keep physical files; we now abort a tablespace drop if we see that the tablespace is referenced from any partitioned relations.

  Backpatch this to 11, where this problem has been latent all along. We don't try to create pg_shdepend entries for existing partitioned indexes/tables, but any ones that are modified going forward will be protected.

  Note slight behavior change: when trying to drop a tablespace that contains both regular tables as well as partitioned ones, you'd previously get ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE and now you'll get ERRCODE_DEPENDENT_OBJECTS_STILL_EXIST. Arguably, the latter is more correct.

  It is possible to add protecting pg_shdepend entries for existing tables/indexes, by doing
    ALTER TABLE ONLY some_partitioned_table SET TABLESPACE pg_default;
    ALTER TABLE ONLY some_partitioned_table SET TABLESPACE original_tablespace;
  for each partitioned table/index that is not in the database default tablespace. Because these partitioned objects do not have storage, no file needs to be actually moved, so it shouldn't take more time than what's required to acquire locks.

  This query can be used to search for such relations:
    SELECT ... FROM pg_class WHERE relkind IN ('p', 'I') AND reltablespace <> 0

  Reported-by: Alexander Lakhin <exclusion@gmail.com>
  Discussion: https://postgr.es/m/16577-881633a9f9894fd5@postgresql.org
  Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
  Reviewed-by: Michael Paquier <michael@paquier.xyz>
* Stabilize timeline switch regression test. (Fujii Masao, 2021-01-14)
  Commit fef5b47f6b added a regression test to check whether a standby is able to follow a primary on a newer timeline when WAL archiving is enabled. But the buildfarm member florican reported that this test failed because the requested WAL segment was removed and replication failed. This is a timing issue. Since the test neither uses a replication slot nor sets wal_keep_size, a checkpoint could remove a WAL segment that is still needed for replication. This commit stabilizes the test by setting wal_keep_size.

  Back-patch to v13 where the regression test that this commit stabilizes was added.

  Author: Fujii Masao
  Discussion: https://postgr.es/m/X//PsenxcC50jDzX@paquier.xyz
* Ensure that a standby is able to follow a primary on a newer timeline. (Fujii Masao, 2021-01-14)
  Commit 709d003fbd refactored WAL-reading code, but accidentally caused WalSndSegmentOpen() to fail to follow a timeline switch while reading from a historic timeline. This issue caused a standby to fail to follow a primary on a newer timeline when WAL archiving is enabled.

  If there is a timeline switch within the segment, WalSndSegmentOpen() should read from the WAL segment belonging to the new timeline. But previously, because it failed to follow the timeline switch, it tried to read the WAL segment with the old timeline. When WAL archiving is enabled, that WAL segment with the old timeline doesn't exist because it has been renamed to .partial. The primary therefore tried to read a nonexistent WAL segment, which caused replication to fail with the error "ERROR: requested WAL segment ... has already been removed".

  This commit fixes WalSndSegmentOpen() so that it's able to follow a timeline switch, to ensure that a standby is able to follow a primary on a newer timeline even when WAL archiving is enabled.

  This commit also adds a regression test to check whether a standby is able to follow a primary on a newer timeline when WAL archiving is enabled.

  Back-patch to v13 where the bug was introduced.

  Reported-by: Kyotaro Horiguchi
  Author: Kyotaro Horiguchi, tweaked by Fujii Masao
  Reviewed-by: Alvaro Herrera, Fujii Masao
  Discussion: https://postgr.es/m/20201209.174314.282492377848029776.horikyota.ntt@gmail.com
* Disallow a digit as the first character of a variable name in pgbench. (Tom Lane, 2021-01-13)
  The point of this restriction is to avoid trying to substitute variables into timestamp literal values, which may contain strings like '12:34'.

  There is a good deal more that should be done to reduce pgbench's tendency to substitute where it shouldn't. But this is sufficient to solve the case complained of by Jaime Soler, and it's simple enough to back-patch.

  Back-patch to v11; before commit 9d36a3866, pgbench had a slightly different definition of what a variable name is, and anyway it seems unwise to change long-stable branches for this.

  Fabien Coelho

  Discussion: https://postgr.es/m/alpine.DEB.2.22.394.2006291740420.805678@pseudo
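To see why the first-character rule protects timestamp literals, here is a small Python sketch of `:name` substitution. It is an approximation written for this note, not pgbench's parser: the `substitute` helper, the regular expression (ASCII letters, digits, underscore, with no leading digit), and the sample variables are assumptions.

```python
import re

# Assumed approximation of the new rule: a variable name starts with a
# letter or underscore, followed by letters, digits, or underscores.
# The lookbehind skips "::" so that casts are left alone.
VARIABLE_REF = re.compile(r"(?<!:):([A-Za-z_][A-Za-z0-9_]*)")


def substitute(sql, variables):
    """Replace :name references with values; unknown names stay as-is."""
    def replace(match):
        return variables.get(match.group(1), match.group(0))
    return VARIABLE_REF.sub(replace, sql)


query = "INSERT INTO t VALUES (:id, '2021-01-13 12:34')"
result = substitute(query, {"id": "7", "34": "oops"})
```

Because `:34` begins with a digit it is never treated as a variable reference, so the time literal survives even though a variable named `34` exists in the hypothetical variable set.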
* Fix memory leak in SnapBuildSerialize. (Amit Kapila, 2021-01-13)
  The memory for the snapshot was leaked while serializing it to disk during logical decoding. This memory is freed only once the walsender stops streaming the changes. This can lead to a huge memory increase when the master logs Standby Snapshot records too frequently, for example when the user is trying to create many replication slots.

  Reported-by: funnyxj.fxj@alibaba-inc.com
  Diagnosed-by: funnyxj.fxj@alibaba-inc.com
  Author: Amit Kapila
  Backpatch-through: 9.5
  Discussion: https://postgr.es/m/033ab54c-6393-42ee-8ec9-2b399b5d8cde.funnyxj.fxj@alibaba-inc.com
* pg_dump: label INDEX ATTACH ArchiveEntries with an owner. (Tom Lane, 2021-01-12)
  Although a partitioned index's attachment to its parent doesn't have separate ownership, the ArchiveEntry for it needs to be marked with an owner anyway, to ensure that the ALTER command is run by the appropriate role when restoring with --use-set-session-authorization. Without this, the ALTER will be run by the role that started the restore session, which will usually work but it's formally the wrong thing.

  Back-patch to v11 where this type of ArchiveEntry was added. In HEAD, add equivalent commentary to the just-added TABLE ATTACH case, which I'd made do the right thing already.

  Discussion: https://postgr.es/m/1094034.1610418498@sss.pgh.pa.us
* Fix thinko in comment (Alvaro Herrera, 2021-01-12)
  This comment has been wrong since its introduction in commit 2c03216d8311.

  Author: Masahiko Sawada <sawada.mshk@gmail.com>
  Discussion: https://postgr.es/m/CAD21AoAzz6qipFJBbGEaHmyWxvvNDp8httbwLR9tUQWaTjUs2Q@mail.gmail.com
* Fix relation descriptor leak. (Amit Kapila, 2021-01-12)
  We missed closing the relation descriptor while sending changes via the root of partitioned relations during logical replication.

  Author: Amit Langote and Mark Zhao
  Reviewed-by: Amit Kapila and Ashutosh Bapat
  Backpatch-through: 13, where it was introduced
  Discussion: https://postgr.es/m/tencent_41FEA657C206F19AB4F406BE9252A0F69C06@qq.com
  Discussion: https://postgr.es/m/tencent_6E296D2F7D70AFC90D83353B69187C3AA507@qq.com
* Fix ancient bug in parsing of BRE-mode regular expressions. (Tom Lane, 2021-01-08)
  brenext(), when parsing a '*' quantifier, forgot to return any "value" for the token; per the equivalent case in next(), it should return value 1 to indicate that greedy rather than non-greedy behavior is wanted. The result is that the compiled regexp could behave like 'x*?' rather than the intended 'x*', if we were unlucky enough to have a zero in v->nextvalue at this point. That seems to happen with some reliability if we have '.*' at the beginning of a BRE-mode regexp, although that depends on the initial contents of a stack-allocated struct, so it's not guaranteed to fail.

  Found by Alexander Lakhin using valgrind testing. This bug seems to be aboriginal in Spencer's code, so back-patch all the way.

  Discussion: https://postgr.es/m/16814-6c5e3edd2bdf0d50@postgresql.org
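The practical difference between the intended 'x*' and the accidental 'x*?' is easiest to see with two adjacent groups competing for the same characters. The sketch below uses Python's `re` module, which is a different engine from the one patched here and does not implement BRE syntax; it only illustrates greedy versus non-greedy matching, and `split_run` is a hypothetical helper.

```python
import re


def split_run(pattern, subject):
    """Return the text captured by each group of `pattern` on `subject`."""
    match = re.match(pattern, subject)
    return match.groups() if match else None


# Greedy: the first group takes every 'x' it can.
greedy = split_run(r"(x*)(x*)", "xxx")

# Non-greedy: the first group takes as little as possible, so the
# second group ends up with the whole run instead.
non_greedy = split_run(r"(x*?)(x*)", "xxx")
```

Both patterns match the same overall string, which is why a bug of this kind can go unnoticed until something depends on which subexpression captured what.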
* Adjust createdb TAP tests to work on recent OpenBSD. (Tom Lane, 2021-01-07)
  We found last February that the error-case tests added by commit 008cf0409 failed on OpenBSD, because that platform doesn't really check locale names. At the time it seemed that that was only an issue for LC_CTYPE, but testing on a more recent version of OpenBSD shows that it's now equally lax about LC_COLLATE.

  Rather than dropping the LC_COLLATE test too, put back LC_CTYPE (reverting c4b0edb07), and adjust these tests to accept the different error message that we get if setlocale() doesn't reject a bogus locale name. The point of these tests is not really what the backend does with the locale name, but to show that createdb quotes funny locale names safely; so we're not losing test reliability this way.

  Back-patch as appropriate.

  Discussion: https://postgr.es/m/231373.1610058324@sss.pgh.pa.us
* Further second thoughts about idle_session_timeout patch. (Tom Lane, 2021-01-07)
  On reflection, the order of operations in PostgresMain() is wrong. These timeouts ought to be shut down before, not after, we do the post-command-read CHECK_FOR_INTERRUPTS, to guarantee that any timeout error will be detected there rather than at some ill-defined later point (possibly after having wasted a lot of work).

  This is really an error in the original idle_in_transaction_timeout patch, so back-patch to 9.6 where that was introduced.
* Detect the deadlocks between backends and the startup process. (Fujii Masao, 2021-01-06)
  Deadlocks involving a recovery conflict on a lock can occur between hot-standby backends and the startup process. If a backend takes an access exclusive lock on a table and that eventually triggers the deadlock, the deadlock is detected as expected. Previously, however, if the startup process took the access exclusive lock that eventually triggered the deadlock, the deadlock could not be detected and could persist even after deadlock_timeout had passed. This is a bug.

  The cause was that the code handling recovery conflicts on locks did not consider the deadlock case at all. It assumed that deadlocks involving the startup process and backends could be detected by the deadlock detector invoked within backends. That assumption was incorrect: the startup process should also have invoked the deadlock detector when necessary.

  To fix this bug, this commit makes the startup process invoke the deadlock detector if deadlock_timeout is reached while handling a recovery conflict on a lock. Specifically, in that case, the startup process requests all the backends holding the conflicting locks to check themselves for deadlocks.

  Back-patch to v9.6. v9.5 also has this bug, but per discussion we decided not to back-patch the fix to v9.5, because v9.5 lacks some infrastructure code (e.g., 37c54863cf) that this fix depends on. We could back-patch that code as well, but since the next minor release is the final one for v9.5, doing so is risky: if the back-patch unexpectedly introduced a new bug into v9.5, there would be no chance to fix it. We determined that back-patching to v9.5 would carry more risk than gain.

  Author: Fujii Masao
  Reviewed-by: Bertrand Drouvot, Masahiko Sawada, Kyotaro Horiguchi
  Discussion: https://postgr.es/m/4041d6b6-cf24-a120-36fa-1294220f8243@oss.nttdata.com
* Add an explicit cast to double when using fabs(). (Dean Rasheed, 2021-01-05)
  Commit bc43b7c2c0 used fabs() directly on an int variable, which apparently requires an explicit cast on some platforms.

  Per buildfarm.
* Fix numeric_power() when the exponent is INT_MIN. (Dean Rasheed, 2021-01-05)
  In power_var_int(), the computation of the number of significant digits to use in the computation used log(Abs(exp)), which isn't safe because Abs(exp) returns INT_MIN when exp is INT_MIN. Use fabs() instead of Abs(), so that the exponent is cast to a double before the absolute value is taken.

  Back-patch to 9.6, where this was introduced (by 7d9a4737c2).

  Discussion: https://postgr.es/m/CAEZATCVd6pMkz=BrZEgBKyqqJrt2xghr=fNc8+Z=5xC6cgWrWA@mail.gmail.com
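The INT_MIN hazard can be reproduced outside C with a little arithmetic. The Python sketch below models a 32-bit int that wraps on negation, which is an assumption about typical two's-complement behavior (signed overflow is formally undefined in C); `abs_int32` and `fabs_of_int` are hypothetical stand-ins for an integer absolute-value macro and for converting to double before taking the absolute value.

```python
import math

INT_MIN = -2**31
INT_MAX = 2**31 - 1


def abs_int32(value):
    """Absolute value computed in wrapping 32-bit arithmetic (assumed model)."""
    result = -value if value < 0 else value
    # Fold the mathematically exact result back into the int32 range.
    return (result - INT_MIN) % 2**32 + INT_MIN


def fabs_of_int(value):
    """Convert to floating point first, then take the absolute value."""
    return math.fabs(float(value))


wrapped = abs_int32(INT_MIN)   # still negative: 2**31 does not fit in int32
safe = fabs_of_int(INT_MIN)    # 2147483648.0, representable as a double
digits_estimate = math.log(safe)
```

In this model the integer absolute value of INT_MIN is INT_MIN itself, so a logarithm taken of it has no meaningful result, while the floating-point route yields a positive argument.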
* Fix integer-overflow corner cases in substring() functions. (Tom Lane, 2021-01-04)
  If the substring start index and length overflow when added together, substring() misbehaved, either throwing a bogus "negative substring length" error on a case that should succeed, or failing to complain that a negative length is negative (and instead returning the whole string, in most cases). Unsurprisingly, the text, bytea, and bit variants of the function all had this issue. Rearrange the logic to ensure that negative lengths are always rejected, and add an overflow check to handle the other case.

  Also install similar guards into detoast_attr_slice() (nee heap_tuple_untoast_attr_slice()), since it's far from clear that no other code paths leading to that function could pass it values that would overflow.

  Patch by myself and Pavel Stehule, per bug #16804 from Rafi Shamim.

  Back-patch to v11. While these bugs are old, the common/int.h infrastructure for overflow-detecting arithmetic didn't exist before commit 4d6ad3125, and it doesn't seem like these misbehaviors are bad enough to justify developing a standalone fix for the older branches.

  Discussion: https://postgr.es/m/16804-f4eeeb6c11ba71d4@postgresql.org
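The ordering described here (reject a negative length first, then guard the start-plus-length addition) can be sketched in Python. This is a model under assumptions, not the patched C code: the `substring` helper is hypothetical, it uses 1-based start positions, and since Python integers do not overflow it compares the exact sum against the int32 maximum to mimic an overflow-detecting addition. Treating an overflowing end as "through the end of the string" is this sketch's interpretation.

```python
INT32_MAX = 2**31 - 1


def substring(text, start, length):
    """Model of a 1-based substring with explicit overflow handling."""
    # Negative lengths are rejected before any arithmetic happens.
    if length < 0:
        raise ValueError("negative substring length not allowed")

    end = start + length  # exact in Python; would overflow int32 in C
    if end > INT32_MAX:
        end = None        # assumed meaning: run to the end of the string
    elif end < 1:
        return ""         # the requested range lies entirely before position 1

    begin = max(start, 1)
    if end is None:
        return text[begin - 1:]
    return text[begin - 1:end - 1]
```

For example, `substring("hello", 2, 3)` gives `"ell"`, while a start of 2 with a length of `INT32_MAX` returns the rest of the string instead of tripping a bogus negative-length error.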
* Get heap page max offset with buffer lock held. (Peter Geoghegan, 2020-12-30)
  On further reflection it seems better to call PageGetMaxOffsetNumber() after acquiring a buffer lock on the page. This shouldn't really matter, but doing it this way is cleaner.

  Follow-up to commit 42288174.

  Backpatch: 12-, just like commit 42288174
* Fix index deletion latestRemovedXid bug. (Peter Geoghegan, 2020-12-30)
  The logic for determining the latest removed XID for the purposes of generating recovery conflicts in REDO routines was subtly broken. It failed to follow links from HOT chains, and so failed to consider all relevant heap tuple headers in some cases.

  To fix, expand the loop that deals with LP_REDIRECT line pointers to also deal with HOT chains. The new version of the loop is loosely based on a similar loop from heap_prune_chain().

  The impact of this bug is probably quite limited, since the horizon code necessarily deals with heap tuples that are pointed to by LP_DEAD-set index tuples. The process of setting LP_DEAD index tuples (e.g. within the kill_prior_tuple mechanism) is highly correlated with opportunistic pruning of pointed-to heap tuples. Plus the question of generating a recovery conflict usually comes up some time after index tuple LP_DEAD bits were initially set, unlike heap pruning, where a latestRemovedXid is generated at the point of the pruning operation (heap pruning has no deferred "would-be page split" style processing that produces conflicts lazily).

  Only backpatch to Postgres 12, the first version where this logic runs during original execution (following commit 558a9165e08). The index latestRemovedXid mechanism has had the same bug since it first appeared over 10 years ago (in commit a760893d), but backpatching to all supported versions now seems like a bad idea on balance. Running the new improved code during recovery seems risky, especially given the lack of complaints from the field.

  Author: Peter Geoghegan <pg@bowt.ie>
  Discussion: https://postgr.es/m/CAH2-Wz=Eib393+HHcERK_9MtgNS7Ew1HY=RDC_g6GL46zM5C6Q@mail.gmail.com
  Backpatch: 12-
* Fix up usage of krb_server_keyfile GUC parameter. (Tom Lane, 2020-12-30)
  secure_open_gssapi() installed the krb_server_keyfile setting as KRB5_KTNAME unconditionally, so long as it's not empty. However, pg_GSS_recvauth() only installed it if KRB5_KTNAME wasn't set already, leading to a troubling inconsistency: in theory, clients could see different sets of server principal names depending on whether they use GSSAPI encryption. Always using krb_server_keyfile seems like the right thing, so make both places do that. Also fix up secure_open_gssapi()'s lack of a check for setenv() failure --- it's unlikely, surely, but security-critical actions are no place to be sloppy.

  Also improve the associated documentation.

  This patch does nothing about secure_open_gssapi()'s use of setenv(), and indeed causes pg_GSS_recvauth() to use it too. That's nominally against project portability rules, but since this code is only built with --with-gssapi, I do not feel a need to do something about this in the back branches. A fix will be forthcoming for HEAD though.

  Back-patch to v12 where GSSAPI encryption was introduced. The dubious behavior in pg_GSS_recvauth() goes back further, but it didn't have anything to be inconsistent with, so let it be.

  Discussion: https://postgr.es/m/2187460.1609263156@sss.pgh.pa.us
* In pg_upgrade cross-version test, handle lack of oldstyle_length(). (Noah Misch, 2020-12-30)
  This suffices for testing v12 -> v13; some other version pairs need more changes. Back-patch to v10, which removed the function.
* Improve log messages related to pg_hba.conf not matching a connection. (Tom Lane, 2020-12-28)
  Include details on whether GSS encryption has been activated; since we added "hostgssenc" type HBA entries, that's relevant info.

  Kyotaro Horiguchi and Tom Lane. Back-patch to v12 where GSS encryption was introduced.

  Discussion: https://postgr.es/m/e5b0b6ed05764324a2f3fe7acfc766d5@smhi.se
* Fix assorted issues in backend's GSSAPI encryption support. (Tom Lane, 2020-12-28)
  Unrecoverable errors detected by GSSAPI encryption can't just be reported with elog(ERROR) or elog(FATAL), because attempting to send the error report to the client is likely to lead to infinite recursion or loss of protocol sync. Instead make this code do what the SSL encryption code has long done, which is to just report any such failure to the server log (with elevel COMMERROR), then pretend we've lost the connection by returning errno = ECONNRESET.

  Along the way, fix confusion about whether message translation is done by pg_GSS_error() or its callers (the latter should do it), and make the backend version of that function work more like the frontend version.

  Avoid allocating the port->gss struct until it's needed; we surely don't need to allocate it in the postmaster.

  Improve logging of "connection authorized" messages with GSS enabled. (As part of this, I back-patched the code changes from dc11f31a1.)

  Make BackendStatusShmemSize() account for the GSS-related space that will be allocated by CreateSharedBackendStatus(). This omission could possibly cause out-of-shared-memory problems with very high max_connections settings.

  Remove arbitrary, pointless restriction that only GSS authentication can be used on a GSS-encrypted connection.

  Improve documentation; notably, document the fact that libpq now prefers GSS encryption over SSL encryption if both are possible.

  Per report from Mikael Gustavsson. Back-patch to v12 where this code was introduced.

  Discussion: https://postgr.es/m/e5b0b6ed05764324a2f3fe7acfc766d5@smhi.se
* Fix bugs in libpq's GSSAPI encryption support. (Tom Lane, 2020-12-28)
  The critical issue fixed here is that if a GSSAPI-encrypted connection is successfully made, pqsecure_open_gss() cleared conn->allow_ssl_try, as an admittedly-hacky way of preventing us from then trying to tunnel SSL encryption over the already-encrypted connection. The problem with that is that if we abandon the GSSAPI connection because of a failure during authentication, we would not attempt SSL encryption in the next try with the same server. This can lead to unexpected connection failure, or silently getting a non-encrypted connection where an encrypted one is expected.

  Fortunately, we'd only manage to make a GSSAPI-encrypted connection if both client and server hold valid tickets in the same Kerberos infrastructure, which is a relatively uncommon environment. Nonetheless this is a very nasty bug with potential security consequences. To fix, don't reset the flag, instead adding a check for conn->gssenc being already true when deciding whether to try to initiate SSL.

  While here, fix some lesser issues in libpq's GSSAPI code:

  * Use the need_new_connection stanza when dropping an attempted GSSAPI connection, instead of partially duplicating that code. The consequences of this are pretty minor: AFAICS it could only lead to auth_req_received or password_needed remaining set when they shouldn't, which is not too harmful.

  * Fix pg_GSS_error() to not repeat the "mprefix" it's given multiple times, and to notice any failure return from gss_display_status().

  * Avoid gratuitous dependency on NI_MAXHOST in pg_GSS_load_servicename().

  Per report from Mikael Gustavsson. Back-patch to v12 where this code was introduced.

  Discussion: https://postgr.es/m/e5b0b6ed05764324a2f3fe7acfc766d5@smhi.se
* Expose the default for channel_binding in PQconndefaults(). (Tom Lane, 2020-12-28)
  If there's a static default value for a connection option, it should be shown in the PQconninfoOptions array.

  Daniele Varrazzo

  Discussion: https://postgr.es/m/CA+mi_8Zo8Rgn7p+6ZRY7QdDu+23ukT9AvoHNyPbgKACxwgGhZA@mail.gmail.com
* Further fix thinko in plpgsql memory leak fix. (Tom Lane, 2020-12-28)
  There's a second call of get_eval_mcontext() that should also be get_stmt_mcontext(). This is actually dead code, since no interesting allocations happen before switching back to the original context, but we should keep it in sync with the other call to forestall possible future bugs.

  Discussion: https://postgr.es/m/f075f7be-c654-9aa8-3ffc-e9214622f02a@enterprisedb.com
* Fix thinko in plpgsql memory leak fix. (Tom Lane, 2020-12-28)
  Commit a6b1f5365 intended to place the transient "target" list of a CALL statement in the function's statement-lifespan context, but I fat-fingered that and used get_eval_mcontext() instead of get_stmt_mcontext(). The eval_mcontext belongs to the "simple expression" infrastructure, which is destroyed at transaction end. The net effect is that a CALL in a procedure to another procedure that has OUT or INOUT parameters would fail if the called procedure did a COMMIT.

  Per report from Peter Eisentraut. Back-patch to v11, like the prior patch.

  Discussion: https://postgr.es/m/f075f7be-c654-9aa8-3ffc-e9214622f02a@enterprisedb.com
* Fix inconsistent code with shared invalidations of snapshots (Michael Paquier, 2020-12-28)
  Since 568d413, the code in charge of processing a single snapshot invalidation message has been using the structure meant for relation mapping messages. Fortunately this had no consequence, as both structures store the database ID at the same location, but it could become a problem in the future if this area of the code changes.

  Author: Konstantin Knizhnik
  Discussion: https://postgr.es/m/8044c223-4d3a-2cdb-42bf-29940840ce94@postgrespro.ru
  Backpatch-through: 9.5
* Second attempt to stabilize 05c02589. (Jeff Davis, 2020-12-27)
  Removing the EXPLAIN test to stabilize the buildfarm. The execution test should still be effective to catch the bug even if the plan is slightly different on different platforms.
* Stabilize test introduced in 05c02589, per buildfarm. (Jeff Davis, 2020-12-27)
  In passing, make the capitalization match the rest of the file.

  Reported-by: Tom Lane
* Fix bug #16784 in Disk-based Hash Aggregation. (Jeff Davis, 2020-12-26)
  Before processing tuples, agg_refill_hash_table() was setting all pergroup pointers to NULL to signal to advance_aggregates() that it should not attempt to advance groups that had spilled.

  The problem was that it also set the pergroups for sorted grouping sets to NULL, which caused rescanning to fail.

  Instead, change agg_refill_hash_table() to only set the pergroups for hashed grouping sets to NULL; and when compiling the expression, pass doSort=false.

  Reported-by: Alexander Lakhin
  Discussion: https://postgr.es/m/16784-7ff169bf2c3d1588%40postgresql.org
  Backpatch-through: 13
* Invalidate acl.c caches when pg_authid changes. (Noah Misch, 2020-12-25)
  This makes existing sessions reflect "ALTER ROLE ... [NO]INHERIT" as quickly as they have been reflecting "GRANT role_name". Back-patch to 9.5 (all supported versions).

  Reviewed by Nathan Bossart.

  Discussion: https://postgr.es/m/20201221095028.GB3777719@rfd.leadboat.com
* Avoid time-of-day-dependent failure in log rotation test.Tom Lane2020-12-24
| | | | | | | | | | | | | | | | | | | | | | | Buildfarm members pogona and petalura have shown a failure when pg_ctl/t/004_logrotate.pl starts just before local midnight. The default rotate-at-midnight behavior occurs just before the Perl script examines current_logfiles, so it figures that the rotation it's already requested has occurred ... but in reality, that rotation happens just after it looks, so the expected new log data goes into a different file than the one it's examining. In HEAD, src/test/kerberos/t/001_auth.pl has acquired similar code that evidently has a related failure mode. Besides being quite new, few buildfarm critters run that test, so it's unsurprising that we've not yet seen a failure there. Fix both cases by setting log_rotation_age = 0 so that no time-based rotation can occur. Also absorb 004_logrotate.pl's decision to set lc_messages = 'C' into the kerberos test, in hopes that it will work in non-English prevailing locales. Report: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=pogona&dt=2020-12-24%2022%3A10%3A04 Report: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=petalura&dt=2020-02-01%2022%3A20%3A04
* Fix race condition between shutdown and unstarted background workers.Tom Lane2020-12-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a database shutdown (smart or fast) is commanded between the time some process decides to request a new background worker and the time that the postmaster can launch that worker, then nothing happens because the postmaster won't launch any bgworkers once it's exited PM_RUN state. This is fine ... unless the requesting process is waiting for that worker to finish (or even for it to start); in that case the requestor is stuck, and only manual intervention will get us to the point of being able to shut down. To fix, cancel pending requests for workers when the postmaster sends shutdown (SIGTERM) signals, and similarly cancel any new requests that arrive after that point. (We can optimize things slightly by only doing the cancellation for workers that have waiters.) To fit within the existing bgworker APIs, the "cancel" is made to look like the worker was started and immediately stopped, causing deregistration of the bgworker entry. Waiting processes would have to deal with premature worker exit anyway, so this should introduce no bugs that weren't there before. We do have a side effect that registration records for restartable bgworkers might disappear when theoretically they should have remained in place; but since we're shutting down, that shouldn't matter. Back-patch to v10. There might be value in putting this into 9.6 as well, but the management of bgworkers is a bit different there (notably see 8ff518699) and I'm not convinced it's worth the effort to validate the patch for that branch. Discussion: https://postgr.es/m/661570.1608673226@sss.pgh.pa.us
* Fix portability issues with parsing of recovery_target_xidMichael Paquier2020-12-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | The parsing of this parameter has been using strtoul(), which is not portable across platforms. On most Unix platforms, unsigned long has a size of 64 bits, while on Windows it is 32 bits. It is common in recovery scenarios to rely on the output of txid_current() or even the newer pg_current_xact_id() to get a transaction ID for setting up recovery_target_xid. The value returned by those functions includes the epoch in the computed result, which would cause strtoul() to fail where unsigned long has a size of 32 bits once the epoch is incremented. WAL records and 2PC data include only information about 32-bit XIDs and it is not possible to have XIDs across more than one epoch, so discarding the high bits from the transaction ID set has no impact on recovery. On the contrary, the use of strtoul() prevents a consistent behavior across platforms depending on the size of unsigned long. This commit changes the parsing of recovery_target_xid to use pg_strtouint64() instead, available down to 9.6. There is one TAP test stressing recovery with recovery_target_xid, where a tweak based on pg_reset{xlog,wal} is added to bump the XID epoch so as this change gets tested, as per an idea from Alexander Lakhin. Reported-by: Alexander Lakhin Discussion: https://postgr.es/m/16780-107fd0c0385b1035@postgresql.org Backpatch-through: 9.6
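The truncation described above can be sketched as follows. This is a minimal illustration, not the PostgreSQL code: `parse_target_xid` is a hypothetical helper, and standard C `strtoull()` stands in for `pg_strtouint64()`. It assumes the input is a decimal string whose high 32 bits, if any, hold the epoch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not from the PostgreSQL tree): parse a decimal
 * transaction ID that may carry an epoch in its high 32 bits.
 * Returns 0 on success, -1 if the string is not a clean number.
 */
static int
parse_target_xid(const char *str, uint32_t *xid)
{
    char               *end;
    unsigned long long  val;

    assert(str != NULL && xid != NULL);

    errno = 0;
    /* unsigned long long is at least 64 bits wide on every platform */
    val = strtoull(str, &end, 10);
    if (errno != 0 || end == str || *end != '\0')
        return -1;

    /* Keep only the low 32 bits; the epoch in the high bits is discarded. */
    *xid = (uint32_t) val;
    return 0;
}
```

With this sketch, the input "4294967301" (epoch 1, XID 5) yields 5 regardless of how wide unsigned long is on the platform.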
* Improve find_em_expr_usable_for_sorting_rel commentTomas Vondra2020-12-22
| | | | | | | | | | | Clarify the relationship between find_em_expr_usable_for_sorting_rel and prepare_sort_from_pathkeys, i.e. what restrictions need to be shared between those two places. Author: James Coleman Reviewed-by: Tomas Vondra Backpatch-through: 13 Discussion: https://postgr.es/m/CAAaqYe8cK3g5CfLC4w7bs%3DhC0mSksZC%3DH5M8LSchj5e5OxpTAg%40mail.gmail.com
* Don't search for volatile expr in find_em_expr_usable_for_sorting_relTomas Vondra2020-12-21
| | | | | | | | | | | | While prepare_sort_from_pathkeys has to be concerned about matching up a volatile expression to the proper tlist entry, we don't need to do that in find_em_expr_usable_for_sorting_rel because such a sort will have to be postponed anyway. Author: James Coleman Reviewed-by: Tomas Vondra Backpatch-through: 13 Discussion: https://postgr.es/m/CAAaqYe8cK3g5CfLC4w7bs%3DhC0mSksZC%3DH5M8LSchj5e5OxpTAg%40mail.gmail.com
* Disallow SRFs when considering sorts below Gather MergeTomas Vondra2020-12-21
| | | | | | | | | | | | | | | | | While we do allow SRFs in ORDER BY, scan/join processing should not consider such cases - such sorts should only happen via final Sort atop a ProjectSet. So make sure we don't try adding such sorts below Gather Merge, just like we do for expressions that are volatile and/or not parallel safe. Backpatch to PostgreSQL 13, where this code was introduced as part of the Incremental Sort patch. Author: James Coleman Reviewed-by: Tomas Vondra Backpatch-through: 13 Discussion: https://postgr.es/m/CAAaqYe8cK3g5CfLC4w7bs=hC0mSksZC=H5M8LSchj5e5OxpTAg@mail.gmail.com Discussion: https://postgr.es/m/295524.1606246314%40sss.pgh.pa.us
* Remove "invalid concatenation of jsonb objects" error case.Tom Lane2020-12-21
| | | | | | | | | | | | | | | The jsonb || jsonb operator arbitrarily rejected certain combinations of scalar and non-scalar inputs, while being willing to concatenate other combinations. This was of course quite undocumented. Rather than trying to document it, let's just remove the restriction, creating a uniform rule that unless we are handling an object-to-object concatenation, non-array inputs are converted to one-element arrays, resulting in an array-to-array concatenation. (This does not change the behavior for any case that didn't throw an error before.) Per complaint from Joel Jacobson. Back-patch to all supported branches. Discussion: https://postgr.es/m/163099.1608312033@sss.pgh.pa.us
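The uniform rule stated above can be modeled in a few lines. This is only a toy model of the rule as the commit message words it, not the jsonb implementation; `JsonKind` and `concat_result_kind` are hypothetical names introduced for illustration.

```c
#include <assert.h>

/* Hypothetical classification of a jsonb || jsonb operand. */
typedef enum
{
    JSON_SCALAR,
    JSON_OBJECT,
    JSON_ARRAY
} JsonKind;

/*
 * Toy model of the rule: object || object stays an object-to-object
 * concatenation; in every other case each non-array input is treated
 * as a one-element array, so the result is an array.
 */
static JsonKind
concat_result_kind(JsonKind left, JsonKind right)
{
    if (left == JSON_OBJECT && right == JSON_OBJECT)
        return JSON_OBJECT;
    return JSON_ARRAY;
}
```

Under this model no combination of inputs is rejected, which is the point of removing the error case.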
* Check parallel safety in generate_useful_gather_pathsTomas Vondra2020-12-21
| | | | | | | | | | | | | | | | | | Commit ebb7ae839d ensured we ignore pathkeys with volatile expressions when considering adding a sort below a Gather Merge. Turns out we need to care about parallel safety of the pathkeys too, otherwise we might try sorting e.g. on results of a correlated subquery (as demonstrated by a report from Luis Roberto). Initial investigation by Tom Lane, patch by James Coleman. Backpatch to 13, where the code was introduced (as part of Incremental Sort). Reported-by: Luis Roberto Author: James Coleman Reviewed-by: Tomas Vondra Backpatch-through: 13 Discussion: https://postgr.es/m/622580997.37108180.1604080457319.JavaMail.zimbra%40siscobra.com.br Discussion: https://postgr.es/m/CAAaqYe8cK3g5CfLC4w7bs=hC0mSksZC=H5M8LSchj5e5OxpTAg@mail.gmail.com
* Consider unsorted paths in generate_useful_gather_pathsTomas Vondra2020-12-21
| | | | | | | | | | | | | | generate_useful_gather_paths used to skip unsorted paths (without any pathkeys), but that is unnecessary - the later code actually can handle such paths just fine by adding a Sort node. This is clearly a thinko, preventing construction of useful plans. Backpatch to 13, where Incremental Sort was introduced. Author: James Coleman Reviewed-by: Tomas Vondra Backpatch-through: 13 Discussion: https://postgr.es/m/CAAaqYe8cK3g5CfLC4w7bs=hC0mSksZC=H5M8LSchj5e5OxpTAg@mail.gmail.com
* Avoid memcpy() with same source and destination during relmapper init.Tom Lane2020-12-18
| | | | | | | | | | | | | | | | | A narrow reading of the C standard says that memcpy(x,x,n) is undefined, although it's hard to envision an implementation that would really misbehave. However, analysis tools such as valgrind might whine about this; accordingly, let's band-aid relmapper.c to not do it. See also 5b630501e, d3f4e8a8a, ad7b48ea0, and other similar fixes. Apparently, none of those folk tried valgrinding initdb? This has been like this for long enough that I'm surprised it hasn't been reported before. Back-patch, just in case anybody wants to use a back branch on a platform that complains about this; we back-patched those earlier fixes too. Discussion: https://postgr.es/m/161790.1608310142@sss.pgh.pa.us
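One common way to sidestep a self-copy is a pointer-equality guard, sketched below. This assumes nothing about how relmapper.c was actually changed; `copy_unless_same` is a hypothetical helper shown only to illustrate the idea.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy n bytes from src to dst, but skip the call
 * entirely when both pointers refer to the same object, so memcpy() is
 * never invoked with identical source and destination.
 */
static void
copy_unless_same(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);

    if (dst != src)
        memcpy(dst, src, n);
}
```

The guard only covers the exact-alias case; partially overlapping regions would still call for memmove().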
* Revert "Cannot use WL_SOCKET_WRITEABLE without WL_SOCKET_READABLE."Jeff Davis2020-12-14
| | | | | | | | | | | | | This reverts commit 3a9e64aa0d96c8ffb6c682b082d0f72b1d373327. Commit 4bad60e3 fixed the root of the problem that 3a9e64aa worked around. This enables proper pipelining of commands after terminating replication, eliminating an undocumented limitation. Discussion: https://postgr.es/m/3d57bc29-4459-578b-79cb-7641baf53c57%40iki.fi Backpatch-through: 9.5
* initdb: complete getopt_long alphabetizationBruce Momjian2020-12-12
| | | | Backpatch-through: 9.5
* initdb: properly alphabetize getopt_long options in C stringBruce Momjian2020-12-12
| | | | Backpatch-through: 9.5