path: root/src
Commit message | Author | Age
...
* lwlock: Fix quadratic behavior with very long wait lists | Andres Freund | 2024-01-18
  Until now LWLockDequeueSelf() sequentially searched the list of waiters to see if the current proc is still on the list of waiters, or has already been removed. In extreme workloads, where the wait lists are very long, this leads to quadratic behavior: #backends backends iterating over a list #backends entries long. Additionally, the likelihood of needing to call LWLockDequeueSelf() in the first place also increases with the length of the wait queue, as it becomes more likely that a lock is released while waiting for the wait list lock, which is held for longer during lock release.

  Due to the exponential back-off in perform_spin_delay() this is surprisingly hard to detect. We should make that easier, e.g. by adding a wait event around the pg_usleep(), but that's a separate patch.

  The fix is simple: track in PGPROC->lwWaiting whether a proc is currently waiting in the wait list, or has already been removed but is still waiting to be woken up.

  In some workloads with a lot of clients contending for a small number of lwlocks (e.g. WALWriteLock), the fix can substantially increase throughput.

  This was originally fixed for v16 and later by a4adc31f6902 without a backpatch, and we have heard complaints from users impacted by this quadratic behavior in older versions as well.

  Author: Andres Freund <andres@anarazel.de>
  Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
  Discussion: https://postgr.es/m/20221027165914.2hofzp4cvutj6gin@awork3.anarazel.de
  Discussion: https://postgr.es/m/CALj2ACXktNbG=K8Xi7PSqbofTZozavhaxjatVc14iYaLu4Maag@mail.gmail.com
  Backpatch-through: 12
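The following minimal C sketch illustrates the idea. It assumes a simplified doubly linked wait list and leaves out the wait-list lock. `Waiter`, `WaitList`, and the `WS_*` states are hypothetical stand-ins for PGPROC->lwWaiting and the real proclist code. The key point is that `dequeue_self()` reads a per-waiter state instead of scanning the list, so it costs O(1) regardless of queue length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-waiter state, loosely modeled on PGPROC->lwWaiting. */
typedef enum { WS_NOT_WAITING, WS_WAITING, WS_PENDING_WAKEUP } WaitState;

typedef struct Waiter {
    struct Waiter *prev, *next;
    WaitState state;
} Waiter;

typedef struct {
    Waiter *head, *tail;
    size_t len;
} WaitList;

static void enqueue(WaitList *wl, Waiter *w)
{
    w->prev = wl->tail;
    w->next = NULL;
    if (wl->tail) wl->tail->next = w; else wl->head = w;
    wl->tail = w;
    wl->len++;
    w->state = WS_WAITING;
}

static void unlink_waiter(WaitList *wl, Waiter *w)
{
    if (w->prev) w->prev->next = w->next; else wl->head = w->next;
    if (w->next) w->next->prev = w->prev; else wl->tail = w->prev;
    w->prev = w->next = NULL;
    wl->len--;
}

/* Releaser side: remove the first waiter and mark it as awaiting wakeup. */
static Waiter *pop_for_wakeup(WaitList *wl)
{
    Waiter *w = wl->head;
    if (w) {
        unlink_waiter(wl, w);
        w->state = WS_PENDING_WAKEUP;
    }
    return w;
}

/*
 * O(1) replacement for "search the list to see if I'm still on it".
 * Returns true if we removed ourselves, false if a releaser already did
 * (in which case the caller must still absorb the pending wakeup).
 */
static bool dequeue_self(WaitList *wl, Waiter *me)
{
    if (me->state == WS_WAITING) {
        unlink_waiter(wl, me);
        me->state = WS_NOT_WAITING;
        return true;
    }
    assert(me->state == WS_PENDING_WAKEUP);
    return false;
}
```

The releaser flips the state to "pending wakeup" while it owns the list. A waiter that finds itself in that state therefore knows it has already been removed.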
* Close socket in case of errors in setting non-blocking | Daniel Gustafsson | 2024-01-17
  If configuring the newly created socket as non-blocking fails, we error out and return INVALID_SOCKET, but the socket that had been created wasn't closed. Fix by issuing closesocket in the error path.

  Backpatch to all supported branches.

  Author: Ranier Vilela <ranier.vf@gmail.com>
  Discussion: https://postgr.es/m/CAEudQApmU5CrKefH85VbNYE2y8H=-qqEJbg6RAPU65+vCe+89A@mail.gmail.com
  Backpatch-through: v12
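The fixed error path follows a common rule: once a descriptor exists, every failure exit must release it. The commit concerns Windows code that uses closesocket(). This sketch is a POSIX analogue that sets O_NONBLOCK with fcntl() and calls close() on failure. `make_nonblocking_socket()` is a hypothetical helper.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: POSIX close() stands in for Windows closesocket(). */
static int make_nonblocking_socket(void)
{
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
    {
        close(fd);      /* the step that was missing: don't leak the socket */
        return -1;
    }
    return fd;
}
```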
* Re-pgindent catcache.c after previous commit. | Tom Lane | 2024-01-13
  Discussion: https://postgr.es/m/1393953.1698353013@sss.pgh.pa.us
  Discussion: https://postgr.es/m/CAGjhLkOoBEC9mLsnB42d3CO1vcMx71MLSEuigeABbQ8oRdA6gw@mail.gmail.com
* Cope with catcache entries becoming stale during detoasting. | Tom Lane | 2024-01-13
  We've long had a policy that any toasted fields in a catalog tuple should be pulled in-line before entering the tuple in a catalog cache. However, that requires access to the catalog's toast table, and we'll typically do AcceptInvalidationMessages while opening the toast table. So it's possible that the catalog tuple is outdated by the time we finish detoasting it. Since no cache entry exists yet, we can't mark the entry stale during AcceptInvalidationMessages, and instead we'll press forward and build an apparently-valid cache entry. The upshot is that we have a race condition whereby an out-of-date entry could be made in a backend's catalog cache, and persist there indefinitely causing indeterminate misbehavior.

  To fix, use the existing systable_recheck_tuple code to recheck whether the catalog tuple is still up-to-date after we finish detoasting it. If not, loop around and restart the process of searching the catalog and constructing cache entries from the top. The case is rare enough that this shouldn't create any meaningful performance penalty, even in the SearchCatCacheList case where we need to tear down and reconstruct the whole list.

  Indeed, the case is so rare that AFAICT it doesn't occur during our regression tests, and there doesn't seem to be any easy way to build a test that would exercise it reliably. To allow testing of the retry code paths, add logic (in USE_ASSERT_CHECKING builds only) that randomly pretends that the recheck failed about one time out of a thousand. This is enough to ensure that we'll pass through the retry paths during most regression test runs.

  By adding an extra level of looping, this commit creates a need to reindent most of SearchCatCacheMiss and SearchCatCacheList. I'll do that separately, to allow putting those changes in .git-blame-ignore-revs.

  Patch by me; thanks to Alexander Lakhin for having built a test case to prove the bug is real, and to Xiaoran Wang for review. Back-patch to all supported branches.

  Discussion: https://postgr.es/m/1393953.1698353013@sss.pgh.pa.us
  Discussion: https://postgr.es/m/CAGjhLkOoBEC9mLsnB42d3CO1vcMx71MLSEuigeABbQ8oRdA6gw@mail.gmail.com
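The recheck-and-retry loop fits in a few lines of C. This sketch models a single catalog row whose `version` changes whenever a concurrent update commits. `detoast()`, `recheck()`, and `build_entry()` are hypothetical stand-ins for detoasting, systable_recheck_tuple(), and the cache-miss path; none of them is the real catcache API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a catalog row whose version concurrent DDL bumps. */
typedef struct { int value; int version; } CatalogRow;

static CatalogRow catalog_row = {42, 1};
static int pending_invalidations = 0;   /* simulated concurrent updates */

typedef struct { int value; bool valid; } CacheEntry;

/* Opening the toast table may process invalidations (simulated here). */
static int detoast(const CatalogRow *snapshot)
{
    if (pending_invalidations > 0) {
        pending_invalidations--;
        catalog_row.value++;        /* a concurrent update commits meanwhile */
        catalog_row.version++;
    }
    return snapshot->value;
}

/* Stand-in for systable_recheck_tuple(): is our copy still current? */
static bool recheck(const CatalogRow *snapshot)
{
    return snapshot->version == catalog_row.version;
}

static CacheEntry build_entry(int *attempts)
{
    CacheEntry e;

    *attempts = 0;
    for (;;) {
        CatalogRow snapshot = catalog_row;   /* "search the catalog" */
        (*attempts)++;
        e.value = detoast(&snapshot);
        if (recheck(&snapshot))
            break;                           /* still current: safe to cache */
        /* stale: loop around and restart from the top */
    }
    e.valid = true;
    return e;
}
```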
* pg_regress: Disable autoruns for cmd.exe on Windows | Michael Paquier | 2024-01-12
  This is similar to 9886744a361b, to prevent the execution of other programs due to autorun configurations which could influence the postmaster startup. This was originally applied on HEAD as of 83c75ac7fb69 without a backpatch, but the patch has survived CI and buildfarm cycles. I have checked that cmd /d exists down to Windows XP, which should make this change work correctly in the oldest branches still supported.

  Discussion: https://postgr.es/m/20230922.161551.320043332510268554.horikyota.ntt@gmail.com
  Backpatch-through: 12
* pg_ctl: Disable autoruns for cmd.exe on Windows | Michael Paquier | 2024-01-12
  On Windows, cmd.exe is used to launch the postmaster process to ease its redirection setup. However, cmd.exe may execute other programs at startup due to autorun configurations, which could influence the postmaster startup. This patch adds the /D flag to the launcher cmd.exe command line to disable autorun settings written in the registry.

  This was originally applied on HEAD as of 9886744a361b without a backpatch, but the patch has survived CI and buildfarm cycles. I have checked that cmd /d exists down to Windows XP, which should make this change work correctly in the oldest branches still supported.

  Reported-by: Hayato Kuroda
  Author: Kyotaro Horiguchi
  Reviewed-by: Robert Haas, Michael Paquier
  Discussion: https://postgr.es/m/20230922.161551.320043332510268554.horikyota.ntt@gmail.com
  Backpatch-through: 12
* Allow subquery pullup to wrap a PlaceHolderVar in another one. | Tom Lane | 2024-01-11
  The code for wrapping subquery output expressions in PlaceHolderVars believed that if the expression already was a PlaceHolderVar, it was never necessary to wrap that in another one. That's wrong if the expression is underneath an outer join and involves a lateral reference to outside that scope: failing to add an additional PHV risks evaluating the expression at the wrong place and hence not forcing it to null when the outer join should do so.

  This is an oversight in commit 9e7e29c75, which added logic to forcibly wrap lateral-reference Vars in PlaceHolderVars, but didn't see that the adjacent case for PlaceHolderVars needed the same treatment.

  The test case we have for this doesn't fail before 4be058fe9, but now that I see the problem I wonder if it is possible to demonstrate related errors before that. That's moot though, since all such branches are out of support.

  Per bug #18284 from Holger Reise. Back-patch to all supported branches.

  Discussion: https://postgr.es/m/18284-47505a20c23647f8@postgresql.org
* Fix indentation in ExecParallelHashIncreaseNumBatches() | Alexander Korotkov | 2024-01-08
  Backpatch-through: 12
* Fix oversized memory allocation in Parallel Hash Join | Alexander Korotkov | 2024-01-07
  During the calculations of the maximum for the number of buckets, take into account that later we round that to the next power of 2.

  Reported-by: Karen Talarico
  Bug: #16925
  Discussion: https://postgr.es/m/16925-ec96d83529d0d629%40postgresql.org
  Author: Thomas Munro, Andrei Lepikhov, Alexander Korotkov
  Reviewed-by: Alena Rybakina
  Backpatch-through: 12
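Here is the arithmetic behind the fix. Suppose the bucket limit is a plain quotient that is later rounded up to a power of 2. The rounded value can then overshoot the memory budget, because 1000 rounds up to 1024. One way to express the safeguard is to clamp the maximum to a power of 2 before it is ever rounded. The helper names below are hypothetical; this is not the actual nodeHash.c code.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest power of 2 >= n (n >= 1). */
static size_t next_power_of_2(size_t n)
{
    size_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

/* Largest power of 2 <= n (n >= 1). */
static size_t prev_power_of_2(size_t n)
{
    size_t p = 1;
    while (p <= n / 2)
        p <<= 1;
    return p;
}

/*
 * Hypothetical: how many buckets fit in mem_limit bytes, clamped so that a
 * later round-up to a power of 2 can never exceed the limit.
 */
static size_t max_buckets_for(size_t mem_limit, size_t bucket_size)
{
    size_t max_buckets = mem_limit / bucket_size;
    assert(max_buckets >= 1);
    return prev_power_of_2(max_buckets);
}
```

With an 8000-byte budget and 8-byte buckets, the raw maximum is 1000. The clamp yields 512, and rounding 512 up leaves it at 512, so the allocation stays within budget.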
* Avoid masking EOF (no-password-supplied) conditions in auth.c. | Tom Lane | 2024-01-03
  CheckPWChallengeAuth() would return STATUS_ERROR if the user does not exist or has no password assigned, even if the client disconnected without responding to the password challenge (as libpq often will, for example). We should return STATUS_EOF in that case, and the lower-level functions do, but this code level got it wrong since the refactoring done in 7ac955b34. This breaks the intent of not logging anything for EOF cases (cf. comments in auth_failed()) and might also confuse users of ClientAuthentication_hook.

  Per report from Liu Lang. Back-patch to all supported versions.

  Discussion: https://postgr.es/m/b725238c-539d-cb09-2bff-b5e6cb2c069c@esgyn.cn
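The intended control flow can be sketched as follows. The sketch uses a local enum that reuses the STATUS_* names and a hypothetical `check_password_challenge()`. The commit message implies that the challenge is still issued for unknown or password-less users. Even so, a client disconnect must surface as STATUS_EOF rather than being folded into STATUS_ERROR.

```c
#include <assert.h>
#include <stdbool.h>

/* Local enum reusing the STATUS_* names; values here are arbitrary. */
typedef enum { STATUS_OK, STATUS_ERROR, STATUS_EOF } AuthStatus;

/* Hypothetical outcome of sending the challenge and reading the reply. */
typedef enum { REPLY_RECEIVED, CLIENT_DISCONNECTED } ReplyOutcome;

static AuthStatus check_password_challenge(bool user_has_password,
                                           ReplyOutcome outcome,
                                           bool password_matches)
{
    /* EOF wins over everything else, so nothing gets logged for it. */
    if (outcome == CLIENT_DISCONNECTED)
        return STATUS_EOF;
    /* Unknown user or no password: the exchange could never succeed. */
    if (!user_has_password)
        return STATUS_ERROR;
    return password_matches ? STATUS_OK : STATUS_ERROR;
}
```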
* In pg_dump, don't dump a stats object unless dumping underlying table. | Tom Lane | 2023-12-29
  If the underlying table isn't being dumped, it's useless to dump an extended statistics object; it'll just cause errors at restore. We have always applied similar policies to, say, indexes.

  (When and if we get cross-table stats objects, it might be profitable to think a little harder about what to do with them. But for now there seems no point in considering a stats object as anything but an appendage of its table.)

  Rian McGuire and Tom Lane, per report from Rian McGuire. Back-patch to supported branches.

  Discussion: https://postgr.es/m/7075d3aa-3f05-44a5-b68f-47dc6a8a0550@buildkite.com
* Fix failure to verify PGC_[SU_]BACKEND GUCs in pg_file_settings view. | Tom Lane | 2023-12-26
  set_config_option() bails out early if it detects that the option to be set is PGC_BACKEND or PGC_SU_BACKEND class and we're reading the config file in a postmaster child; we don't want to apply any new value in such a case. That's fine as far as it goes, but it fails to consider the requirements of the pg_file_settings view: for that, we need to check validity of the value even though we have no intention to apply it. Because we didn't, even very silly values for affected GUCs would be reported as valid by the view. There are only half a dozen such GUCs, which perhaps explains why this got overlooked for so long.

  Fix by continuing when changeVal is false; this parallels the logic in some other early-exit paths.

  Also, the check added by commit 924bcf4f1 to prevent GUC changes in parallel workers seems a few bricks shy of a load: it's evidently assuming that ereport(elevel, ...) won't return. Make sure we bail out if it does. The lack of trouble reports suggests that this is only a latent bug, i.e. parallel workers don't actually reach here with elevel < ERROR. (Per the code coverage report, we never reach here at all in the regression suite.) But we clearly don't want to risk proceeding if that does happen.

  Per report from Rıdvan Korkmaz. These are ancient bugs, so back-patch to all supported branches.

  Discussion: https://postgr.es/m/2089235.1703617353@sss.pgh.pa.us
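Here is a simplified model of the corrected early exit. It assumes a hypothetical `Guc` struct, a validator that accepts 0..100, and a convention where -1 means "silently ignored". When changeVal is false, as it is for pg_file_settings, the function now falls through to validation instead of returning early.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    bool backend_class;   /* PGC_BACKEND or PGC_SU_BACKEND */
    int  value;
} Guc;

/* Hypothetical validator for this sketch: accept only 0..100. */
static bool value_is_valid(int v) { return v >= 0 && v <= 100; }

/*
 * Returns 1 if newval is valid (and applied when changeVal), 0 if invalid,
 * -1 if the change is silently ignored.
 */
static int set_option(Guc *g, int newval, bool changeVal,
                      bool in_child_rereading_config)
{
    /* Only bail out early when we would actually apply the value. */
    if (g->backend_class && in_child_rereading_config && changeVal)
        return -1;
    /* changeVal == false (pg_file_settings): fall through and validate. */
    if (!value_is_valid(newval))
        return 0;
    if (changeVal)
        g->value = newval;
    return 1;
}
```

Before the fix, the first call in the test below would have returned early without validating, so the bogus value would have looked acceptable.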
* Hide warnings from Python headers when using gcc-compatible compiler. | Tom Lane | 2023-12-26
  Like commit 388e80132, use "#pragma GCC system_header" to silence warnings appearing within the Python headers, since newer Python versions no longer worry about some restrictions we still use like -Wdeclaration-after-statement.

  This patch improves on 388e80132 by inventing a separate wrapper header file, allowing the pragma to be tightly scoped to just the Python headers and not other stuff we have lying about in plpython.h. I applied the same technique to plperl for the same reason: the original patch suppressed warnings for a good deal of our own code, not only the Perl headers.

  Like the previous commit, back-patch to supported branches.

  Peter Eisentraut and Tom Lane

  Discussion: https://postgr.es/m/ae523163-6d2a-4b81-a875-832e48dec502@eisentraut.org
* Avoid trying to fetch metapage of an SPGist partitioned index. | Tom Lane | 2023-12-21
  This is necessary when spgcanreturn() is invoked on a partitioned index, and the failure might be reachable in other scenarios as well. The rest of what spgGetCache() does is perfectly sensible for a partitioned index, so we should allow it to go through.

  I think the main takeaway from this is that we lack sufficient test coverage for non-btree partitioned indexes. Therefore, I added simple test cases for brin and gin as well as spgist (hash and gist AMs were covered already in indexing.sql).

  Per bug #18256 from Alexander Lakhin. Although the known test case only fails since v16 (3c569049b), I've got no faith at all that there aren't other ways to reach this problem; so back-patch to all supported branches.

  Discussion: https://postgr.es/m/18256-0b0e1b6e4a620f1b@postgresql.org
* Fix bugs in manipulation of large objects. | Tom Lane | 2023-12-15
  In v16 and up (since commit afbfc0298), large object ownership checking has been broken because object_ownercheck() didn't take care of the discrepancy between our object-address representation of large objects (classId == LargeObjectRelationId) and the catalog where their ownership info is actually stored (LargeObjectMetadataRelationId). This resulted in failures such as "unrecognized class ID: 2613" when trying to update blob properties as a non-superuser.

  Poking around for related bugs, I found that AlterObjectOwner_internal would pass the wrong classId to the PostAlterHook in the no-op code path where the large object already has the desired owner. Also, recordExtObjInitPriv checked for the wrong classId; that bug is only latent because the stanza is dead code anyway, but as long as we're carrying it around it should be less wrong.

  These bugs are quite old. In HEAD, we can reduce the scope for future bugs of this ilk by changing AlterObjectOwner_internal's API to let the translation happen inside that function, rather than requiring callers to know about it.

  A more bulletproof fix, perhaps, would be to start using LargeObjectMetadataRelationId as the dependency and object-address classId for blobs. However that has substantial risk of breaking third-party code; even within our own code, it'd create hassles for pg_dump which would have to cope with a version-dependent representation. For now, keep the status quo.

  Discussion: https://postgr.es/m/2650449.1702497209@sss.pgh.pa.us
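The core of the ownership bug is a missing classId translation. This minimal sketch uses a hypothetical `ownership_catalog_for()` helper. The OID 2613 for pg_largeobject comes from the error message above. The OID 2995 for pg_largeobject_metadata is assumed here from the catalog headers, so verify it against your version.

```c
#include <assert.h>

typedef unsigned int Oid;

/* 2613 per the error message above; 2995 assumed from catalog headers. */
#define LargeObjectRelationId          ((Oid) 2613)
#define LargeObjectMetadataRelationId  ((Oid) 2995)

/*
 * Large objects are addressed with classId = pg_largeobject, but their
 * owner lives in pg_largeobject_metadata, so ownership lookups must
 * translate the classId before choosing which catalog to read.
 */
static Oid ownership_catalog_for(Oid classId)
{
    if (classId == LargeObjectRelationId)
        return LargeObjectMetadataRelationId;
    return classId;
}
```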
* Prevent tuples to be marked as dead in subtransactions on standbys | Michael Paquier | 2023-12-12
  Dead tuples are ignored and are not marked as dead during recovery, as it can lead to MVCC issues on a standby because its xmin may not match with the primary. This information is tracked by a field called "xactStartedInRecovery" in the transaction state data, switched on when starting a transaction in recovery.

  Unfortunately, this information was not correctly tracked when starting a subtransaction, because the transaction state used for the subtransaction did not update "xactStartedInRecovery" based on the state of its parent. This would cause index scans done in subtransactions to return inconsistent data, depending on how the xmin of the primary and/or the standby evolved.

  This has been broken since the introduction of hot standby in efc16ea52067, so backpatch all the way down.

  Author: Fei Changhong
  Reviewed-by: Kyotaro Horiguchi
  Discussion: https://postgr.es/m/tencent_C4D907A5093C071A029712E73B43C6512706@qq.com
  Backpatch-through: 12
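The following sketch shows the missing inheritance step with a pared-down transaction state struct. `TxState`, `push_subtransaction()`, and `may_mark_dead()` are hypothetical names and not xact.c's API. The subtransaction copies the flag from its parent instead of starting with it cleared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct TxState {
    struct TxState *parent;
    int nestingLevel;
    bool startedInRecovery;
} TxState;

static bool recovery_in_progress = true;   /* hypothetical: we are a standby */

static TxState *start_transaction(void)
{
    TxState *s = calloc(1, sizeof(TxState));
    assert(s != NULL);
    s->nestingLevel = 1;
    s->startedInRecovery = recovery_in_progress;
    return s;
}

static TxState *push_subtransaction(TxState *parent)
{
    TxState *s = calloc(1, sizeof(TxState));
    assert(s != NULL);
    s->parent = parent;
    s->nestingLevel = parent->nestingLevel + 1;
    /* The fix: inherit the flag from the parent instead of leaving it false. */
    s->startedInRecovery = parent->startedInRecovery;
    return s;
}

/* Index scans may mark tuples dead only if the xact didn't start in recovery. */
static bool may_mark_dead(const TxState *s) { return !s->startedInRecovery; }
```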
* Fix typo in comment | Daniel Gustafsson | 2023-12-12
  Commit 98e675ed7af accidentally mistyped IDENTIFY_SYSTEM as IDENTIFY_SERVER. Backpatch to all supported branches.

  Reported-by: Alexander Lakhin <exclusion@gmail.com>
  Discussion: https://postgr.es/m/68138521-5345-8780-4390-1474afdcba1f@gmail.com
* Be more wary about OpenSSL not setting errno on error. | Tom Lane | 2023-12-11
  OpenSSL will sometimes return SSL_ERROR_SYSCALL without having set errno; this is apparently a reflection of recv(2)'s habit of not setting errno when reporting EOF. Ensure that we treat such cases the same as read EOF. Previously, we'd frequently report them like "could not accept SSL connection: Success" which is confusing, or worse report them with an unrelated errno left over from some previous syscall.

  To fix, ensure that errno is zeroed immediately before the call, and report its value only when it's not zero afterwards; otherwise report EOF.

  For consistency, I've applied the same coding pattern in libpq's pqsecure_raw_read(). Bare recv(2) shouldn't really return -1 without setting errno, but in case it does we might as well cope.

  Per report from Andres Freund. Back-patch to all supported versions.

  Discussion: https://postgr.es/m/20231208181451.deqnflwxqoehhxpe@awork3.anarazel.de
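This sketch shows the errno discipline described above. It uses a hypothetical `fake_ssl_read()` that, like recv() at EOF, may fail without touching errno. Clearing errno immediately before the call is what makes "errno != 0 afterwards" a trustworthy signal.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical SSL call: fails, but sets errno only if asked to. */
static int fake_ssl_read(int set_errno_to)
{
    if (set_errno_to != 0)
        errno = set_errno_to;
    return -1;                        /* think "SSL_ERROR_SYSCALL" */
}

static void describe_failure(int set_errno_to, char *buf, size_t len)
{
    errno = 0;                        /* clear any stale value first */
    int r = fake_ssl_read(set_errno_to);
    if (r < 0 && errno != 0)
        snprintf(buf, len, "could not read: %s", strerror(errno));
    else
        snprintf(buf, len, "unexpected EOF");   /* errno untouched: EOF */
}
```

Without the `errno = 0` line, the stale EINTR set in the test below would be reported instead of EOF.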
* jit: Create void type in the right context | Daniel Gustafsson | 2023-12-11
  Commit 3b991f81c45 introduced a specific context for types such that types which are no longer referenced can be dropped periodically rather than leaking. One void pointer type creation was however missed, leading to an assertion failure in LLVM Debug builds. Per buildfarm members canebreak and urutu.

  Fix with assistance from Andres. The codepath in question was refactored in version 13, which is why this only affected version 12.

  Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
  Discussion: https://postgr.es/m/1106876.1700409912@sss.pgh.pa.us
* Fix an undetected deadlock due to apply worker. | Amit Kapila | 2023-12-11
  The apply worker needs to update the state of the subscription tables to 'READY' during the synchronization phase which requires locking the corresponding subscription. The apply worker also waits for the subscription tables to reach the 'SYNCDONE' state after holding the locks on the subscription and the wait is done using WaitLatch. The 'SYNCDONE' state is changed by tablesync workers again by locking the corresponding subscription. Both the state updates use AccessShareLock mode to lock the subscription, so they can't block each other.

  However, a backend can simultaneously try to acquire a lock on the same subscription using AccessExclusiveLock mode to alter the subscription. Now, the backend's wait on a lock can sneak in between the apply worker and table sync worker causing deadlock. In other words, apply_worker waits for tablesync worker which waits for backend, and backend waits for apply worker. This is not detected by the deadlock detector because apply worker uses WaitLatch.

  The fix is to release existing locks in apply worker before it starts to wait for tablesync worker to change the state.

  Reported-by: Tomas Vondra
  Author: Shlok Kyal
  Reviewed-by: Amit Kapila, Peter Smith
  Backpatch-through: 12
  Discussion: https://postgr.es/m/d291bb50-12c4-e8af-2af2-7bb9bb4d8e3e@enterprisedb.com
* Fix incorrect error message for IDENTIFY_SYSTEM | Daniel Gustafsson | 2023-12-05
  Commit 5a991ef8692e accidentally reversed the order of the tuples and fields parameters, making the error message incorrectly refer to 3 tuples with 1 field when IDENTIFY_SYSTEM returns 1 tuple and 3 or 4 fields. Fix by changing the order of the parameters. This also adds a comment describing why we check for < 3 when postgres has been sending 4 fields since 9.4.

  Backpatch all the way since the bug is almost a decade old.

  Author: Tomonari Katsumata <t.katsumata1122@gmail.com>
  Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
  Bug: #18224
  Backpatch-through: v12
* Check collation when creating partitioned index | Peter Eisentraut | 2023-12-01
  When creating a partitioned index, the partition key must be a subset of the index's columns. But this currently doesn't check that the collations between the partition key and the index definition match. So you can construct a unique index that fails to enforce uniqueness. (This would most likely involve a nondeterministic collation, so it would have to be crafted explicitly and is not something that would just happen by accident.)

  This patch adds the required collation check. As a result, any previously allowed unique index that has a collation mismatch would no longer be allowed to be created.

  Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
  Discussion: https://www.postgresql.org/message-id/flat/3327cb54-f7f1-413b-8fdb-7a9dceebb938%40eisentraut.org
* Use BIO_{get,set}_app_data instead of BIO_{get,set}_data. | Tom Lane | 2023-11-28
  We should have done it this way all along, but we accidentally got away with using the wrong BIO field up until OpenSSL 3.2. There, the library's BIO routines that we rely on use the "data" field for their own purposes, and our conflicting use causes assorted weird behaviors up to and including core dumps when SSL connections are attempted. Switch to using the approved field for the purpose, i.e. app_data.

  While at it, remove our configure probes for BIO_get_data as well as the fallback implementation. BIO_{get,set}_app_data have been there since long before any OpenSSL version that we still support, even in the back branches.

  Also, update src/test/ssl/t/001_ssltests.pl to allow for a minor change in an error message spelling that evidently came in with 3.2.

  Tristan Partin and Bo Andreson. Back-patch to all supported branches.

  Discussion: https://postgr.es/m/CAN55FZ1eDDYsYaL7mv+oSLUij2h_u6hvD4Qmv-7PK7jkji0uyQ@mail.gmail.com
* Fix assertions with RI triggers in heap_update and heap_delete. | Heikki Linnakangas | 2023-11-28
  If the tuple being updated is not visible to the crosscheck snapshot, we return TM_Updated but the assertions would not hold in that case. Move them to before the cross-check.

  Fixes bug #17893. Backpatch to all supported versions.

  Author: Alexander Lakhin
  Backpatch-through: 12
  Discussion: https://www.postgresql.org/message-id/17893-35847009eec517b5%40postgresql.org
* Fix race condition with BIO methods initialization in libpq with threads | Michael Paquier | 2023-11-27
  The libpq code in charge of creating per-connection SSL objects was prone to a race condition when loading the custom BIO methods needed by my_SSL_set_fd(). As BIO methods are stored as a static variable, the initialization of a connection could fail because one thread could refer to my_bio_methods while it was being manipulated by a second concurrent thread.

  This error was introduced by 8bb14cdd33de, which removed ssl_config_mutex around the call of my_SSL_set_fd(), the function that sets the custom BIO methods used in libpq. As before, the BIO method initialization is now protected by the existing ssl_config_mutex, itself initialized earlier for WIN32.

  While at it, document that my_bio_methods is protected by ssl_config_mutex, as this can be easy to miss.

  Reported-by: Willi Mann
  Author: Willi Mann, Michael Paquier
  Discussion: https://postgr.es/m/e77abc4c-4d03-4058-a9d7-ef0035657e04@celonis.com
  Backpatch-through: 12
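The following POSIX-threads sketch shows the restored locking pattern. The lazily built, process-wide methods struct is created and read only while a mutex is held. `BioMethods`, `get_bio_methods()`, and `config_mutex` are hypothetical stand-ins for libpq's my_bio_methods and ssl_config_mutex, and the sketch makes no OpenSSL calls.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a BIO_METHOD table. */
typedef struct { int (*read_fn)(void); int initialized; } BioMethods;

static int my_read(void) { return 0; }

static pthread_mutex_t config_mutex = PTHREAD_MUTEX_INITIALIZER;
static BioMethods methods_storage;
/* Protected by config_mutex. */
static BioMethods *my_bio_methods = NULL;

static BioMethods *get_bio_methods(void)
{
    BioMethods *result;

    pthread_mutex_lock(&config_mutex);
    if (my_bio_methods == NULL)
    {
        /* Only one thread can be here at a time, so no half-built state leaks. */
        methods_storage.read_fn = my_read;
        methods_storage.initialized = 1;
        my_bio_methods = &methods_storage;
    }
    result = my_bio_methods;
    pthread_mutex_unlock(&config_mutex);
    return result;
}
```

Readers also take the mutex, so no thread can observe the struct while another thread is still filling it in. The cost is one lock acquisition per connection setup.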
* Fix timing-dependent failure in GSSAPI data transmission. | Tom Lane | 2023-11-23
  When using GSSAPI encryption in non-blocking mode, libpq sometimes failed with "GSSAPI caller failed to retransmit all data needing to be retried". The cause is that pqPutMsgEnd rounds its transmit request down to an even multiple of 8K, and sometimes that can lead to not requesting a write of data that was requested to be written (but reported as not written) earlier. That can upset pg_GSS_write's logic for dealing with not-yet-written data, since it's possible the data in question had already been incorporated into an encrypted packet that we weren't able to send during the previous call.

  We could fix this with a one-or-two-line hack to disable pqPutMsgEnd's round-down behavior, but that seems like making the caller work around a behavior that pg_GSS_write shouldn't expose in this way. Instead, adjust pg_GSS_write to never report a partial write: it either reports a complete write, or reflects the failure of the lower-level pqsecure_raw_write call. The requirement still exists for the caller to present at least as much data as on the previous call, but with the caller-visible write start point not moving there is no temptation for it to present less. We lose some ability to reclaim buffer space early, but I doubt that that will make much difference in practice.

  This also gets rid of a rather dubious assumption that "any interesting failure condition (from pqsecure_raw_write) will recur on the next try". We've not seen failure reports traceable to that, but I've never trusted it particularly and am glad to remove it.

  Make the same adjustments to the equivalent backend routine be_gssapi_write(). It is probable that there's no bug on the backend side, since we don't have a notion of nonblock mode there; but we should keep the logic the same to ease future maintenance.

  Per bug #18210 from Lars Kanis. Back-patch to all supported branches.

  Discussion: https://postgr.es/m/18210-4c6d0b14627f2eb8@postgresql.org
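This sketch shows the "never report a partial write" contract with encryption left out. It uses a hypothetical `Transport` that accepts only a limited number of bytes per call. `gss_write()` remembers how much of the pending data it has already pushed. It then reports either the full length, or -1 with errno set by the lower layer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical transport that accepts at most `budget` bytes in total. */
typedef struct {
    char   out[256];
    size_t out_len;
    size_t budget;
} Transport;

static ssize_t raw_write(Transport *t, const char *buf, size_t len)
{
    size_t n = len < t->budget ? len : t->budget;
    if (n == 0) {
        errno = EAGAIN;
        return -1;
    }
    assert(t->out_len + n <= sizeof t->out);
    memcpy(t->out + t->out_len, buf, n);
    t->out_len += n;
    t->budget -= n;
    return (ssize_t) n;
}

/* Hypothetical layer state: how much of the pending data was already sent. */
typedef struct { size_t bytes_sent; } GssState;

/*
 * Reports either the whole len as written, or -1 with errno set; never a
 * partial count. On retry the caller passes the same data again, and we
 * resume from bytes_sent.
 */
static ssize_t gss_write(GssState *st, Transport *t, const char *buf, size_t len)
{
    while (st->bytes_sent < len) {
        ssize_t n = raw_write(t, buf + st->bytes_sent, len - st->bytes_sent);
        if (n < 0)
            return -1;              /* errno comes from the raw layer */
        st->bytes_sent += (size_t) n;
    }
    st->bytes_sent = 0;             /* all done: reset for the next message */
    return (ssize_t) len;
}
```

The caller sees the same write start point on every retry. It therefore has no reason to present less data than on the previous call.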
* Fix query checking consistency of table amhandlers in opr_sanity.sql | Michael Paquier | 2023-11-22
  As written, the query checked for an access method of type 's', which is not an AM type supported in the core code. Error introduced by 8586bf7ed888. As this query is not checking what it should, backpatch all the way down.

  Reviewed-by: Aleksander Alekseev
  Discussion: https://postgr.es/m/ZVxJkAJrKbfHETiy@paquier.xyz
  Backpatch-through: 12
* Lock table in DROP STATISTICS | Tomas Vondra | 2023-11-19
  The DROP STATISTICS code failed to properly lock the table, leading to "ERROR: tuple concurrently deleted" when executed concurrently with ANALYZE.

  Fixed by modifying RemoveStatisticsById() to acquire the same lock as ANALYZE. This function is called only by DROP STATISTICS, as ANALYZE calls RemoveStatisticsDataById() directly.

  Reported by Justin Pryzby, fix by me. Backpatch through 12. The code has been like this since it was introduced in 10, but older releases are EOL.

  Reported-by: Justin Pryzby
  Reviewed-by: Tom Lane
  Backpatch-through: 12
  Discussion: https://postgr.es/m/ZUuk-8CfbYeq6g_u@pryzbyj2023
* Guard against overflow in interval_mul() and interval_div(). | Dean Rasheed | 2023-11-18
  Commits 146604ec43 and a898b409f6 added overflow checks to interval_mul(), but not to interval_div(), which contains almost identical code, and so is susceptible to the same kinds of overflows. In addition, those checks did not catch all possible overflow conditions.

  Add additional checks to the "cascade down" code in interval_mul(), and copy all the overflow checks over to the corresponding code in interval_div(), so that they both generate "interval out of range" errors, rather than returning bogus results.

  Given that these errors are relatively easy to hit, back-patch to all supported branches.

  Per bug #18200 from Alexander Lakhin, and subsequent investigation.

  Discussion: https://postgr.es/m/18200-5ea288c7b2d504b1%40postgresql.org
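The style of check described above can be sketched for the month field alone. This sketch assumes a hypothetical `scale_months()` helper and mirrors the shape of PostgreSQL's FLOAT8_FITS_IN_INT32 range test. The same guard covers both multiplication and division. A NaN result is caught by isnan(), and an infinite one fails the range test.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stdint.h>

/* Same shape as PostgreSQL's FLOAT8_FITS_IN_INT32 range test. */
#define DOUBLE_FITS_IN_INT32(x) \
    ((x) >= (double) INT32_MIN && (x) < -((double) INT32_MIN))

/* Hypothetical helper: scale a month count, refusing out-of-range results. */
static bool scale_months(int32_t months, double factor, bool divide, int32_t *out)
{
    double r = divide ? (double) months / factor : (double) months * factor;

    if (isnan(r) || !DOUBLE_FITS_IN_INT32(r))
        return false;               /* would be "interval out of range" */
    *out = (int32_t) r;             /* conversion truncates toward zero */
    return true;
}
```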
* llvmjit: Use explicit LLVMContextRef for inlining | Daniel Gustafsson | 2023-11-17
  When performing inlining, LLVM unfortunately "leaks" types (the types survive and are usable, but a new round of inlining will recreate new structurally equivalent types). This accumulation will over time amount to a memory leak which for some queries can be large enough to trigger the OOM process killer.

  To avoid accumulation of types, all IR related data is stored in an LLVMContextRef which is dropped and recreated in order to release all types. Dropping and recreating incurs overhead, so it will be done only after 100 queries. This is a heuristic which might be revisited, but until we can get the size of the context from LLVM we are flying a bit blind.

  This issue has been reported several times; there may be more references to it in the archives on top of the threads linked below.

  This is a backpatch of 9dce22033d5 to all supported branches.

  Reported-By: Justin Pryzby <pryzby@telsasoft.com>
  Reported-By: Kurt Roeckx <kurt@roeckx.be>
  Reported-By: Jaime Casanova <jcasanov@systemguards.com.ec>
  Reported-By: Lauri Laanmets <pcspets@gmail.com>
  Author: Andres Freund and Daniel Gustafsson
  Discussion: https://postgr.es/m/7acc8678-df5f-4923-9cf6-e843131ae89d@www.fastmail.com
  Discussion: https://postgr.es/m/20201218235607.GC30237@telsasoft.com
  Discussion: https://postgr.es/m/CAPH-tTxLf44s3CvUUtQpkDr1D8Hxqc2NGDzGXS1ODsfiJ6WSqA@mail.gmail.com
  Backpatch-through: v12
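The reset heuristic amounts to a use counter. In this sketch `JitContext` is a hypothetical stand-in for an LLVMContextRef, and `free()` stands in for disposing of the context. A real implementation must also make sure that nothing still references the old context before dropping it.

```c
#include <assert.h>
#include <stdlib.h>

#define CONTEXT_REUSE_LIMIT 100   /* the heuristic from the commit message */

typedef struct {
    int    generation;      /* bumps every time the context is recreated */
    size_t types_created;   /* stands in for accumulated, never-freed types */
} JitContext;               /* hypothetical stand-in for LLVMContextRef */

static JitContext *ctx = NULL;
static int uses_since_reset = 0;
static int generation = 0;

static JitContext *get_context(void)
{
    if (ctx != NULL && uses_since_reset >= CONTEXT_REUSE_LIMIT) {
        free(ctx);          /* dropping the context releases all its types */
        ctx = NULL;
    }
    if (ctx == NULL) {
        ctx = calloc(1, sizeof(JitContext));
        assert(ctx != NULL);
        ctx->generation = ++generation;
        uses_since_reset = 0;
    }
    uses_since_reset++;
    return ctx;
}
```

The limit trades the cost of rebuilding a context against unbounded type growth. As the commit notes, without a way to measure the context's size the number is a guess.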
* Register llvm_shutdown using on_proc_exit, not before_shmem_exit. | Daniel Gustafsson | 2023-11-17
  This seems more correct, because other before_shmem_exit calls may expect the infrastructure that is needed to run queries and access the database to be working, and also because this cleanup has nothing to do with shared memory.

  This is a back-patch of bab150045bd9. There were no known user-visible consequences to this, though, apart from what was previously fixed by commit 303640199d0 and back-patched as commit bcbc27251d35 and commit f7013683d9bb, so bab150045bd9 was not back-patched at the time.

  Bharath Rupireddy

  Discussion: http://postgr.es/m/CALj2ACWk7j4F2v2fxxYfrroOF=AdFNPr1WsV+AGtHAFQOqm_pw@mail.gmail.com
  Backpatch-through: 13, 12
* Ensure we preprocess expressions before checking their volatility. | Tom Lane | 2023-11-16
  contain_mutable_functions and contain_volatile_functions give reliable answers only after expression preprocessing (specifically eval_const_expressions). Some places understand this, but some did not get the memo --- which is not entirely their fault, because the problem is documented only in places far away from those functions. Introduce wrapper functions that allow doing the right thing easily, and add commentary in hopes of preventing future mistakes from copy-and-paste of code that's only conditionally safe.

  Two actual bugs of this ilk are fixed here. We failed to preprocess column GENERATED expressions before checking mutability, so that the code could fail to detect the use of a volatile function default-argument expression, or it could reject a polymorphic function that is actually immutable on the datatype of interest. Likewise, column DEFAULT expressions weren't preprocessed before determining if it's safe to apply the attmissingval mechanism. A false negative would just result in an unnecessary table rewrite, but a false positive could allow the attmissingval mechanism to be used in a case where it should not be, resulting in unexpected initial values in a new column.

  In passing, re-order the steps in ComputePartitionAttrs so that its checks for invalid column references are done before applying expression_planner, rather than after. The previous coding would not complain if a partition expression contains a disallowed column reference that gets optimized away by constant folding, which seems to me to be a behavior we do not want.

  Per bug #18097 from Jim Keener. Back-patch to all supported versions.

  Discussion: https://postgr.es/m/18097-ebb179674f22932f@postgresql.org
* Fix fallback implementation for pg_atomic_test_set_flag(). | Nathan Bossart | 2023-11-15
  The fallback implementation of pg_atomic_test_set_flag() that uses atomic-exchange gives pg_atomic_exchange_u32_impl() an extra argument. This issue has been present since the introduction of the atomics API in commit b64d92f1a5.

  Reviewed-by: Andres Freund
  Discussion: https://postgr.es/m/20231114035439.GA1809032%40nathanxps13
  Backpatch-through: 12
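For reference, a correct exchange-based fallback needs only the pointer and the new value. This C11 sketch uses atomic_exchange() from <stdatomic.h> in place of pg_atomic_exchange_u32_impl(). `flag_u32` and `test_set_flag()` are hypothetical names. The function returns true only if the flag was previously clear, meaning the caller is the one who set it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { _Atomic uint32_t value; } flag_u32;   /* hypothetical */

/* Emulate test-and-set with an atomic exchange: (ptr, newval) -> old value. */
static bool test_set_flag(flag_u32 *f)
{
    return atomic_exchange(&f->value, 1) == 0;
}

static void clear_flag(flag_u32 *f)
{
    atomic_store(&f->value, 0);
}
```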
* Allow new role 'regress_dump_login_role' to log in under SSPI. (Tom Lane, 2023-11-14)
| Semi-blind attempt to fix a70f2a57f to work on Windows, along the same lines as 5253519b2. Per buildfarm.
* Don't try to dump RLS policies or security labels for extension objects. (Tom Lane, 2023-11-13)
| checkExtensionMembership() set the DUMP_COMPONENT_SECLABEL and DUMP_COMPONENT_POLICY flags for extension member objects, even though we lack any infrastructure for tracking extensions' initial settings of these properties. This is not OK. The result was that a dump would always include commands to set these properties for extension objects that have them, with at least three negative consequences:
|
| 1. The restoring user might not have privilege to set these properties on these objects.
| 2. The properties might be incorrect/irrelevant for the version of the extension that's installed in the destination database.
| 3. The dump itself might fail, in the case of RLS properties attached to extension tables that the dumping user lacks privilege to LOCK. (That's because we must get at least AccessShareLock to ensure that we don't fail while trying to decompile the RLS expressions.)
|
| When and if somebody cares to invent initial-state infrastructure for extensions' RLS policies and security labels, we could think about finding another way around problem #3. But in the absence of such infrastructure, this whole thing is just wrong and we shouldn't do it.
|
| (Note: this applies only to ordinary dumps; binary-upgrade dumps still dump and restore extension member objects separately, with all properties.)
|
| Tom Lane and Jacob Champion. Back-patch to all supported branches.
|
| Discussion: https://postgr.es/m/00d46a48-3324-d9a0-49bf-e7f0f11d1038@timescale.com
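The sketch below shows the mask change described above. It uses hypothetical DEMO_COMPONENT_* bits rather than pg_dump's real DUMP_COMPONENT_* values. It rests on two assumptions. First, ACLs remain the one per-member property that still flows from the extension in ordinary dumps, because initial privileges are the case that does have tracking infrastructure. Second, binary-upgrade dumps keep every component.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical component bits, not pg_dump's real DUMP_COMPONENT_* values. */
typedef uint32_t DemoComponents;
#define DEMO_COMPONENT_DEFINITION (1u << 0)
#define DEMO_COMPONENT_ACL        (1u << 1)
#define DEMO_COMPONENT_SECLABEL   (1u << 2)
#define DEMO_COMPONENT_POLICY     (1u << 3)

/* Components an extension member inherits from its extension. */
static DemoComponents member_components(DemoComponents ext_contains,
                                        bool binary_upgrade)
{
    if (binary_upgrade)
        return ext_contains;      /* members are dumped separately, with all properties */
    /* Old behaviour also kept SECLABEL and POLICY; now only ACLs survive. */
    return ext_contains & DEMO_COMPONENT_ACL;
}
```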
* Don't release index root page pin in ginFindParents(). (Tom Lane, 2023-11-13)
| It's clearly stated in the comments that ginFindParents() must keep the pin on the index's root page that's associated with the topmost GinBtreeStack item. However, the code path for the case that the desired downlink has been pushed down to the next index level ignored this proviso, and would release the pin anyway if we were still examining the root level. That led to an assertion failure or "buffer NNNN is not owned by resource owner" error later, when we try to release the pin again at the end of the insertion.
|
| This is quite hard to reproduce, since it can only happen if an index root page split occurs concurrently with our own insertion. Thanks to Jeff Janes for finding a test case that triggers it often enough to allow investigation.
|
| This has been there since the beginning of GIN, so back-patch to all supported branches.
|
| Discussion: https://postgr.es/m/CAMkU=1yCAKtv86dMrD__Ja-7KzjE=uMeKX8y__cx5W-OEWy2ow@mail.gmail.com
* Remove incorrect file reference in comment. (Etsuro Fujita, 2023-11-13)
| Commit b7eda3e0e moved XidInMVCCSnapshot() from tqual.c into snapmgr.c, but follow-up commit c91560def incorrectly updated this reference. We could fix it, but as pointed out by Daniel Gustafsson, 1) the reader can easily find the file that contains the definition of that function, e.g. by grepping, and 2) this kind of reference is prone to going stale; so let's just remove it.
|
| Back-patch to all supported branches.
|
| Reviewed by Daniel Gustafsson.
|
| Discussion: https://postgr.es/m/CAPmGK145VdKkPBLWS2urwhgsfidbSexwY-9zCL6xSUJH%2BBTUUg%40mail.gmail.com
* Ensure we use the correct spelling of "ensure" (David Rowley, 2023-11-10)
| We seem to have accidentally used "insure" in a few places. Correct that.
|
| Author: Peter Smith
| Discussion: https://postgr.es/m/CAHut+Pv0biqrhA3pMhu40aDsj343mTsD75khKnHsLqR8P04f=Q@mail.gmail.com
| Backpatch-through: 12, oldest supported version
* Fix corner-case 64-bit integer subtraction bug on some platforms. (Dean Rasheed, 2023-11-09)
| When computing "0 - INT64_MIN", most platforms would report an overflow error, which is correct. However, platforms without integer overflow builtins or 128-bit integers would fail to spot the overflow, and incorrectly return INT64_MIN.
|
| Back-patch to all supported branches.
|
| Patch by me. Thanks to Jian He for initial investigation, and Laurenz Albe and Tom Lane for review.
|
| Discussion: https://postgr.es/m/CAEZATCUNK-AZSD0jVdgkk0N%3DNcAXBWeAEX-QU9AnJPensikmdQ%40mail.gmail.com
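The C sketch below shows a portable overflow check in the style of the fallback path of pg_sub_s64_overflow(), the path used without builtins or 128-bit integers. The function names are hypothetical, and the exact form of the upstream fix is an assumption. The sketch models the bug as testing `a > 0` where `a >= 0` is needed, since 0 - INT64_MIN = 2^63 does not fit in int64_t.

```c
#include <stdbool.h>
#include <stdint.h>

/* Unsigned subtraction wraps modulo 2^64; converting back to int64_t is
 * implementation-defined but gives two's-complement results on mainstream compilers. */
static int64_t wrap_sub(int64_t a, int64_t b)
{
    return (int64_t) ((uint64_t) a - (uint64_t) b);
}

/* Flawed check: requires a > 0, so a == 0 with b == INT64_MIN slips through. */
static bool sub_s64_overflow_buggy(int64_t a, int64_t b, int64_t *result)
{
    if ((a < 0 && b > 0 && a < INT64_MIN + b) ||
        (a > 0 && b < 0 && a > INT64_MAX + b)) {
        *result = 0;
        return true;
    }
    *result = wrap_sub(a, b);     /* silently wrong when overflow was missed */
    return false;
}

/* Corrected check: a >= 0, because 0 - INT64_MIN is already out of range. */
static bool sub_s64_overflow_fixed(int64_t a, int64_t b, int64_t *result)
{
    if ((a < 0 && b > 0 && a < INT64_MIN + b) ||
        (a >= 0 && b < 0 && a > INT64_MAX + b)) {
        *result = 0;
        return true;
    }
    *result = a - b;              /* safe: overflow has been ruled out */
    return false;
}
```

The buggy variant computes its unchecked result with unsigned wraparound so the sketch itself avoids undefined behaviour. On common compilers that yields INT64_MIN for 0 - INT64_MIN, the wrong answer the commit message describes.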
* Stamp 12.17. [tag: REL_12_17] (Tom Lane, 2023-11-06)
|
* Detect integer overflow while computing new array dimensions. (Tom Lane, 2023-11-06)
| array_set_element() and related functions allow an array to be enlarged by assigning to subscripts outside the current array bounds. While these places were careful to check that the new bounds are allowable, they neglected to consider the risk of integer overflow in computing the new bounds. In edge cases, we could compute new bounds that are invalid but get past the subsequent checks, allowing bad things to happen. Memory stomps that are potentially exploitable for arbitrary code execution are possible, and so is disclosure of server memory.
|
| To fix, perform the hazardous computations using overflow-detecting arithmetic routines, which fortunately exist in all still-supported branches.
|
| The test cases added for this generate (after patching) errors that mention the value of MaxArraySize, which is platform-dependent. Rather than introduce multiple expected-files, use psql's VERBOSITY parameter to suppress the printing of the message text. v11 psql lacks that parameter, so omit the tests in that branch.
|
| Our thanks to Pedro Gallegos for reporting this problem.
|
| Security: CVE-2023-5869
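Here is a simplified C sketch of overflow-checked bound computation for a one-dimensional array with bounds [lb, lb + dim - 1]. The helpers use a 64-bit intermediate and stand in for PostgreSQL's overflow-detecting routines, such as pg_add_s32_overflow(). DEMO_MAX_ARRAY_SIZE is a hypothetical stand-in for the platform-dependent MaxArraySize. Neither the names nor the exact checks come from array_set_element().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limit standing in for the platform-dependent MaxArraySize. */
#define DEMO_MAX_ARRAY_SIZE 1000000

/* Overflow-detecting int32 helpers: compute in 64 bits, then range-check. */
static bool add_s32_overflow(int32_t a, int32_t b, int32_t *r)
{
    int64_t t = (int64_t) a + b;
    if (t < INT32_MIN || t > INT32_MAX)
        return true;
    *r = (int32_t) t;
    return false;
}

static bool sub_s32_overflow(int32_t a, int32_t b, int32_t *r)
{
    int64_t t = (int64_t) a - b;
    if (t < INT32_MIN || t > INT32_MAX)
        return true;
    *r = (int32_t) t;
    return false;
}

/* New lower bound and length after assigning to subscript indx.
 * Returns false if any step overflows or the result is too large. */
static bool enlarge_bounds(int32_t lb, int32_t dim, int32_t indx,
                           int32_t *new_lb, int32_t *new_dim)
{
    int32_t ub;                   /* one past the current upper bound */
    int32_t span;

    if (add_s32_overflow(lb, dim, &ub))
        return false;
    if (indx < lb) {              /* extend downward */
        *new_lb = indx;
        if (sub_s32_overflow(ub, indx, new_dim))
            return false;
    } else if (indx >= ub) {      /* extend upward */
        *new_lb = lb;
        if (sub_s32_overflow(indx, lb, &span) ||
            add_s32_overflow(span, 1, new_dim))
            return false;
    } else {                      /* subscript already in range */
        *new_lb = lb;
        *new_dim = dim;
    }
    return *new_dim <= DEMO_MAX_ARRAY_SIZE;
}
```

Consider lb = -10 and indx = INT32_MAX. Here indx - lb overflows int32. Unchecked arithmetic is undefined in C, and on typical hardware it wraps to a negative length, which would pass a `new_dim <= limit` test. That is the class of bug this commit closes.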
* Compute aggregate argument types correctly in transformAggregateCall(). (Tom Lane, 2023-11-06)
| transformAggregateCall() captures the datatypes of the aggregate's arguments immediately to construct the Aggref.aggargtypes list. This seems reasonable because the arguments have already been transformed --- but there is an edge case where they haven't been. Specifically, if we have an unknown-type literal in an ANY argument position, nothing will have been done with it earlier. But if we also have DISTINCT, then addTargetToGroupList() converts the literal to "text" type, resulting in the aggargtypes list not matching the actual runtime type of the argument. The end result is that the aggregate tries to interpret a "text" value as being of type "unknown", that is a zero-terminated C string. If the text value contains no zero bytes, this could result in disclosure of server memory following the text literal value.
|
| To fix, move the collection of the aggargtypes list to the end of transformAggregateCall(), after DISTINCT has been handled. This requires slightly more code, but not a great deal.
|
| Our thanks to Jingzhou Fu for reporting this problem.
|
| Security: CVE-2023-5868
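Below is a toy C model of the ordering bug. If argument types are recorded before DISTINCT processing coerces an unknown literal to text, the recorded type no longer matches the argument. DemoType, DemoArg and both transform functions are hypothetical. The real code works on parse nodes and type OIDs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for type OIDs and argument nodes. */
typedef enum { DEMO_UNKNOWN, DEMO_TEXT, DEMO_INT4 } DemoType;
typedef struct { DemoType type; } DemoArg;

/* Stand-in for DISTINCT handling: coerce unknown-type literals to text. */
static void apply_distinct(DemoArg *args, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (args[i].type == DEMO_UNKNOWN)
            args[i].type = DEMO_TEXT;
}

/* Old order: types captured before DISTINCT may change them. */
static void transform_agg_call_old(DemoArg *args, size_t n, bool distinct,
                                   DemoType *argtypes)
{
    for (size_t i = 0; i < n; i++)
        argtypes[i] = args[i].type;
    if (distinct)
        apply_distinct(args, n);
}

/* Fixed order: handle DISTINCT first, then record the argument types. */
static void transform_agg_call_fixed(DemoArg *args, size_t n, bool distinct,
                                     DemoType *argtypes)
{
    if (distinct)
        apply_distinct(args, n);
    for (size_t i = 0; i < n; i++)
        argtypes[i] = args[i].type;
}
```

With the old order, an unknown literal ends up as text while its recorded type still says unknown. That mismatch is what the commit describes.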
* Set GUC "is_superuser" in all processes that set AuthenticatedUserId. (Noah Misch, 2023-11-06)
| It was always false in single-user mode, in autovacuum workers, and in background workers. This had no specifically-identified security consequences, but non-core code or future work might make it security-relevant.
|
| Back-patch to v11 (all supported versions). Jelte Fennema-Nio. Reported by Jelte Fennema-Nio.
* Ban role pg_signal_backend from more superuser backend types. (Noah Misch, 2023-11-06)
| Documentation says it cannot signal "a backend owned by a superuser". On the contrary, it could signal background workers, including the logical replication launcher. It could signal autovacuum workers and the autovacuum launcher. Block all that. Signaling autovacuum workers and those two launchers doesn't stall progress beyond what one could achieve other ways. If a cluster uses a non-core extension with a background worker that does not auto-restart, this could create a denial of service with respect to that background worker. A background worker with bugs in its code for responding to terminations or cancellations could experience those bugs at a time the pg_signal_backend member chooses.
|
| Back-patch to v11 (all supported versions). Reviewed by Jelte Fennema-Nio. Reported by Hemanth Sandrana and Mahendrakar Srinivasarao.
|
| Security: CVE-2023-5870
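Here is a simplified C sketch of the permission rule. It assumes, as we read the fix, that a process with no associated role (such as a launcher) is treated like a superuser-owned one. DemoTarget and signal_allowed() are hypothetical. The real logic in pg_signal_backend() is more involved and is not reproduced here; for instance, role-membership checks are omitted.

```c
#include <stdbool.h>

/* Hypothetical summary of the process being signaled. */
typedef struct {
    bool has_role;             /* false for processes with no associated role */
    bool role_is_superuser;    /* meaningful only when has_role is true */
} DemoTarget;

/* May the caller signal this target?  Role-membership checks are omitted. */
static bool signal_allowed(const DemoTarget *target, bool caller_is_superuser)
{
    if (caller_is_superuser)
        return true;
    /* Treat role-less processes like superuser-owned ones. */
    if (!target->has_role || target->role_is_superuser)
        return false;
    return true;
}
```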
* Translation updates (Peter Eisentraut, 2023-11-06)
| Source-Git-URL: https://git.postgresql.org/git/pgtranslation/messages.git
| Source-Git-Hash: db060e1afcf150db436cc05807372480754013e5
* doc: \copy can get data values \. and end-of-input confused (Bruce Momjian, 2023-11-03)
| Reported-by: Svante Richter
| Discussion: https://postgr.es/m/fcd57e4-8f23-4c3e-a5db-2571d09208e2@beta.fastmail.com
| Backpatch-through: 11
* pg_upgrade: Add missing newline to message (Peter Eisentraut, 2023-11-03)
| This was the backport of 2e3dc8c148, but in older releases the newline must be in the message.
* Be more wary about NULL values for GUC string variables. (Tom Lane, 2023-11-02)
| get_explain_guc_options() crashed if a string GUC marked GUC_EXPLAIN has a NULL boot_val. Nosing around found a couple of other places that seemed insufficiently cautious about NULL string values, although those are likely unreachable in practice. Add some commentary defining the expectations for NULL values of string variables, in hopes of forestalling future additions of more such bugs.
|
| Xing Guo, Aleksander Alekseev, Tom Lane
|
| Discussion: https://postgr.es/m/CACpMh+AyDx5YUpPaAgzVwC1d8zfOL4JoD-uyFDnNSa1z0EsDQQ@mail.gmail.com
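The C sketch below shows a NULL-safe comparison for string settings under one assumed convention: two NULLs are equal, and NULL never equals a non-NULL string, not even the empty string. guc_str_equal() and differs_from_boot() are hypothetical helpers, not functions from guc.c. PostgreSQL's own commentary may define the expectations differently.

```c
#include <stdbool.h>
#include <string.h>

/* NULL-safe equality: two NULLs are equal, NULL never equals a non-NULL
 * string, otherwise compare contents with strcmp(). */
static bool guc_str_equal(const char *a, const char *b)
{
    if (a == NULL || b == NULL)
        return a == b;
    return strcmp(a, b) == 0;
}

/* Hypothetical: does a string setting differ from its boot value?
 * Unlike a bare strcmp(), this never dereferences a NULL boot_val. */
static bool differs_from_boot(const char *current, const char *boot_val)
{
    return !guc_str_equal(current, boot_val);
}
```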
* doc: 1-byte varlena headers can be used for user PLAIN storage (Bruce Momjian, 2023-10-31)
| This also updates some C comments.
|
| Reported-by: suchithjn22@gmail.com
| Discussion: https://postgr.es/m/167336599095.2667301.15497893107226841625@wrigleys.postgresql.org
| Author: Laurenz Albe (doc patch)
| Backpatch-through: 11
* Diagnose !indisvalid in more SQL functions. (Noah Misch, 2023-10-30)
| pgstatindex failed with ERRCODE_DATA_CORRUPTED, of the "can't-happen" class XX. The other functions succeeded on an empty index; they might have malfunctioned if the failed index build left torn I/O or other complex state. Report an ERROR in statistics functions pgstatindex, pgstatginindex, pgstathashindex, and pgstattuple. Report DEBUG1 and skip all index I/O in maintenance functions brin_desummarize_range, brin_summarize_new_values, brin_summarize_range, and gin_clean_pending_list.
|
| Back-patch to v11 (all supported versions).
|
| Discussion: https://postgr.es/m/20231001195309.a3@google.com
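Below is a toy C sketch of the two policies the commit describes. Statistics functions raise an error on an index whose indisvalid is false. Maintenance functions log at DEBUG1 and skip all index I/O. The types, function names and message texts are hypothetical, and real code would use ereport() rather than fprintf().

```c
#include <stdbool.h>
#include <stdio.h>

typedef enum { DEMO_OK, DEMO_ERROR, DEMO_SKIPPED } DemoResult;

/* Hypothetical index descriptor; indisvalid mirrors the pg_index column. */
typedef struct {
    const char *name;
    bool indisvalid;
} DemoIndex;

/* Statistics functions: refuse to read an invalid index. */
static DemoResult stats_function(const DemoIndex *idx)
{
    if (!idx->indisvalid) {
        fprintf(stderr, "ERROR: index \"%s\" is not valid\n", idx->name);
        return DEMO_ERROR;
    }
    /* ... index pages would be read here ... */
    return DEMO_OK;
}

/* Maintenance functions: log at DEBUG1 level and do no index I/O. */
static DemoResult maintenance_function(const DemoIndex *idx)
{
    if (!idx->indisvalid) {
        fprintf(stderr, "DEBUG1: skipping invalid index \"%s\"\n", idx->name);
        return DEMO_SKIPPED;
    }
    /* ... index pages would be read and modified here ... */
    return DEMO_OK;
}
```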