aboutsummaryrefslogtreecommitdiff
path: root/src
Commit message (Collapse)AuthorAge
* Remove most msys special processing in TAP testsAndrew Dunstan2022-02-20
| | | | | | | | | | Following migration of Windows buildfarm members running TAP tests to use of ucrt64 perl for those tests, special processing for msys perl is no longer necessary and so is removed. Backpatch to release 10 Discussion: https://postgr.es/m/c65a8781-77ac-ea95-d185-6db291e1baeb@dunslane.net
* Remove PostgreSQL::Test::Utils::perl2host completelyAndrew Dunstan2022-02-20
| | | | | | | | | | | Commit f1ac4a74de disabled this processing, and as nothing has broken (as expected) here we proceed to remove the routine and adjust all the call sites. Backpatch to release 10 Discussion: https://postgr.es/m/0ba775a2-8aa0-0d56-d780-69427cf6f33d@dunslane.net Discussion: https://postgr.es/m/20220125023609.5ohu3nslxgoygihl@alap3.anarazel.de
* Suppress warning about stack_base_ptr with late-model GCC.Tom Lane2022-02-17
| | | | | | | | | | | | | | | | | | | | GCC 12 complains that set_stack_base is storing the address of a local variable in a long-lived pointer. This is an entirely reasonable warning (indeed, it just helped us find a bug); but that behavior is intentional here. We can work around it by using __builtin_frame_address(0) instead of a specific local variable; that produces an address a dozen or so bytes different, in my testing, but we don't care about such a small difference. Maybe someday a compiler lacking that function will start to issue a similar warning, but we'll worry about that when it happens. Patch by me, per a suggestion from Andres Freund. Back-patch to v12, which is as far back as the patch will go without some pain. (Recently-established project policy would permit a back-patch as far as 9.2, but I'm disinclined to expend the work until GCC 12 is much more widespread.) Discussion: https://postgr.es/m/3773792.1645141467@sss.pgh.pa.us
* WAL log unchanged toasted replica identity key attributes.Amit Kapila2022-02-14
| | | | | | | | | | | | | | | Currently, during UPDATE, the unchanged replica identity key attributes are not logged separately because they are getting logged as part of the new tuple. But if they are stored externally then the untoasted values are not getting logged as part of the new tuple and logical replication won't be able to replicate such UPDATEs. So we need to log such attributes as part of the old_key_tuple during UPDATE. Reported-by: Haiying Tang Author: Dilip Kumar and Amit Kapila Reviewed-by: Alvaro Herrera, Haiying Tang, Andres Freund Backpatch-through: 10 Discussion: https://postgr.es/m/OS0PR01MB611342D0A92D4F4BF26C0F47FB229@OS0PR01MB6113.jpnprd01.prod.outlook.com
* Fix memory leak in IndexScan node with reorderingAlexander Korotkov2022-02-14
| | | | | | | | | | Fix ExecReScanIndexScan() to free the referenced tuples while emptying the priority queue. Backpatch to all supported versions. Discussion: https://postgr.es/m/CAHqSB9gECMENBQmpbv5rvmT3HTaORmMK3Ukg73DsX5H7EJV7jw%40mail.gmail.com Author: Aliaksandr Kalenik Reviewed-by: Tom Lane, Alexander Korotkov Backpatch-through: 10
* Fix thinko in PQisBusy().Tom Lane2022-02-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In commit 1f39a1c06 I made PQisBusy consider conn->write_failed, but that is now looking like complete brain fade. In the first place, the logic is quite wrong: it ought to be like "and not" rather than "or". This meant that once we'd gotten into a write_failed state, PQisBusy would always return true, probably causing the calling application to iterate its loop until PQconsumeInput returns a hard failure thanks to connection loss. That's not what we want: the intended behavior is to return an error PGresult, which the application probably has much cleaner support for. But in the second place, checking write_failed here seems like the wrong thing anyway. The idea of the write_failed mechanism is to postpone handling of a write failure until we've read all we can from the server; so that flag should not interfere with input-processing behavior. (Compare 7247e243a.) What we *should* check for is status = CONNECTION_BAD, ie, socket already closed. (Most places that close the socket don't touch asyncStatus, but they do reset status.) This primarily ensures that if PQisBusy() returns true then there is an open socket, which is assumed by several call sites in our own code, and probably other applications too. While at it, fix a nearby thinko in libpq's my_sock_write: we should only consult errno for res < 0, not res == 0. This is harmless since pqsecure_raw_write would force errno to zero in such a case, but it still could confuse readers. Noted by Andres Freund. Backpatch to v12 where 1f39a1c06 came in. Discussion: https://postgr.es/m/20220211011025.ek7exh6owpzjyudn@alap3.anarazel.de
* Don't use_physical_tlist for an IOS with non-returnable columns.Tom Lane2022-02-11
| | | | | | | | | | | | | | | | | createplan.c tries to save a runtime projection step by specifying a scan plan node's output as being exactly the table's columns, or index's columns in the case of an index-only scan, if there is not a reason to do otherwise. This logic did not previously pay attention to whether an index's columns are returnable. That worked, sort of accidentally, until commit 9a3ddeb51 taught setrefs.c to reject plans that try to read a non-returnable column. I have no desire to loosen setrefs.c's new check, so instead adjust use_physical_tlist() to not try to optimize this way when there are non-returnable column(s). Per report from Ryan Kelly. Like the previous patch, back-patch to all supported branches. Discussion: https://postgr.es/m/CAHUie24ddN+pDNw7fkhNrjrwAX=fXXfGZZEHhRuofV_N_ftaSg@mail.gmail.com
* Make pg_ctl stop/restart/promote recheck postmaster aliveness.Tom Lane2022-02-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | "pg_ctl stop/restart" checked that the postmaster PID is valid just once, as a side-effect of sending the stop signal, and then would wait-till-timeout for the postmaster.pid file to go away. This neglects the case wherein the postmaster dies uncleanly after we signal it. Similarly, once "pg_ctl promote" has sent the signal, it'd wait for the corresponding on-disk state change to occur even if the postmaster dies. I'm not sure how we've managed not to notice this problem, but it seems to explain slow execution of the 017_shm.pl test script on AIX since commit 4fdbf9af5, which added a speculative "pg_ctl stop" with the idea of making real sure that the postmaster isn't there. In the test steps that kill-9 and then restart the postmaster, it's possible to get past the initial signal attempt before kill() stops working for the doomed postmaster. If that happens, pg_ctl waited till PGCTLTIMEOUT before giving up ... and the buildfarm's AIX members have that set very high. To fix, include a "kill(pid, 0)" test (similar to what postmaster_is_alive uses) in these wait loops, so that we'll give up immediately if the postmaster PID disappears. While here, I chose to refactor those loops out of where they were. do_stop() and do_restart() can perfectly well share one copy of the wait-for-stop loop, and it seems desirable to put a similar function beside that for wait-for-promote. Back-patch to all supported versions, since pg_ctl's wait logic is substantially identical in all, and we're seeing the slow test behavior in all branches. Discussion: https://postgr.es/m/20220210023537.GA3222837@rfd.leadboat.com
* Use gendef instead of pexports for building windows .def filesAndrew Dunstan2022-02-10
| | | | | | | | | Modern msys systems lack pexports but have gendef instead, so use that. Discussion: https://postgr.es/m/3ccde7a9-e4f9-e194-30e0-0936e6ad68ba@dunslane.net Backpatch to release 9.4 to enable building with perl on older branches. Before that pexports is not used for plperl.
* Make timeout.c more robust against missed timer interrupts.Tom Lane2022-02-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 09cf1d522 taught schedule_alarm() to not do anything if the next requested event is after when we expect the next interrupt to fire. However, if somehow an interrupt gets lost, we'll continue to not do anything indefinitely, even after the "next interrupt" time is obviously in the past. Thus, one missed interrupt can break timeout scheduling for the life of the session. Michael Harris reported a scenario where a bug in a user-defined function caused this to happen, so you don't even need to assume kernel bugs exist to think this is worth fixing. We can make things more robust at little cost by detecting the case where signal_due_at is before "now" and forcing a new setitimer call to occur. This isn't a completely bulletproof fix of course; but in our typical usage pattern where we frequently set timeouts and clear them before they are reached, the interrupt will get re-enabled after at most one timeout interval, which with a little luck will be before we really need it. While here, let's mark signal_due_at as volatile, since the signal handler can both examine and set it. I'm not sure there's any actual risk given that signal_pending is already volatile, but it's surely questionable. Backpatch to v14 where this logic came in. Michael Harris and Tom Lane Discussion: https://postgr.es/m/CADofcAWbMrvgwSMqO4iG_iD3E2v8ZUrC-_crB41my=VMM02-CA@mail.gmail.com
* Set SNI ClientHello extension to localhost in testsDaniel Gustafsson2022-02-10
| | | | | | | | | | | | | | | | | | | | | | The connection strings in the SSL client tests were using the host set up from Cluster.pm which is a temporary pathname. When SNI is enabled we pass the host to OpenSSL in order to set the server name indication ClientHello extension via SSL_set_tlsext_host_name. OpenSSL doesn't validate the hostname apart from checking the max length, but LibreSSL checks for RFC 5890 conformance which results in errors during testing as the pathname from Cluster.pm is not a valid hostname. Fix by setting the host explicitly to localhost, as that's closer to the intent of the test. Backpatch through 14 where SNI support came in. Reported-by: Nazir Bilal Yavuz <byavuz81@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/17391-304f81bcf724b58b@postgresql.org Backpatch-through: 14
* Test honestly for <sys/signalfd.h>.Tom Lane2022-02-09
| | | | | | | | | | | | | | | Commit 6a2a70a02 supposed that any platform having <sys/epoll.h> would also have <sys/signalfd.h>. It turns out there are still a few people using platforms where that's not so, so we'd better make a separate configure probe for it. But since it took this long to notice, I'm content with the decision to not have a separate code path for epoll-only machines; we'll just fall back to using poll() for these stragglers. Per gripe from Gabriela Serventi. Back-patch to v14 where this code came in. Discussion: https://postgr.es/m/CAHOHWE-JjJDfcYuLAAEO7Jk07atFAU47z8TzHzg71gbC0aMy=g@mail.gmail.com
* Remove ppport.h's broken re-implementation of eval_pv().Tom Lane2022-02-08
| | | | | | | | | | | | | | | | | | Recent versions of Devel::PPPort try to redefine eval_pv() to dodge a bug in pre-5.31 Perl versions. Unfortunately the redefinition fails on compilers that don't support statements nested within expressions. However, we aren't actually interested in this bug fix, since we always call eval_pv() with croak_on_error = FALSE. So, until there's an upstream fix for this breakage, just comment out the macro to revert to the older behavior. Per report from Wei Sun, as well as previous buildfarm failure on pademelon (which I'd unfortunately not looked at carefully enough to understand the cause). Back-patch to all supported versions, since we're using the same ppport.h in all. Discussion: https://postgr.es/m/tencent_2EFCC8BA0107B6EC0F97179E019A8A43C806@qq.com Report: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=pademelon&dt=2022-02-02%2001%3A22%3A58
* Translation updatesPeter Eisentraut2022-02-07
| | | | | Source-Git-URL: git://git.postgresql.org/git/pgtranslation/messages.git Source-Git-Hash: 063c497a909612d444c7c7188482db9aef86200f
* Test, don't just Assert, that mergejoin's inputs are in order.Tom Lane2022-02-05
| | | | | | | | | | | | | | | | | There are two Asserts in nodeMergejoin.c that are reachable if the input data is not in the expected order. This seems way too fragile. Alexander Lakhin reported a case where the assertions could be triggered with misconfigured foreign-table partitions, and bitter experience with unstable operating system collation definitions suggests another easy route to hitting them. Neither Assert is in a place where we can't afford one more test-and-branch, so replace 'em with plain test-and-elog logic. Per bug #17395. While the reported symptom is relatively recent, collation changes could happen anytime, so back-patch to all supported branches. Discussion: https://postgr.es/m/17395-8c326292078d1a57@postgresql.org
* Fix compiler warning in non-assert builds, introduced in f862d57057f.Andres Freund2022-02-03
| | | | | Discussion: https://postgr.es/m/20220203183655.ralgkh54sdcgysmn@alap3.anarazel.de Backpatch: 14-, like f862d57057f
* Further fix for EvalPlanQual with mix of local and foreign partitions.Etsuro Fujita2022-02-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | We assume that direct-modify ForeignScan nodes cannot be re-evaluated during EvalPlanQual processing, but the rework for inherited UPDATE/DELETE in commit 86dc90056 changed things, without considering that, so that such ForeignScan nodes get called as part of the EvalPlanQual subtree during EvalPlanQual processing in the case of an inherited UPDATE/DELETE where the inheritance set contains foreign target relations. To avoid re-evaluating such ForeignScan nodes during EvalPlanQual processing, commit c3928b467 modified nodeForeignscan.c, but the assumption made there that ExecForeignScan() should never be called for such ForeignScan nodes during EvalPlanQual processing turned out to be wrong in some cases, leading to a segmentation fault or a "cannot re-evaluate a Foreign Update or Delete during EvalPlanQual" error. Fix by modifying nodeForeignscan.c further to avoid re-evaluating such ForeignScan nodes even in ExecForeignScan()/ExecReScanForeignScan() during EvalPlanQual processing. Since this makes non-reachable the test-and-elog added to ForeignNext() by commit c3928b467 that produced the aforesaid error, convert the test-and-elog to an Assert. Per bug #17355 from Alexander Lakhin. Back-patch to v14 where both commits came in. Patch by me, reviewed and tested by Alexander Lakhin and Amit Langote. Discussion: https://postgr.es/m/17355-de8e362eb7001a96@postgresql.org
* Revert "plperl: Fix breakage of c89f409749c in back branches."Tom Lane2022-01-31
| | | | | | | This reverts commits d81cac47a et al. We shouldn't need that hack after the preceding commits. Discussion: https://postgr.es/m/20220131015130.shn6wr2fzuymerf6@alap3.anarazel.de
* plperl: update ppport.h to Perl 5.34.0.Tom Lane2022-01-31
| | | | | | | | | | | | | | | | Also apply the changes suggested by running perl ppport.h --compat-version=5.8.0 And remove some no-longer-required NEED_foo declarations. Dagfinn Ilmari Mannsåker Back-patch of commit 05798c9f7 into all supported branches. At the time we thought this update was mostly cosmetic, but the lack of it has caused trouble, while the patch itself hasn't. Discussion: https://postgr.es/m/87y278s6iq.fsf@wibble.ilmari.org Discussion: https://postgr.es/m/20220131015130.shn6wr2fzuymerf6@alap3.anarazel.de
* plperl: Fix breakage of c89f409749c in back branches.Andres Freund2022-01-30
| | | | | | | | | | | | | | ppport.h was only updated in 05798c9f7f0 (master). Unfortunately my commit c89f409749c uses PERL_VERSION_LT which came in with that update. Breaking most buildfarm animals. I should have noticed that... We might want to backpatch the ppport update instead, but for now lets get the buildfarm green again. Discussion: https://postgr.es/m/20220131015130.shn6wr2fzuymerf6@alap3.anarazel.de Backpatch: 10-14, master doesn't need it
* plperl: windows: Use Perl_setlocale on 5.28+, fixing compile failure.Andres Freund2022-01-30
| | | | | | | | | | | | | For older versions we need our own copy of perl's setlocale(), because it was not exposed (why we need the setlocale in the first place is explained in plperl_init_interp) . The copy stopped working in 5.28, as some of the used macros are not public anymore. But Perl_setlocale is available in 5.28, so use that. Author: Victor Wagner <vitus@wagner.pp.ru> Reviewed-By: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> Discussion: https://postgr.es/m/20200501134711.08750c5f@antares.wagner.home Backpatch: all versions
* Fix failure to validate the result of select_common_type().Tom Lane2022-01-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Although select_common_type() has a failure-return convention, an apparent successful return just provides a type OID that *might* work as a common supertype; we've not validated that the required casts actually exist. In the mainstream use-cases that doesn't matter, because we'll proceed to invoke coerce_to_common_type() on each input, which will fail appropriately if the proposed common type doesn't actually work. However, a few callers didn't read the (nonexistent) fine print, and thought that if they got back a nonzero OID then the coercions were sure to work. This affects in particular the recently-added "anycompatible" polymorphic types; we might think that a function/operator using such types matches cases it really doesn't. A likely end result of that is unexpected "ambiguous operator" errors, as for example in bug #17387 from James Inform. Another, much older, case is that the parser might try to transform an "x IN (list)" construct to a ScalarArrayOpExpr even when the list elements don't actually have a common supertype. It doesn't seem desirable to add more checking to select_common_type itself, as that'd just slow down the mainstream use-cases. Instead, write a separate function verify_common_type that performs the missing checks, and add a call to that where necessary. Likewise add verify_common_type_from_oids to go with select_common_type_from_oids. Back-patch to v13 where the "anycompatible" types came in. (The symptom complained of in bug #17387 doesn't appear till v14, but that's just because we didn't get around to converting || to use anycompatible till then.) In principle the "x IN (list)" fix could go back all the way, but I'm not currently convinced that it makes much difference in real-world cases, so I won't bother for now. Discussion: https://postgr.es/m/17387-5dfe54b988444963@postgresql.org
* Fix incorrect memory context switch in COPY TO executionMichael Paquier2022-01-29
| | | | | | | | | | | | | | | | | | | | c532d15 has split the logic of COPY commands into multiple files, one change being to move the internals of BeginCopy() to BeginCopyTo(). Originally the code was written so as we'd switch back-and-forth between the current execution memory context and the dedicated memory context for the COPY command, and this refactoring has introduced an extra switch to the current memory context from the COPY context once BeginCopyTo() is done with the past logic coming from BeginCopy(). The code was correctly doing the analyze, rewrite and planning phases in the COPY context, but it was not assigning "copy_file" (FILE* used when copying to a source file) and "filename" in the COPY context, making the COPY status data inconsistent. Author: Bharath Rupireddy Reviewed-by: Japin Li Discussion: https://postgr.es/m/CALj2ACWvVa69foi9jhHFY=2BuHxAoYboyE+vXQTARwxZcJnVrQ@mail.gmail.com Backpatch-through: 14
* Fix typo in comment.Etsuro Fujita2022-01-28
|
* Prevent memory context logging from sending log message to connected client.Fujii Masao2022-01-28
| | | | | | | | | | | | | | | | | | When pg_log_backend_memory_contexts() is executed, the target backend should use LOG_SERVER_ONLY to log its memory contexts, to prevent them from being sent to its connected client regardless of client_min_messages. But previously the backend unexpectedly used LOG to log the message "logging memory contexts of PID %d" and it could be sent to the client. This is a bug in memory context logging. To fix the bug, this commit changes that message so that it's logged with LOG_SERVER_ONLY. Back-patch to v14 where pg_log_backend_memory_contexts() was added. Author: Fujii Masao Reviewed-by: Bharath Rupireddy, Atsushi Torikoshi Discussion: https://postgr.es/m/82c12f36-86f7-5e72-79af-7f5c37f6cad7@oss.nttdata.com
* Fix ordering of XIDs in ProcArrayApplyRecoveryInfoTomas Vondra2022-01-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 8431e296ea reworked ProcArrayApplyRecoveryInfo to sort XIDs before adding them to KnownAssignedXids. But the XIDs are sorted using xidComparator, which compares the XIDs simply as uint32 values, not logically. KnownAssignedXidsAdd() however expects XIDs in logical order, and calls TransactionIdFollowsOrEquals() to enforce that. If there are XIDs for which the two orderings disagree, an error is raised and the recovery fails/restarts. Hitting this issue is fairly easy - you just need two transactions, one started before the 4B limit (e.g. XID 4294967290), the other sometime after it (e.g. XID 1000). Logically (4294967290 <= 1000) but when compared using xidComparator we try to add them in the opposite order. Which makes KnownAssignedXidsAdd() fail with an error like this: ERROR: out-of-order XID insertion in KnownAssignedXids This only happens during replica startup, while processing RUNNING_XACTS records to build the snapshot. Once we reach STANDBY_SNAPSHOT_READY, we skip these records. So this does not affect already running replicas, but if you restart (or create) a replica while there are transactions with XIDs for which the two orderings disagree, you may hit this. Long-running transactions and frequent replica restarts increase the likelihood of hitting this issue. Once the replica gets into this state, it can't be started (even if the old transactions are terminated). Fixed by sorting the XIDs logically - this is fine because we're dealing with normal XIDs (because it's XIDs assigned to backends) and from the same wraparound epoch (otherwise the backends could not be running at the same time on the primary node). So there are no problems with the triangle inequality, which is why xidComparator compares raw values. Investigation and root cause analysis by Abhijit Menon-Sen. Patch by me. This issue is present in all releases since 9.4, however releases up to 9.6 are EOL already so backpatch to 10 only. Reviewed-by: Abhijit Menon-Sen Reviewed-by: Alvaro Herrera Backpatch-through: 10 Discussion: https://postgr.es/m/36b8a501-5d73-277c-4972-f58a4dce088a%40enterprisedb.com
* Improve msys2 detection for TAP testsAndrew Dunstan2022-01-27
| | | | | | | | Perl instances on some msys toolchains (e.g. UCRT64) have their configured osname set to 'MSWin32' rather than 'msys'. The test for the msys2 platform is adjusted accordingly. Backpatch to release 14.
* On sparc64+ext4, suppress test failures from known WAL read failure.Noah Misch2022-01-26
| | | | | | | | | | Buildfarm members kittiwake, tadarida and snapper began to fail frequently when commits 3cd9c3b921977272e6650a5efbeade4203c4bca2 and f47ed79cc8a0cfa154dc7f01faaf59822552363f added tests of concurrency, but the problem was reachable before those commits. Back-patch to v10 (all supported versions). Discussion: https://postgr.es/m/20220116210241.GC756210@rfd.leadboat.com
* Fix pg_hba_file_rules for authentication method certMagnus Hagander2022-01-26
| | | | | | | | | | | For authentication method cert, clientcert=verify-full is implied. But the pg_hba_file_rules entry would incorrectly show clientcert=verify-ca. Per bug #17354 Reported-By: Feike Steenbergen Reviewed-By: Jonathan Katz Backpatch-through: 12
* Revert "graceful shutdown" changes for Windows, in back branches only.Tom Lane2022-01-25
| | | | | | | | | | | | This reverts commits 6051857fc and ed52c3707, but only in the back branches. Further testing has shown that while those changes do fix some things, they also break others; in particular, it looks like walreceivers fail to detect walsender-initiated connection close reliably if the walsender shuts down this way. We'll keep trying to improve matters in HEAD, but it now seems unwise to push these changes into stable releases. Discussion: https://postgr.es/m/CA+hUKG+OeoETZQ=Qw5Ub5h3tmwQhBmDA=nuNO3KG=zWfUypFAw@mail.gmail.com
* Consider parallel awareness when removing single-child AppendsDavid Rowley2022-01-25
| | | | | | | | | | | | | | | | | 8edd0e794 added some code to remove Append and MergeAppend nodes when they contained a single child node. As it turned out, this was unsafe to do when the Append/MergeAppend was parallel_aware and the child node was not. Removing the Append/MergeAppend, in this case, could lead to the child plan being called multiple times by parallel workers when it was unsafe to do so. Here we fix this by just not removing the Append/MergeAppend when the parallel_aware flag of the parent and child node don't match. Reported-by: Yura Sokolov Bug: #17335 Discussion: https://postgr.es/m/b59605fecb20ba9ea94e70ab60098c237c870628.camel%40postgrespro.ru Backpatch-through: 12, where 8edd0e794 was first introduced
* Fix limitations on what SQL commands can be issued to a walsender.Tom Lane2022-01-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In logical replication mode, a WalSender is supposed to be able to execute any regular SQL command, as well as the special replication commands. Poor design of the replication-command parser caused it to fail in various cases, notably: * semicolons embedded in a command, or multiple SQL commands sent in a single message; * dollar-quoted literals containing odd numbers of single or double quote marks; * commands starting with a comment. The basic problem here is that we're trying to run repl_scanner.l across the entire input string even when it's not a replication command. Since repl_scanner.l does not understand all of the token types known to the core lexer, this is doomed to have failure modes. We certainly don't want to make repl_scanner.l as big as scan.l, so instead rejigger stuff so that we only lex the first token of a non-replication command. That will usually look like an IDENT to repl_scanner.l, though a comment would end up getting reported as a '-' or '/' single-character token. If the token is a replication command keyword, we push it back and proceed normally with repl_gram.y parsing. Otherwise, we can drop out of exec_replication_command() without examining the rest of the string. (It's still theoretically possible for repl_scanner.l to fail on the first token; but that could only happen if it's an unterminated single- or double-quoted string, in which case you'd have gotten largely the same error from the core lexer too.) In this way, repl_gram.y isn't involved at all in handling general SQL commands, so we can get rid of the SQLCmd node type. (In the back branches, we can't remove it because renumbering enum NodeTag would be an ABI break; so just leave it sit there unused.) I failed to resist the temptation to clean up some other sloppy coding in repl_scanner.l while at it. The only externally-visible behavior change from that is it now accepts \r and \f as whitespace, same as the core lexer. Per bug #17379 from Greg Rychlewski. Back-patch to all supported branches. Discussion: https://postgr.es/m/17379-6a5c6cfb3f1f5e77@postgresql.org
* Remember to reset yy_start state when firing up repl_scanner.l.Tom Lane2022-01-24
| | | | | | | | | | | | | | | Without this, we get odd behavior when the previous cycle of lexing exited in a non-default exclusive state. Every other copy of this code is aware that it has to do BEGIN(INITIAL), but repl_scanner.l did not get that memo. The real-world impact of this is probably limited, since most replication clients would abandon their connection after getting a syntax error. Still, it's a bug. This mistake is old, so back-patch to all supported branches. Discussion: https://postgr.es/m/1874781.1643035952@sss.pgh.pa.us
* pg_dump: avoid useless query in binary_upgrade_set_type_oids_by_type_oidTom Lane2022-01-23
| | | | | | | | | | | Commit 6df7a9698 wrote appendPQExpBuffer where it should have written printfPQExpBuffer. This resulted in re-issuing the previous query along with the desired one, which very accidentally had no negative consequences except for some wasted cycles. Back-patch to v14 where that came in. Discussion: https://postgr.es/m/1714711.1642962663@sss.pgh.pa.us
* Suppress variable-set-but-not-used warning from clang 13.Tom Lane2022-01-23
| | | | | | | | | | | | | | | | | In the normal configuration where GEQO_DEBUG isn't defined, recent clang versions have started to complain that geqo_main.c accumulates the edge_failures count but never does anything with it. As a minimal back-patchable fix, insert a void cast to silence this warning. (I'd speculated about ripping out the GEQO_DEBUG logic altogether, but I don't think we'd wish to back-patch that.) Per recently-established project policy, this is a candidate for back-patching into out-of-support branches: it suppresses an annoying compiler warning but changes no behavior. Hence, back-patch all the way to 9.2. Discussion: https://postgr.es/m/CA+hUKGLTSZQwES8VNPmWO9AO0wSeLt36OCPDAZTccT1h7Q7kTQ@mail.gmail.com
* Correct type of front_pathkey to PathKeyTomas Vondra2022-01-23
| | | | | | | | | | | | In sort_inner_and_outer we iterate a list of PathKey elements, but the variable is declared as (List *). This mistake is benign, because we only pass the pointer to lcons() and never dereference it. This exists since ~2004, but it's confusing. So fix and backpatch to all supported branches. Backpatch-through: 10 Discussion: https://postgr.es/m/bf3a6ea1-a7d8-7211-0669-189d5c169374%40enterprisedb.com
* Check syscache result in AlterStatisticsTomas Vondra2022-01-23
| | | | | | | | | | | | The syscache lookup may return NULL even for valid OID, for example due to a concurrent DROP STATISTICS, so a HeapTupleIsValid is necessary. Without it, it may fail with a segfault. Reported by Alexander Lakhin, patch by me. Backpatch to 13, where ALTER STATISTICS ... SET STATISTICS was introduced. Backpatch-through: 13 Discussion: https://postgr.es/m/17372-bf3b6e947e35ae77%40postgresql.org
* Flush table's relcache during ALTER TABLE ADD PRIMARY KEY USING INDEX.Tom Lane2022-01-22
| | | | | | | | | | | | | | | | | Previously, unless we had to add a NOT NULL constraint to the column, this command resulted in updating only the index's relcache entry. That's problematic when replication behavior is being driven off the existence of a primary key: other sessions (and ours too for that matter) failed to recalculate their opinion of whether the table can be replicated. Add a relcache invalidation to fix it. This has been broken since pg_class.relhaspkey was removed in v11. Before that, updating the table's relhaspkey value sufficed to cause a cache flush. Hence, backpatch to v11. Report and patch by Hou Zhijie Discussion: https://postgr.es/m/OS0PR01MB5716EBE01F112C62F8F9B786947B9@OS0PR01MB5716.jpnprd01.prod.outlook.com
* Fix race condition in gettext() initialization in libpq and ecpglib.Tom Lane2022-01-21
| | | | | | | | | | | | | | | | | | | | In libpq and ecpglib, multiple threads can concurrently enter the initialization logic for message localization. Since we set the its-done flag before actually doing the work, it'd be possible for some threads to reach gettext() before anyone has called bindtextdomain(). Barring bugs in libintl itself, this would not result in anything worse than failure to localize some early messages. Nonetheless, it's a bug, and an easy one to fix. Noted while investigating bug #17299 from Clemens Zeidler (much thanks to Liam Bowen for followup investigation on that). It currently appears that that actually *is* a bug in libintl itself, but that doesn't let us off the hook for this bit. Back-patch to all supported versions. Discussion: https://postgr.es/m/17299-7270741958c0b1ab@postgresql.org Discussion: https://postgr.es/m/CAE7q7Eit4Eq2=bxce=Fm8HAStECjaXUE=WBQc-sDDcgJQ7s7eg@mail.gmail.com
* fsync pg_logical/mappings in CheckPointLogicalRewriteHeap().Andres Freund2022-01-21
| | | | | | | | | | | While individual logical rewrite files were synced to disk, the directory was not. On some filesystems that could lead to loosing directory entries after a crash. Reported-By: Tom Lane <tgl@sss.pgh.pa.us> Author: Nathan Bossart <bossartn@amazon.com> Discussion: https://postgr.es/m/867F2E29-2782-4869-970E-B984C6D35A8F@amazon.com Backpatch: 10-
* Fix one-off bug causing missing commit timestamps for subtransactionsMichael Paquier2022-01-21
| | | | | | | | | | | | | | | | | | | The logic in charge of writing commit timestamps (enabled with track_commit_timestamp) for subtransactions had a one-bug bug, where it would be possible that commit timestamps go missing for the last subtransaction committed. While on it, simplify a bit the iteration logic in the loop writing the commit timestamps, as per suggestions from Kyotaro Horiguchi and Tom Lane, so as some variable initializations are not part of the loop itself. Issue introduced in 73c986a. Analyzed-by: Alex Kingsborough Author: Alex Kingsborough, Kyotaro Horiguchi Discussion: https://postgr.es/m/73A66172-4050-4F2A-B7F1-13508EDA2144@amazon.com Backpatch-through: 10
* Tighten TAP tests' tracking of postmaster state some more.Tom Lane2022-01-20
| | | | | | | | | | | | | | | | | | | Commits 6c4a8903b et al. had a couple of deficiencies: * The logic I added to Cluster::start to see if a PID file is present could be fooled by a stale PID file left over from a previous postmaster. To fix, if we're not sure whether we expect to find a running postmaster or not, validate the PID using "kill 0". * 017_shm.pl has a loop in which it just issues repeated Cluster::start calls; this will fail if some invocation fails but leaves self->_pid set. Per buildfarm results, the above fix is not enough to make this safe: we might have "validated" a PID for a postmaster that exits immediately after we look. Hence, match each failed start call with a stop call that will get us back to the self->_pid == undef state. Add a fail_ok option to Cluster::stop to make this work. Discussion: https://postgr.es/m/CA+hUKGKV6fOHvfiPt8=dOKzvswjAyLoFoJF1iQXMNpi7+hD1JQ@mail.gmail.com
* Allow clean.bat to be run from anywhereAndrew Dunstan2022-01-20
| | | | | | | | | | | This was omitted from c3879a7b4c which modified the other msvc .bat files. Per request from Juan José Santamaría Flecha Discussion: https://postgr.es/m/CAC+AXB0_fxYGbQoaYjCA8um7TTbOVP4L9aXnVmHwK8WzaT4gdA@mail.gmail.com Backpatch to all live branches.
* Try to stabilize reloptions test, again.Thomas Munro2022-01-20
| | | | | | | | | | Since the test requires reproducible behavior from VACUUM, and since DISABLE_PAGE_SKIPPING doesn't actually disable all forms of page skipping, let's use a temporary table to avoid contention. Back-patch to 12, like commit 3414099c. Discussion: https://postgr.es/m/20220120052404.sonrhq3f3qgplpzj%40alap3.anarazel.de
* TAP tests: check for postmaster.pid anyway when "pg_ctl start" fails.Tom Lane2022-01-19
| | | | | | | | | | | | "pg_ctl start" might start a new postmaster and then return failure anyway, for example if PGCTLTIMEOUT is exceeded. If there is a postmaster there, it's still incumbent on us to shut it down at script end, so check for the PID file even though we are about to fail. This has been broken all along, so back-patch to all supported branches. Discussion: https://postgr.es/m/647439.1642622744@sss.pgh.pa.us
* Try to stabilize the reloptions test.Thomas Munro2022-01-19
| | | | | | | | | | | Where we test vacuum_truncate's effects, sometimes this is failing to truncate as expected on the build farm. That could be explained by page skipping, so disable it explicitly, with the theory that commit fe246d1c didn't go far enough. Back-patch to 12, where the vacuum_truncate tests were added. Discussion: https://postgr.es/m/CA%2BhUKGLT2UL5_JhmBzUgkdyKfc%3D5J-gJSQJLysMs4rqLUKLAzw%40mail.gmail.com
* Fix psql \d's query for identifying parent triggers.Tom Lane2022-01-17
| | | | | | | | | | | | | | | | | | | | | The original coding (from c33869cc3) failed with "more than one row returned by a subquery used as an expression" if there were unrelated triggers of the same tgname on parent partitioned tables. (That's possible because statement-level triggers don't get inherited.) Fix by applying LIMIT 1 after sorting the candidates by inheritance level. Also, wrap the subquery in a CASE so that we don't have to execute it at all when the trigger is visibly non-inherited. Aside from saving some cycles, this avoids the need for a confusing and undocumented NULLIF(). While here, tweak the format of the emitted query to look a bit nicer for "psql -E", and add some explanation of this subquery, because it badly needs it. Report and patch by Justin Pryzby (with some editing by me). Back-patch to v13 where the faulty code came in. Discussion: https://postgr.es/m/20211217154356.GJ17618@telsasoft.com
* Avoid calling gettext() in signal handlers.Tom Lane2022-01-17
| | | | | | | | | | | | | | | | | | | | | | | | | It seems highly unlikely that gettext() can be relied on to be async-signal-safe. psql used to understand that, but someone got it wrong long ago in the src/bin/scripts/ version of handle_sigint, and then the bad idea was perpetuated when those two versions were unified into src/fe_utils/cancel.c. I'm unsure why there have not been field complaints about this ... maybe gettext() is signal-safe once it's translated at least one message? But we have no business assuming any such thing. In cancel.c (v13 and up), I preserved our ability to localize "Cancel request sent" messages by invoking gettext() before the signal handler is set up. In earlier branches I just made src/bin/scripts/ not localize those messages, as psql did then. (Just for extra unsafety, the src/bin/scripts/ version was invoking fprintf() from a signal handler. Sigh.) Noted while fixing signal-safety issues in PQcancel() itself. Back-patch to all supported branches. Discussion: https://postgr.es/m/2937814.1641960929@sss.pgh.pa.us
* Avoid calling strerror[_r] in PQcancel().Tom Lane2022-01-17
| | | | | | | | | | | | | | | | | | | | | | | | PQcancel() is supposed to be safe to call from a signal handler, and indeed psql uses it that way. All of the library functions it uses are specified to be async-signal-safe by POSIX ... except for strerror. Neither plain strerror nor strerror_r are considered safe. When this code was written, back in the dark ages, we probably figured "oh, strerror will just index into a constant array of strings" ... but in any locale except C, that's unlikely to be true. Probably the reason we've not heard complaints is that (a) this error-handling code is unlikely to be reached in normal use, and (b) in many scenarios, localized error strings would already have been loaded, after which maybe it's safe to call strerror here. Still, this is clearly unacceptable. The best we can do without relying on strerror is to print the decimal value of errno, so make it do that instead. (This is probably not much loss of user-friendliness, given that it is hard to get a failure here.) Back-patch to all supported branches. Discussion: https://postgr.es/m/2937814.1641960929@sss.pgh.pa.us
* Teach hash_ok_operator() that record_eq is only sometimes hashable.Tom Lane2022-01-16
| | | | | | | | | | | The need for this was foreseen long ago, but when record_eq actually became hashable (in commit 01e658fa7), we missed updating this spot. Per bug #17363 from Elvis Pranskevichus. Back-patch to v14 where the faulty commit came in. Discussion: https://postgr.es/m/17363-f6d42fd0d726be02@postgresql.org