aboutsummaryrefslogtreecommitdiff
path: root/src
Commit message (Collapse)AuthorAge
...
* Guard against core dump from uninitialized subplan.Tom Lane2020-11-03
| | | | | | | | | | | | | | If the planner erroneously puts a non-parallel-safe SubPlan into a parallelized portion of the query tree, nodeSubplan.c will fail in the worker processes because it finds a null in es_subplanstates, which it's unable to cope with. It seems worth a test-and-elog to make that an error case rather than a core dump case. This probably should have been included in commit 16ebab688, which was responsible for allowing nulls to appear in es_subplanstates to begin with. So, back-patch to v10 where that came in. Discussion: https://postgr.es/m/924226.1604422326@sss.pgh.pa.us
* Allow users with BYPASSRLS to alter their own passwords.Tom Lane2020-11-03
| | | | | | | | | | | | | | | | The intention in commit 491c029db was to require superuserness to change the BYPASSRLS property, but the actual effect of the coding in AlterRole() was to require superuserness to change anything at all about a BYPASSRLS role. Other properties of a BYPASSRLS role should be changeable under the same rules as for a normal role, though. Fix that, and also take care of some documentation omissions related to BYPASSRLS and REPLICATION role properties. Tom Lane and Stephen Frost, per bug report from Wolfgang Walther. Back-patch to all supported branches. Discussion: https://postgr.es/m/a5548a9f-89ee-3167-129d-162b5985fcf8@technowledgy.de
* Fix unportable use of getnameinfo() in pg_hba_file_rules view.Tom Lane2020-11-02
| | | | | | | | | | | | | | | | | | | | | | | fill_hba_line() thought it could get away with passing sizeof(struct sockaddr_storage) rather than the actual addrlen previously returned by getaddrinfo(). While that appears to work on many platforms, it does not work on FreeBSD 11: you get back a failure, which leads to the view showing NULL for the address and netmask columns in all rows. The POSIX spec for getnameinfo() is pretty clearly on FreeBSD's side here: you should pass the actual address length. So it seems plausible that there are other platforms where this coding also fails, and we just hadn't noticed. Also, IMO the fact that getnameinfo() failure leads to a NULL output is pretty bogus in itself. Our pg_getnameinfo_all() wrapper is careful to emit "???" on failure, and we should use that in such cases. NULL should only be emitted in rows that don't have IP addresses. Per bug #16695 from Peter Vandivier. Back-patch to v10 where this code was added. Discussion: https://postgr.es/m/16695-a665558e2f630be7@postgresql.org
* Extend PageIsVerified() to handle more custom optionsMichael Paquier2020-11-02
| | | | | | | | | | | | | | | | | | | | | | This is useful for checks of relation pages without having to load the pages into the shared buffers, and two cases can make use of that: page verification in base backups and the online, lock-safe, flavor. Compatibility is kept with past versions using a routine that calls the new extended routine with the set of options compatible with the original version. Contrary to d401c576, a macro cannot be used as there may be external code relying on the presence of the original routine. This is applied down to 11, where this will be used by a follow-up commit addressing a set of issues with page verification in base backups. Extracted from a larger patch by the same author. Author: Anastasia Lubennikova Reviewed-by: Michael Paquier, Julien Rouhaud Discussion: https://postgr.es/m/608f3476-0598-2514-2c03-e05c7d2b0cbd@postgrespro.ru Backpatch-through: 11
* Avoid null pointer dereference if error result lacks SQLSTATE.Tom Lane2020-11-01
| | | | | | | | | Although error results received from the backend should always have a SQLSTATE field, ones generated by libpq won't, making this code vulnerable to a crash after, say, untimely loss of connection. Noted by Coverity. Oversight in commit 403a3d91c. Back-patch to 9.5, as that was.
* Preserve index data in pg_statistic across REINDEX CONCURRENTLYMichael Paquier2020-11-01
| | | | | | | | | | | | | | | | | | Statistics associated to an index got lost after running REINDEX CONCURRENTLY, while the non-concurrent case preserves these correctly. The concurrent and non-concurrent operations need to be consistent for the end-user, and missing statistics would force to wait for a new analyze to happen, which could take some time depending on the activity of the existing autovacuum workers. This issue is fixed by copying any existing entries in pg_statistic associated to the old index to the new one. Note that this copy is already done with the data of the index in the stats collector. Reported-by: Fabrízio de Royes Mello Author: Michael Paquier, Fabrízio de Royes Mello Reviewed-by: Justin Pryzby Discussion: https://postgr.es/m/CAFcNs+qpFPmiHd1oTXvcPdvAHicJDA9qBUSujgAhUMJyUMb+SA@mail.gmail.com Backpatch-through: 12
* Reproduce debug_query_string==NULL on parallel workers.Noah Misch2020-10-31
| | | | | | | | | | Certain background workers initiate parallel queries while debug_query_string==NULL, at which point they attempted strlen(NULL) and died to SIGSEGV. Older debug_query_string observers allow NULL, so do likewise in these newer ones. Back-patch to v11, where commit 7de4a1bcc56f494acbd0d6e70781df877dc8ecb5 introduced the first of these. Discussion: https://postgr.es/m/20201014022636.GA1962668@rfd.leadboat.com
* Stabilize timetz test across DST transitions.Tom Lane2020-10-29
| | | | | | | | | | | | | | The timetz test cases I added in commit a9632830b were unintentionally sensitive to whether or not DST is active in the PST8PDT time zone. Thus, they'll start failing this coming weekend, as reported by Bernhard M. Wiedemann in bug #16689. Fortunately, DST-awareness is not significant to the purpose of these test cases, so we can just force them all to PDT (DST hours) to preserve stability of the results. Back-patch to v10, as the prior patch was. Discussion: https://postgr.es/m/16689-57701daa23b377bf@postgresql.org
* Use mode "r" for popen() in psql's evaluate_backtick().Tom Lane2020-10-28
| | | | | | | | | | | | | | | | | | In almost all other places, we use plain "r" or "w" mode in popen() calls (the exceptions being for COPY data). This one has been overlooked (possibly because it's buried in a ".l" flex file?), but it's using PG_BINARY_R. Kensuke Okamura complained in bug #16688 that we fail to strip \r when stripping the trailing newline from a backtick result string. That's true enough, but we'd also fail to convert embedded \r\n cleanly, which also seems undesirable. Fixing the popen() mode seems like the best way to deal with this. It's been like this for a long time, so back-patch to all supported branches. Discussion: https://postgr.es/m/16688-c649c7b69cd7e6f8@postgresql.org
* Calculate extraUpdatedCols in query rewriter, not parser.Tom Lane2020-10-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's unsafe to do this at parse time because addition of generated columns to a table would not invalidate stored rules containing UPDATEs on the table ... but there might now be dependent generated columns that were not there when the rule was made. This also fixes an oversight that rewriteTargetView failed to update extraUpdatedCols when transforming an UPDATE on an updatable view. (Since the new calculation is downstream of that, rewriteTargetView doesn't actually need to do anything; but before, there was a demonstrable bug there.) In v13 and HEAD, this leads to easily-visible bugs because (since commit c6679e4fc) we won't recalculate generated columns that aren't listed in extraUpdatedCols. In v12 this bitmap is mostly just used for trigger-firing decisions, so you'd only notice a problem if a trigger cared whether a generated column had been updated. I'd complained about this back in May, but then forgot about it until bug #16671 from Michael Paul Killian revived the issue. Back-patch to v12 where this field was introduced. If existing stored rules contain any extraUpdatedCols values, they'll be ignored because the rewriter will overwrite them, so the bug will be fixed even for existing rules. (But note that if someone were to update to 13.1 or 12.5, store some rules with UPDATEs on tables having generated columns, and then downgrade to a prior minor version, they might observe issues similar to what this patch fixes. That seems unlikely enough to not be worth going to a lot of effort to fix.) Discussion: https://postgr.es/m/10206.1588964727@sss.pgh.pa.us Discussion: https://postgr.es/m/16671-2fa55851859fb166@postgresql.org
* Fix use-after-free bug with event triggers and ALTER TABLE.Tom Lane2020-10-27
| | | | | | | | | | | | | | | | | EventTriggerAlterTableEnd neglected to make sure that it built its output list in the right context. In simple cases this was masked because the function is called in PortalContext which will be sufficiently long-lived anyway; but that doesn't make it not a bug. Commit ced138e8c fixed this in HEAD and v13, but mistakenly chose not to back-patch further. Back-patch the same code change all the way (I didn't bother with the test case though, as it would prove nothing in pre-v13 branches). Per report from Arseny Sher. Original fix by Jehan-Guillaume de Rorthais. Discussion: https://postgr.es/m/877drcyprb.fsf@ars-thinkpad Discussion: https://postgr.es/m/20200902193715.6e0269d4@firost
* Makefile comment: remove reference to tools/thread/thread_testBruce Momjian2020-10-27
| | | | | | | | | | You can't compile thread_test alone anymore, and the location moved too. Reported-by: Tom Lane Discussion: https://postgr.es/m/1062278.1603819969@sss.pgh.pa.us Backpatch-through: 9.5
* pg_dump: Lock all relations, not just plain tablesAlvaro Herrera2020-10-27
| | | | | | | | | | | | | | | | Now that LOCK TABLE can take any relation type, acquire lock on all relations that are to be dumped. This prevents schema changes or deadlock errors that could cause a dump to fail after expending much effort. The server is tested to have the capability and the feature disabled if it doesn't, so that a patched pg_dump doesn't fail when connecting to an unpatched server. Backpatch to 9.5. Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reported-by: Wells Oliver <wells.oliver@gmail.com> Discussion: https://postgr.es/m/20201021200659.GA32358@alvherre.pgsql
* Accept relations of any kind in LOCK TABLEAlvaro Herrera2020-10-27
| | | | | | | | | | | | | | | The restriction that only tables and views can be locked by LOCK TABLE is quite arbitrary, since the underlying mechanism can lock any relation type. Drop the restriction so that programs such as pg_dump can lock all relations they're interested in, preventing schema changes that could cause a dump to fail after expending much effort. Backpatch to 9.5. Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reported-by: Wells Oliver <wells.oliver@gmail.com> Discussion: https://postgr.es/m/20201021200659.GA32358@alvherre.pgsql
* Fix corner case for a BEFORE ROW UPDATE trigger returning OLD.Tom Lane2020-10-25
| | | | | | | | | | | | | | | | | | | | | | | | | If the old row has any "missing" attributes that are supposed to be retrieved from an associated tuple descriptor, the wrong things happened because the trigger result is shoved directly into an executor slot that lacks the missing-attribute data. Notably, CHECK-constraint verification would incorrectly see those columns as NULL, and so would RETURNING-list evaluation. Band-aid around this by forcibly expanding the tuple before passing it to the trigger function. (IMO it was a fundamental misdesign to put the missing-attribute data into tuple constraints, which so much of the system considers to be optional. But we're probably stuck with that now, and will have to continue to apply band-aids as we find other places with similar issues.) Back-patch to v12. v11 would also have the issue, except that commit 920311ab1 already applied a similar band-aid. That forced expansion in more cases than seem really necessary, though, so this isn't a directly equivalent fix. Amit Langote, with some cosmetic changes by me Discussion: https://postgr.es/m/16644-5da7ef98a7ac4545@postgresql.org
* Fix incorrect parameter name in a function header commentDavid Rowley2020-10-25
| | | | | | Author: Zhijie Hou Discussion: https://postgr.es/m/14cd74ea00204cc8a7ea5d738ac82cd1@G08CNEXMBPEKD05.g08.fujitsu.local Backpatch-through: 12, where the mistake was introduced
* Fix ancient bug in ecpg's pthread_once() emulation for Windows.Tom Lane2020-10-24
| | | | | | | | | | | | | | | | We must not set the "done" flag until after we've executed the initialization function. Otherwise, other threads can fall through the initial unlocked test before initialization is really complete. This has been seen to cause rare failures of ecpg's thread/descriptor test, and it could presumably cause other sorts of misbehavior in threaded ECPG-using applications, since ecpglib relies on pthread_once() in several places. Diagnosis and patch by me, based on investigation by Alexander Lakhin. Back-patch to all supported branches (the bug dates to 2007). Discussion: https://postgr.es/m/16685-d6cd241872c101d3@postgresql.org
* Update time zone data files to tzdata release 2020d.Tom Lane2020-10-22
| | | | | DST law changes in Palestine, with a whopping 120 hours' notice. Also some historical corrections for Palestine.
* Sync our copy of the timezone library with IANA release tzcode2020d.Tom Lane2020-10-22
| | | | | | | | There's no functional change at all here, but I'm curious to see whether this change successfully shuts up Coverity's warning about a useless strcmp(), which appeared with the previous update. Discussion: http://mm.icann.org/pipermail/tz/2020-October/029370.html
* Fix connection string handling in psql's \connect command.Tom Lane2020-10-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | psql's \connect claims to be able to re-use previous connection parameters, but in fact it only re-uses the database name, user name, host name (and possibly hostaddr, depending on version), and port. This is problematic for assorted use cases. Notably, pg_dump[all] emits "\connect databasename" commands which we would like to have re-use all other parameters. If such a script is loaded in a psql run that initially had "-d connstring" with some non-default parameters, those other parameters would be lost, potentially causing connection failure. (Thus, this is the same kind of bug addressed in commits a45bc8a4f and 8e5793ab6, although the details are much different.) To fix, redesign do_connect() so that it pulls out all properties of the old PGconn using PQconninfo(), and then replaces individual properties in that array. In the case where we don't wish to re-use anything, get libpq's default settings using PQconndefaults() and replace entries in that, so that we don't need different code paths for the two cases. This does result in an additional behavioral change for cases where the original connection parameters allowed multiple hosts, say "psql -h host1,host2", and the \connect request allows re-use of the host setting. Because the previous coding relied on PQhost(), it would only permit reconnection to the same host originally selected. Although one can think of scenarios where that's a good thing, there are others where it is not. Moreover, that behavior doesn't seem to meet the principle of least surprise, nor was it documented; nor is it even clear it was intended, since that coding long pre-dates the addition of multi-host support to libpq. Hence, this patch is content to drop it and re-use the host list as given. Per Peter Eisentraut's comments on bug #16604. Back-patch to all supported branches. Discussion: https://postgr.es/m/16604-933f4b8791227b15@postgresql.org
* Use fast checkpoint in PostgresNode::backup()Alvaro Herrera2020-10-21
| | | | Should cause tests to be a bit faster
* Fix ALTER TABLE .. ENABLE/DISABLE TRIGGER recursionAlvaro Herrera2020-10-20
| | | | | | | | | | | | | | | | More precisely, correctly handle the ONLY flag indicating not to recurse. This was implemented in 86f575948c77 by recursing in trigger.c, but that's the wrong place; use ATSimpleRecursion instead, which behaves properly. However, because legacy inheritance has never recursed in that situation, make sure to do that only for new-style partitioning. I noticed this problem while testing a fix for another bug in the vicinity. This has been wrong all along, so backpatch to 11. Discussion: https://postgr.es/m/20201016235925.GA29829@alvherre.pgsql
* Avoid invalid alloc size error in shm_mqPeter Eisentraut2020-10-20
| | | | | | | | | | | | In shm_mq_receive(), a huge payload could trigger an unjustified "invalid memory alloc request size" error due to the way the buffer size is increased. Add error checks (documenting the upper limit) and avoid the error by limiting the allocation size to MaxAllocSize. Author: Markus Wanner <markus.wanner@2ndquadrant.com> Discussion: https://www.postgresql.org/message-id/flat/3bb363e7-ac04-0ac4-9fe8-db1148755bfa%402ndquadrant.com
* Fix connection string handling in src/bin/scripts/ programs.Tom Lane2020-10-19
| | | | | | | | | | | | | | | | | | | | | When told to process all databases, clusterdb, reindexdb, and vacuumdb would reconnect by replacing their --maintenance-db parameter with the name of the target database. If that parameter is a connstring (which has been allowed for a long time, though we failed to document that before this patch), we'd lose any other options it might specify, for example SSL or GSS parameters, possibly resulting in failure to connect. Thus, this is the same bug as commit a45bc8a4f fixed in pg_dump and pg_restore. We can fix it in the same way, by using libpq's rules for handling multiple "dbname" parameters to add the target database name separately. I chose to apply the same refactoring approach as in that patch, with a struct to handle the command line parameters that need to be passed through to connectDatabase. (Maybe someday we can unify the very similar functions here and in pg_dump/pg_restore.) Per Peter Eisentraut's comments on bug #16604. Back-patch to all supported branches. Discussion: https://postgr.es/m/16604-933f4b8791227b15@postgresql.org
* In libpq for Windows, call WSAStartup once and WSACleanup not at all.Tom Lane2020-10-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Windows documentation insists that every WSAStartup call should have a matching WSACleanup call. However, if that ever had actual relevance, it wasn't in this century. Every remotely-modern Windows kernel is capable of cleaning up when a process exits without doing that, and must be so to avoid resource leaks in case of a process crash. Moreover, Postgres backends have done WSAStartup without WSACleanup since commit 4cdf51e64 in 2004, and we've never seen any indication of a problem with that. libpq's habit of doing WSAStartup during connection start and WSACleanup during shutdown is also rather inefficient, since a series of non-overlapping connection requests leads to repeated, quite expensive DLL unload/reload cycles. We document a workaround for that (having the application call WSAStartup for itself), but that's just a kluge. It's also worth noting that it's far from uncommon for applications to exit without doing PQfinish, and we've not heard reports of trouble from that either. However, the real reason for acting on this is that recent experiments by Alexander Lakhin show that calling WSACleanup during PQfinish is triggering the symptom we occasionally see that a process using libpq fails to emit expected stdio output. Therefore, let's change libpq so that it calls WSAStartup only once per process, during the first connection attempt, and never calls WSACleanup at all. While at it, get rid of the only other WSACleanup call in our code tree, in pg_dump/parallel.c; that presumably is equally useless. Back-patch of HEAD commit 7d00a6b2d. Discussion: https://postgr.es/m/ac976d8c-03df-d6b8-025c-15a2de8d9af1@postgrespro.ru
* Relax some asserts in merge join costing codeDavid Rowley2020-10-20
| | | | | | | | | | | | | | | | | | | | | | | | In the planner, it was possible, given an extreme enough case containing a large number of joins for the number of estimated rows to become infinite. This could cause problems in initial_cost_mergejoin() where we perform some calculations based on those row estimates. A problem case, presented by Onder Kalaci showed an Assert failure from an Assert checking outerstartsel <= outerendsel. In his test case this was effectively NaN <= Inf, which is false. The NaN outerstartsel came from multiplying the infinite outer_path_rows by 0.0. In master, this problem was fixed by a90c950fc, however, that fix was too invasive for the backbranches. Here we just relax the Asserts to allow them to pass. The worst that appears to happen from this is that we show NaN cost values and infinite row estimates in EXPLAIN. add_path() would have had a hard time doing anything useful with such costs, but that does not really matter as if the row estimates were even close to accurate, such plan would not complete this side of the heat death of the universe. Reported-by: Onder Kalaci Backpatch: 9.5 to 13 Discussion: https://postgr.es/m/DM6PR21MB1211FF360183BCA901B27F04D80B0@DM6PR21MB1211.namprd21.prod.outlook.com
* Update time zone data files to tzdata release 2020c.Tom Lane2020-10-16
| | | | | | DST law changes in Morocco, Canadian Yukon, Fiji, Macquarie Island, Casey Station (Antarctica). Historical corrections for France, Hungary, Monaco.
* Sync our copy of the timezone library with IANA release tzcode2020c.Tom Lane2020-10-16
| | | | | | | | | | | | | | | This changes zic's default output format from "-b fat" to "-b slim". We were already using "slim" in v13/HEAD, so those branches drop the explicit -b switch in the Makefiles. Instead, add an explicit "-b fat" in v12 and before, so that we don't change the output file format in those branches. (This is perhaps excessively conservative, but we decided not to do so in a12079109, and I'll stick with that.) Other non-cosmetic changes are to drop support for zic's long-obsolete "-y" switch, and to ensure that strftime() does not change errno unless it fails. As usual with tzcode changes, back-patch to all supported branches.
* llvmjit: Work around bug in LLVM 3.9 causing crashes after 72559438f92.Andres Freund2020-10-15
| | | | | | | | | | | | | | Unfortunately in LLVM 3.9 LLVMGetAttributeCountAtIndex(func, index) crashes when called with an index that has 0 attributes. Since there's no way to work around this in the C API, add a small C++ wrapper doing so. The only reason this didn't fail before 72559438f92 is that there always are function attributes... Author: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/20201016001254.w2nfj7gd74jmb5in@alap3.anarazel.de Backpatch: 11-, like 72559438f92
* pg_upgrade: remove C99 compiler req. from commit 3c0471b5fdBruce Momjian2020-10-15
| | | | | | | | | | | | | | This commit required support for inline variable definition, which is not a requirement. RELEASE NOTE AUTHOR: the author of commit 3c0471b5fd (pg_upgrade/tablespaces) was Justin Pryzby, not me. Reported-by: Andres Freund Discussion: https://postgr.es/m/20201016001959.h24fkywfubkv2pc5@alap3.anarazel.de Backpatch-through: 9.5
* pg_upgrade: generate check error for left-over new tablespaceBruce Momjian2020-10-15
| | | | | | | | | | | | | Previously, if pg_upgrade failed, and the user recreated the cluster but did not remove the new cluster tablespace directory, a later pg_upgrade would fail since the new tablespace directory would already exists. This adds error reporting for this during check. Reported-by: Justin Pryzby Discussion: https://postgr.es/m/20200925005531.GJ23631@telsasoft.com Backpatch-through: 9.5
* llvmjit: Also copy parameter / return value attributes from template functions.Andres Freund2020-10-15
| | | | | | | | | | | | | | | | | | | | | Previously we only copied the function attributes. That caused problems at least on s390x: Because we didn't copy the 'zeroext' attribute for ExecAggTransReparent()'s *IsNull parameters, expressions invoking it didn't ensure that the upper bytes of the registers were zeroed. In the - relatively rare - cases where not, ExecAggTransReparent() wrongly ended up in the newValueIsNull branch due to the register not being zero. Subsequently causing a crash. It's quite possible that this would cause problems on other platforms, and in other places than just ExecAggTransReparent() on s390x. Thanks to Christoph (and the Debian project) for providing me with access to a s390x machine, allowing me to debug this. Reported-By: Christoph Berg Author: Andres Freund Discussion: https://postgr.es/m/20201015083246.kie5726xerdt3ael@alap3.anarazel.de Backpatch: 11-, where JIT was added
* doc: improve description of synchronous_commit modesBruce Momjian2020-10-15
| | | | | | | | | | | | Previously it wasn't clear exactly what each of the synchronous_commit modes accomplished. This clarifies that, and adds a table describing it. Only backpatched through 9.6 since 9.5 doesn't have all the options. Reported-by: kghost0@gmail.com Discussion: https://postgr.es/m/159741195522.14321.13812604195366728976@wrigleys.postgresql.org Backpatch-through: 9.6
* In the postmaster, rely on the signal infrastructure to block signals.Tom Lane2020-10-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | POSIX sigaction(2) can be told to block a set of signals while a signal handler executes. Make use of that instead of manually blocking and unblocking signals in the postmaster's signal handlers. This should save a few cycles, but more importantly it prevents recursive invocation of signal handlers when many signals arrive in close succession. (Assuming that the platform's signal infrastructure is designed to avoid consuming stack space in that case, but this is demonstrably true at least on Linux.) The existing code has been seen to recurse to the point of stack overflow, either in the postmaster or in a forked-off child. Back-patch of commit 9abb2bfc0. At the time, we'd only seen excess postmaster stack consumption in the buildfarm; but we now have a user report of it, and that commit has aged enough to have a fair amount of confidence that it doesn't break anything. This still doesn't change anything about the way that it works on Windows. Perhaps someone else would like to fix that? Per bug #16673 from David Geier. Back-patch to 9.6. Although the problem exists in principle before that, we've only seen it actually materialize in connection with heavy use of parallel workers, so it doesn't seem necessary to do anything in 9.5; and the relevant code is different there, too. Discussion: https://postgr.es/m/16673-d278c604f8e34ec0@postgresql.org Discussion: https://postgr.es/m/14878.1570820201@sss.pgh.pa.us
* Paper over regression failures in infinite_recurse() on PPC64 Linux.Tom Lane2020-10-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Our infinite_recurse() test to verify sane stack-overrun behavior is affected by a bug of the Linux kernel on PPC64: it will get SIGSEGV if it receives a signal when the stack depth is (a) over 1MB and (b) within a few kB of filling the current physical stack allocation. See https://bugzilla.kernel.org/show_bug.cgi?id=205183. Since this test is a bit time-consuming and we run it in parallel with test scripts that do a lot of DDL, it can be expected to get an sinval catchup interrupt at some point, leading to failure if the timing is wrong. This has caused more than 100 buildfarm failures over the past year or so. While a fix exists for the kernel bug, it might be years before that propagates into all production kernels, particularly in some of the older distros we have in the buildfarm. For now, let's just back off and not run this test on Linux PPC64; that loses nothing in test coverage so far as our own code is concerned. To do that, split this test into a new script infinite_recurse.sql and skip the test when the platform name is powerpc64...-linux-gnu. Back-patch to v12. Branches before that have not been seen to get this failure. No doubt that's because the "errors" test was not run in parallel with other tests before commit 798070ec0, greatly reducing the odds of an sinval catchup being necessary. I also back-patched 3c8553547 into v12, just so the new regression script would look the same in all branches having it. Discussion: https://postgr.es/m/3479046.1602607848@sss.pgh.pa.us Discussion: https://postgr.es/m/20190723162703.GM22387%40telsasoft.com
* Fix GiST buffering build to work when there are included columns.Tom Lane2020-10-12
| | | | | | | | | | | | | | gistRelocateBuildBuffersOnSplit did not get the memo about which attribute count to use. This could lead to a crash if there were included columns and buffering build was chosen. (Because there are random page-split decisions elsewhere in GiST index build, the crashes are not entirely deterministic.) Back-patch to v12 where GiST gained support for included columns. Pavel Borisov Discussion: https://postgr.es/m/CALT9ZEECCV5m7wvxg46PC-7x-EybUmnpupBGhSFMoAAay+r6HQ@mail.gmail.com
* Fix memory leak when guc.c decides a setting can't be applied now.Tom Lane2020-10-12
| | | | | | | | | | | | | | | | | | | | The prohibitValueChange code paths in set_config_option(), which are executed whenever we re-read a PGC_POSTMASTER variable from postgresql.conf, neglected to free anything before exiting. Thus we'd leak the proposed new value of a PGC_STRING variable, as noted by BoChen in bug #16666. For all variable types, if the check hook creates an "extra" chunk, we'd also leak that. These are malloc not palloc chunks, so there is no mechanism for recovering the leaks before process exit. Fortunately, the values are typically not very large, meaning you'd have to go through an awful lot of SIGHUP configuration-reload cycles to make the leakage amount to anything. Still, for a long-lived postmaster process it could potentially be a problem. Oversight in commit 2594cf0e8. Back-patch to all supported branches. Discussion: https://postgr.es/m/16666-2c41a4eec61b03e1@postgresql.org
* Fix optimization hazard in gram.y's makeOrderedSetArgs(), redux.Tom Lane2020-10-07
| | | | | | | | | | | | | | | | It appears that commit cf63c641c, which intended to prevent misoptimization of the result-building step in makeOrderedSetArgs, didn't go far enough: buildfarm member hornet's version of xlc is now optimizing back to the old, broken behavior in which list_length(directargs) is fetched only after list_concat() has changed that value. I'm not entirely convinced whether that's an undeniable compiler bug or whether it can be justified by a sufficiently aggressive interpretation of C sequence points. So let's just change the code to make it harder to misinterpret. Back-patch to all supported versions, just in case. Discussion: https://postgr.es/m/1830491.1601944935@sss.pgh.pa.us
* Rethink recent fix for pg_dump's handling of extension config tables.Tom Lane2020-10-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 3eb3d3e78 was a few bricks shy of a load: while it correctly set the table's "interesting" flag when deciding to dump the data of an extension config table, it was not correct to clear that flag if we concluded we shouldn't dump the data. This led to the crash reported in bug #16655, because in fact we'll traverse dumpTableSchema anyway for all extension tables (to see if they have user-added seclabels or RLS policies). The right thing to do is to force "interesting" true in makeTableDataInfo, and otherwise leave the flag alone. (Doing it there is more future-proof in case additional calls are added, and it also avoids setting the flag unnecessarily if that function decides the table is non-dumpable.) This investigation also showed that while only the --inserts code path had an obvious failure in the case considered by 3eb3d3e78, the COPY code path also has a problem with not having loaded table subsidiary data. That causes fmtCopyColumnList to silently return an empty string instead of the correct column list. That accidentally mostly works, which perhaps is why we didn't notice this before. It would only fail if the restore column order is different from the dump column order, which only happens in weird inheritance cases, so it's not surprising nobody had hit the case with an extension config table. Nonetheless, it's a bug, and it goes a long way back, not just to v12 where the --inserts code path started to have a problem with this. In hopes of catching such cases a bit sooner in future, add some Asserts that "interesting" has been set in both dumpTableData and dumpTableSchema. Adjust the test case added by 3eb3d3e78 so that it checks the COPY rather than INSERT form of that bug, allowing it to detect the longer-standing symptom. Per bug #16655 from Cameron Daniel. Back-patch to all supported branches. Discussion: https://postgr.es/m/16655-5c92d6b3a9438137@postgresql.org Discussion: https://postgr.es/m/18048b44-3414-b983-8c7c-9165b177900d@2ndQuadrant.com
* pg_upgrade: remove pre-8.4 code and >= 8.4 checkBruce Momjian2020-10-06
| | | | | | | | | | We only support upgrading from >= 8.4 so no need for this code or tests. Reported-by: Magnus Hagander Discussion: https://postgr.es/m/CABUevEx-D0PNVe00tkeQRGennZQwDtBJn=493MJt-x6sppbUxA@mail.gmail.com Backpatch-through: 9.5
* pg_upgrade; change major version comparisons to use <=, not <Bruce Momjian2020-10-06
| | | | | | This makes checking for older major versions more consistent. Backpatch-through: 9.5
* Build EC members for child join rels in the right memory context.Tom Lane2020-10-06
| | | | | | | | | | | | | | | | | This patch prevents crashes or wrong plans when partition-wise joins are considered during GEQO planning, as a consequence of the EquivalenceClass data structures becoming corrupt after a GEQO context reset. A remaining problem is that successive GEQO cycles will make multiple copies of the required EC members, since add_child_join_rel_equivalences has no idea that such members might exist already. For now we'll just live with that. The lack of field complaints of crashes suggests that this is a mighty little-used situation. Back-patch to v12 where this code was introduced. Discussion: https://postgr.es/m/1683100.1601860653@sss.pgh.pa.us
* Fix two latent(?) bugs in equivclass.c.Tom Lane2020-10-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | get_eclass_for_sort_expr() computes expr_relids and nullable_relids early on, even though they won't be needed unless we make a new EquivalenceClass, which we often don't. Aside from the probably-minor inefficiency, there's a memory management problem: these bitmapsets will be built in the caller's context, leading to dangling pointers if that is shorter-lived than root->planner_cxt. This would be a live bug if get_eclass_for_sort_expr() could be called with create_it = true during GEQO join planning. So far as I can find, the core code never does that, but it's hard to be sure that no extensions do, especially since the comments make it clear that that's supposed to be a supported case. Fix by not computing these values until we've switched into planner_cxt to build the new EquivalenceClass. generate_join_implied_equalities() uses inner_rel->relids to look up relevant eclasses, but it ought to be using nominal_inner_relids. This is presently harmless because a child RelOptInfo will always have exactly the same eclass_indexes as its topmost parent; but that might not be true forever, and anyway it makes the code confusing. The first of these is old (introduced by me in f3b3b8d5b), so back-patch to all supported branches. The second only dates to v13, but we might as well back-patch it to keep the code looking similar across branches. Discussion: https://postgr.es/m/1508010.1601832581@sss.pgh.pa.us
* Improve stability of identity.sql regression test.Tom Lane2020-10-04
| | | | | | | | | | | | | | I noticed while trying to run the regression tests under a low geqo_threshold that one query on information_schema.columns had unstable (as in, variable from one run to the next) output order. This is pretty unsurprising given the complexity of the underlying plan. Interestingly, of this test's three nigh-identical queries on information_schema.columns, the other two already had ORDER BY clauses guaranteeing stable output. Let's make this one look the same. Back-patch to v10 where this test was added. We've not heard field reports of the test failing, but this experience shows that it can happen when testing under even slightly unusual conditions.
* Put back explicit setting of replication values within TAP tests.Tom Lane2020-10-01
| | | | | | | | | | | | | | Commit 151c0c5f7 neglected the possibility that a TEMP_CONFIG file would explicitly set max_wal_senders=0; as indeed buildfarm member thorntail does, so that it can test wal_level=minimal in other test suites. Hence, rather than assuming that max_wal_senders=10 will prevail if we say nothing, set it explicitly. Set max_replication_slots=10 explicitly too, just to be safe. Back-patch to v10, like the previous patch. Discussion: https://postgr.es/m/723911.1601417626@sss.pgh.pa.us
* Fix incorrect assertion on number of array dimensions.Heikki Linnakangas2020-10-01
| | | | | | | | | This has been wrong ever since the support for multi-dimensional arrays as PL/python function arguments and return values was introduced in commit 94aceed317. Backpatch-through: 10 Discussion: https://www.postgresql.org/message-id/61647b8e-961c-0362-d5d3-c8a18f4a7ec6%40iki.fi
* Reword partitioning error messageAlvaro Herrera2020-09-30
| | | | | | | | | | The error message about columns in the primary key not including all of the partition key was unclear; reword it. Backpatch all the way to pg11, where it appeared. Reported-by: Nagaraj Raj <nagaraj.sf@yahoo.com> Discussion: https://postgr.es/m/64062533.78364.1601415362244@mail.yahoo.com
* Fix handling of BC years in to_date/to_timestamp.Tom Lane2020-09-30
| | | | | | | | | | | | | | | | | | | | | | | Previously, a conversion such as to_date('-44-02-01','YYYY-MM-DD') would result in '0045-02-01 BC', as the code attempted to interpret the negative year as BC, but failed to apply the correction needed for our internal handling of BC years. Fix the off-by-one problem. Also, arrange for the combination of a negative year and an explicit "BC" marker to cancel out and produce AD. This is how the negative-century case works, so it seems sane to do likewise. Continue to read "year 0000" as 1 BC. Oracle would throw an error, but we've accepted that case for a long time so I'm hesitant to change it in a back-patch. Per bug #16419 from Saeed Hubaishan. Back-patch to all supported branches. Dar Alathar-Yemen and Tom Lane Discussion: https://postgr.es/m/16419-d8d9db0a7553f01b@postgresql.org
* Remove obsolete replication settings within TAP tests.Tom Lane2020-09-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PostgresNode.pm set "max_wal_senders = 5" for replication testing, but this seems to be slightly too low for our current test suite. Slower buildfarm members frequently report "number of requested standby connections exceeds max_wal_senders" failures, due to old walsenders not exiting instantaneously. Usually, the test does not fail overall because of automatic walreceiver restart, but sometimes the failure becomes visible; and in any case such retries slow down the test. That value came in with commit 89ac7004d, but was soon obsoleted by f6d6d2920, which raised the built-in default from zero to 10; so that PostgresNode.pm is actually setting it to less than the conservative built-in default. That seems pretty pointless, so let's remove the special setting and let the default prevail, in hopes of making the TAP tests more robust. Likewise, the setting "max_replication_slots = 5" is obsolete and can be removed. While here, reverse-engineer a comment about why we're choosing less-than-default values for some other settings. (Note: before v12, max_wal_senders counted against max_connections so that the latter setting also needs some fiddling with.) Back-patch to v10 where the subscription tests were added. It's likely that the older branches aren't pushing the boundaries of max_wal_senders, but I'm disinclined to spend time trying to figure out exactly when it started to be a problem. Discussion: https://postgr.es/m/723911.1601417626@sss.pgh.pa.us
* Fix memory leak in plpgsql's CALL processing.Tom Lane2020-09-29
| | | | | | | | | | | | | | | | | | | | | When executing a CALL or DO in a non-atomic context (i.e., not inside a function or query), plpgsql creates a new plan each time through, as a rather hacky solution to some resource management issues. But it failed to free this plan until exit of the current procedure or DO block, resulting in serious memory bloat in procedures that called other procedures many times. Fix by remembering to free the plan, and by being more honest about restoring the previous state (otherwise, recursive procedure calls have a problem). There was also a smaller leak associated with recalculation of the "target" list of output variables. Fix that by using the statement- lifespan context to hold non-permanent values. Back-patch to v11 where procedures were introduced. Pavel Stehule and Tom Lane Discussion: https://postgr.es/m/CAFj8pRDiiU1dqym+_P4_GuTWm76knJu7z9opWayBJTC0nQGUUA@mail.gmail.com