aboutsummaryrefslogtreecommitdiff
path: root/src
Commit message (Collapse)AuthorAge
* Remove useless bms_copy step in RelationGetIndexAttrBitmap.Tom Lane2019-01-21
| | | | | | | | | | Seems to be from a bad case of copy-and-paste-itis in commit 665d1fad9. It wouldn't be quite so annoying if it didn't contradict the comment half a dozen lines above. David Rowley Discussion: https://postgr.es/m/CAKJS1f95Dyf8Qkdz4W+PbCmT-HTb54tkqUCC8isa2RVgSJ_pXQ@mail.gmail.com
* Flush relcache entries when their FKs are meddled withAlvaro Herrera2019-01-21
| | | | | | | | | | | | | | | Back in commit 100340e2dcd0, we made relcache entries keep lists of the foreign keys applying to the relation -- but we forgot to update CacheInvalidateHeapTuple to flush those entries when new FKs got created or existing ones updated/deleted. No bugs appear to have been reported that would be explained by this ommission, but I noticed the problem while working on an unrelated bugfix which clearly showed it. Fix by adding relcache flush on relevant foreign key changes. Backpatch to 9.6, like the aforementioned commit. Discussion: https://postgr.es/m/201901211927.7mmhschxlejh@alvherre.pgsql Reviewed-by: Tom Lane
* Revert "Add valgrind suppressions for wcsrtombs optimizations"Tomas Vondra2019-01-19
| | | | | | | | | | | This reverts commit 5b16a353543ecec36ffda68269defb7b1b002f60. Per discussion, it's not desirable to add valgrind suppressions for outside our own code base (e.g. glibc in this case), especially when the suppressions may be platform-specific. There are better ways to deal with that, e.g. by providing local suppressions. Discussion: https://www.postgresql.org/message-id/flat/90ac0452-e907-e7a4-b3c8-15bd33780e62%402ndquadrant.com
* Fix outdated commentPeter Eisentraut2019-01-19
| | | | | The issue the comment is referring to was fixed by 08859bb5c2cebc132629ca838113d27bb31b990c.
* Use our own getopt() on OpenBSD.Tom Lane2019-01-18
| | | | | | | | | | | | | | | | Recent OpenBSD (at least 5.9 and up) has a version of getopt(3) that will not cope with the "-:" spec we use to accept double-dash options in postgres.c and postmaster.c. Admittedly, that's a hack because POSIX only requires getopt() to allow alphanumeric option characters. I have no desire to find another way, however, so let's just do what we were already doing on Solaris: force use of our own src/port/getopt.c implementation. In passing, improve some of the comments around said implementation. Per buildfarm and local testing. Back-patch to all supported branches. Discussion: https://postgr.es/m/30197.1547835700@sss.pgh.pa.us
* Enforce non-parallel plan when calling current_schema() in newly-added testMichael Paquier2019-01-18
| | | | | | | | | | current_schema() gets called in the recently-added regression test from c5660e0, and can be used in a parallel context, causing its call to fail when creating a temporary schema. Per buildfarm members crake and lapwing. Discussion: https://postgr.es/m/20190118005949.GD1883@paquier.xyz
* Restrict the use of temporary namespace in two-phase transactionsMichael Paquier2019-01-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Attempting to use a temporary table within a two-phase transaction is forbidden for ages. However, there have been uncovered grounds for a couple of other object types and commands which work on temporary objects with two-phase commit. In short, trying to create, lock or drop an object on a temporary schema should not be authorized within a two-phase transaction, as it would cause its state to create dependencies with other sessions, causing all sorts of side effects with the existing session or other sessions spawned later on trying to use the same temporary schema name. Regression tests are added to cover all the grounds found, the original report mentioned function creation, but monitoring closer there are many other patterns with LOCK, DROP or CREATE EXTENSION which are involved. One of the symptoms resulting in combining both is that the session which used the temporary schema is not able to shut down completely, waiting for being able to drop the temporary schema, something that it cannot complete because of the two-phase transaction involved with temporary objects. In this case the client is able to disconnect but the session remains alive on the backend-side, potentially blocking connection backend slots from being used. Other problems reported could also involve server crashes. This is back-patched down to v10, which is where 9b013dc has introduced MyXactFlags, something that this patch relies on. Reported-by: Alexey Bashtanov Author: Michael Paquier Reviewed-by: Masahiko Sawada Discussion: https://postgr.es/m/5d910e2e-0db8-ec06-dd5f-baec420513c3@imap.cc Backpatch-through: 10
* Postpone aggregate checks until after collation is assigned.Andrew Gierth2019-01-17
| | | | | | | | | | | | | | | | | | | | | | | Previously, parseCheckAggregates was run before assign_query_collations, but this causes problems if any expression has already had a collation assigned by some transform function (e.g. transformCaseExpr) before parseCheckAggregates runs. The differing collations would cause expressions not to be recognized as equal to the ones in the GROUP BY clause, leading to spurious errors about unaggregated column references. The result was that CASE expr WHEN val ... would fail when "expr" contained a GROUPING() expression or matched one of the group by expressions, and where collatable types were involved; whereas the supposedly identical CASE WHEN expr = val ... would succeed. Backpatch all the way; this appears to have been wrong ever since collations were introduced. Per report from Guillaume Lelarge, analysis and patch by me. Discussion: https://postgr.es/m/CAECtzeVSO_US8C2Khgfv54ZMUOBR4sWq+6_bLrETnWExHT=rFg@mail.gmail.com Discussion: https://postgr.es/m/87muo0k0c7.fsf@news-spur.riddles.org.uk
* Fix typos in documentation and for one wait eventMichael Paquier2019-01-15
| | | | | | | | | These have been found while cross-checking for the use of unique words in the documentation, and a wait event was not getting generated in a way consistent to what the documentation provided. Author: Alexander Lakhin Discussion: https://postgr.es/m/9b5a3a85-899a-ae62-dbab-1e7943aa5ab1@gmail.com
* fix typoAndrew Dunstan2019-01-13
|
* Make DLSUFFIX easily discoverable by build scriptsAndrew Dunstan2019-01-13
| | | | | | | | | This will enable things like the buildfarm client to discover more reliably if certain libraries have been installed. Discussion: https://postgr.es/m/859e7c91-7ef4-d4b4-2ca2-8046e0cbee09@2ndQuadrant.com Backpatch to all live branches.
* Fix up confusion over how to use EXTRA_INSTALL.Tom Lane2019-01-11
| | | | | | | | | | | | | | | | | | | | | | | | | | Some makefiles were trying to do this: temp-install: EXTRA_INSTALL=contrib/test_decoding but that no longer works as of commit aa019da52: the macro is now consulted by the checkprep target, one level down, and apparently gmake doesn't propagate such macro settings recursively. The problem is masked since 42e61c774 because pgxs.mk also sets up EXTRA_INSTALL, and correctly applies it to the checkprep target. Unfortunately I'd not risked back-patching that to before v11. Since aa019da52 was pushed back to v10, it broke test_decoding there (the only module for which this actually makes a difference at present). Hence, back-patch 42e61c774 to v10. Also, remove some demonstrably useless settings of EXTRA_INSTALL in v10 and v11 (they'd already been cleaned up in HEAD). Per buildfarm. Discussion: https://postgr.es/m/CAEepm=1pEJdwv6DSGmOfpX0EaX7L7sT28c1nXpqvQvmLfEWb1g@mail.gmail.com
* Avoid sharing PARAM_EXEC slots between different levels of NestLoop.Tom Lane2019-01-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Up to now, createplan.c attempted to share PARAM_EXEC slots for NestLoopParams across different plan levels, if the same underlying Var was being fed down to different righthand-side subplan trees by different NestLoops. This was, I think, more of an artifact of using subselect.c's PlannerParamItem infrastructure than an explicit design goal, but anyway that was the end result. This works well enough as long as the plan tree is executing synchronously, but the feature whereby Gather can execute the parallelized subplan locally breaks it. An upper NestLoop node might execute for a row retrieved from a parallel worker, and assign a value for a PARAM_EXEC slot from that row, while the leader's copy of the parallelized subplan is suspended with a different active value of the row the Var comes from. When control eventually returns to the leader's subplan, it gets the wrong answers if the same PARAM_EXEC slot is being used within the subplan, as reported in bug #15577 from Bartosz Polnik. This is pretty reminiscent of the problem fixed in commit 46c508fbc, and the proper fix seems to be the same: don't try to share PARAM_EXEC slots across different levels of controlling NestLoop nodes. This requires decoupling NestLoopParam handling from PlannerParamItem handling, although the logic remains somewhat similar. To avoid bizarre division of labor between subselect.c and createplan.c, I decided to move all the param-slot-assignment logic for both cases out of those files and put it into a new file paramassign.c. Hopefully it's a bit better documented now, too. A regression test case for this might be nice, but we don't know a test case that triggers the problem with a suitably small amount of data. Back-patch to 9.6 where we added Gather nodes. It's conceivable that related problems exist in older branches; but without some evidence for that, I'll leave the older branches alone. Discussion: https://postgr.es/m/15577-ca61ab18904af852@postgresql.org
* Improve ANALYZE's handling of concurrent-update scenarios.Tom Lane2019-01-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch changes the rule for whether or not a tuple seen by ANALYZE should be included in its sample. When we last touched this logic, in commit 51e1445f1, we weren't thinking very hard about tuples being UPDATEd by a long-running concurrent transaction. In such a case, we might see the pre-image as either LIVE or DELETE_IN_PROGRESS depending on timing; and we might see the post-image not at all, or as INSERT_IN_PROGRESS. Since the existing code will not sample either DELETE_IN_PROGRESS or INSERT_IN_PROGRESS tuples, this leads to concurrently-updated rows being omitted from the sample entirely. That's not very helpful, and it's especially the wrong thing if the concurrent transaction ends up rolling back. The right thing seems to be to sample DELETE_IN_PROGRESS rows just as if they were live. This makes the "sample it" and "count it" decisions the same, which seems good for consistency. It's clearly the right thing if the concurrent transaction ends up rolling back; in effect, we are sampling as though IN_PROGRESS transactions haven't happened yet. Also, this combination of choices ensures maximum robustness against the different combinations of whether and in which state we might see the pre- and post-images of an update. It's slightly annoying that we end up recording immediately-out-of-date stats in the case where the transaction does commit, but on the other hand the stats are fine for columns that didn't change in the update. And the alternative of sampling INSERT_IN_PROGRESS rows instead seems like a bad idea, because then the sampling would be inconsistent with the way rows are counted for the stats report. Per report from Mark Chambers; thanks to Jeff Janes for diagnosing what was happening. Back-patch to all supported versions. Discussion: https://postgr.es/m/CAFh58O_Myr6G3tcH3gcGrF-=OExB08PJdWZcSBcEcovaiPsrHA@mail.gmail.com
* Update ssl test certificates and keysPeter Eisentraut2019-01-03
| | | | | | | | | | | | | | | Debian testing and newer now require that RSA and DHE keys are at least 2048 bit long and no longer allow SHA-1 for signatures in certificates. This is currently causing the ssl tests to fail there because the test certificates and keys have been created in violation of those conditions. Update the parameters to create the test files and create a new set of test files. Author: Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> Reported-by: Michael Paquier <michael@paquier.xyz> Discussion: https://www.postgresql.org/message-id/flat/20180917131340.GE31460%40paquier.xyz
* Don't believe MinMaxExpr is leakproof without checking.Tom Lane2019-01-02
| | | | | | | | | | | | | | | | MinMaxExpr invokes the btree comparison function for its input datatype, so it's only leakproof if that function is. Many such functions are indeed leakproof, but others are not, and we should not just assume that they are. Hence, adjust contain_leaked_vars to verify the leakproofness of the referenced function explicitly. I didn't add a regression test because it would need to depend on some particular comparison function being leaky, and that's a moving target, per discussion. This has been wrong all along, so back-patch to supported branches. Discussion: https://postgr.es/m/31042.1546194242@sss.pgh.pa.us
* Process EXTRA_INSTALL serially, during the first temp-install.Noah Misch2018-12-31
| | | | | | | | This closes a race condition in "make -j check-world"; the symptom was EEXIST errors. Back-patch to v10, before which parallel check-world had worse problems. Discussion: https://postgr.es/m/20181224221601.GA3227827@rfd.leadboat.com
* Send EXTRA_INSTALL errors to install.log, not stderr.Noah Misch2018-12-31
| | | | | | | We already redirected other temp-install stderr and all temp-install stdout in this way. Back-patch to v10, like the next commit. Discussion: https://postgr.es/m/20181224221601.GA3227827@rfd.leadboat.com
* pg_regress: Promptly detect failed postmaster startup.Noah Misch2018-12-31
| | | | | | | | | | | Detect it the way pg_ctl's wait_for_postmaster() does. When pg_regress spawned a postmaster that failed startup, we were detecting that only with "pg_regress: postmaster did not respond within 60 seconds". Back-patch to 9.4 (all supported versions). Reviewed by Tom Lane. Discussion: https://postgr.es/m/20181231172922.GA199150@gust.leadboat.com
* pg_rewind: Add missing newline to error messagePeter Eisentraut2018-12-29
|
* Fix latent problem with pg_jrand48().Tom Lane2018-12-28
| | | | | | | | | | | | | | | | | | | POSIX specifies that jrand48() returns a signed 32-bit value (in the range [-2^31, 2^31)), but our code was returning an unsigned 32-bit value (in the range [0, 2^32)). This doesn't actually matter to any existing call site, because they all cast the "long" result to int32 or uint32; but it will doubtless bite somebody in the future. To fix, cast the arithmetic result to int32 explicitly before the compiler widens it to long (if widening is needed). While at it, upgrade this file's far-short-of-project-style comments. Had there been some peer pressure to document pg_jrand48() properly, maybe this thinko wouldn't have gotten committed to begin with. Backpatch to v10 where pg_jrand48() was added, just in case somebody back-patches a fix that uses it and depends on the standard behavior. Discussion: https://postgr.es/m/17235.1545951602@sss.pgh.pa.us
* Have DISCARD ALL/TEMP remove leftover temp tablesAlvaro Herrera2018-12-27
| | | | | | | | | | | | | | | | | | | | Previously, it would only remove temp tables created in the same session; but if the session uses the BackendId of a previously crashed backend that left temp tables around, those would not get removed. Since autovacuum would not drop them either (because it sees that the BackendId is in use by the current session) these can cause annoying xid-wraparound warnings. Apply to branches 9.4 to 10. This is not a problem since version 11, because commit 943576bddcb5 added state tracking that makes autovacuum realize that those temp tables are not ours, so it removes them. This is useful to handle in DISCARD, because even though it does not handle all situations, it does handle the common one where a connection pooler keeps the same session open for an indefinitely long time. Discussion: https://postgr.es/m/20181226190834.wsk2wzott5yzrjiq@alvherre.pgsql Reviewed-by: Takayuki Tsunakawa, Michaël Paquier
* Make autovacuum more selective about temp tables to keepAlvaro Herrera2018-12-27
| | | | | | | | | | | | | | | | | | | When temp tables are in danger of XID wraparound, autovacuum drops them; however, it preserves those that are owned by a working session. This is desirable, except when the session is connected to a different database (because the temp tables cannot be from that session), so make it only keep the temp tables only if the backend is in the same database as the temp tables. This is not bulletproof: it fails to detect temp tables left by a session whose backend ID is reused in the same database but the new session does not use temp tables. Commit 943576bddcb5 fixes that case too, for branches 11 and up (which is why we don't apply this fix to those branches), but back-patching that one is not universally agreed on. Discussion: https://postgr.es/m/20181214162843.37g6h3txto43akrb@alvherre.pgsql Reviewed-by: Takayuki Tsunakawa, Michaël Paquier
* Ignore inherited temp relations from other sessions when truncatingMichael Paquier2018-12-27
| | | | | | | | | | | | | | | | | | Inheritance trees can include temporary tables if the parent is permanent, which makes possible the presence of multiple temporary children from different sessions. Trying to issue a TRUNCATE on the parent in this scenario causes a failure, so similarly to any other queries just ignore such cases, which makes TRUNCATE work transparently. This makes truncation behave similarly to any other DML query working on the parent table with queries which need to be issues on children. A set of isolation tests is added to cover basic cases. Reported-by: Zhou Digoal Author: Amit Langote, Michael Paquier Discussion: https://postgr.es/m/15565-ce67a48d0244436a@postgresql.org Backpatch-through: 9.4
* Fix portability failure introduced in commits d2b0b60e7 et al.Tom Lane2018-12-26
| | | | | | | | | | | | I made a frontend fprintf() format use %m, forgetting that that's only safe in HEAD not the back branches; prior to 96bf88d52 and d6c55de1f, it would work on glibc platforms but not elsewhere. Revert to using %s ... strerror(errno) as the code did before. We could have left HEAD as-is, but for code consistency across branches, I chose to apply this patch there too. Per Coverity and a few buildfarm members.
* Prioritize history files when archivingMichael Paquier2018-12-24
| | | | | | | | | | | | | | | | | | | | | | | | At the end of recovery for the post-promotion process, a new history file is created followed by the last partial segment of the previous timeline. Based on the timing, the archiver would first try to archive the last partial segment and then the history file. This can delay the detection of a new timeline taken, particularly depending on the time it takes to transfer the last partial segment as it delays the moment the history file of the new timeline gets archived. This can cause promoted standbys to use the same timeline as one already taken depending on the circumstances if multiple instances look at archives at the same location. This commit changes the order of archiving so as history files are archived in priority over other file types, which reduces the likelihood of the same timeline being taken (still not reducing the window to zero), and it makes the archiver behave more consistently with the startup process doing its post-promotion business. Author: David Steele Reviewed-by: Michael Paquier, Kyotaro Horiguchi Discussion: https://postgr.es/m/929068cf-69e1-bba2-9dc0-e05986aed471@pgmasters.net Backpatch-through: 9.5
* Disable WAL-skipping optimization for COPY on viewsMichael Paquier2018-12-23
| | | | | | | | | | | | | | | | | | | | COPY can skip writing WAL when loading data on a table which has been created in the same transaction as the one loading the data, however this cannot work on views as this would result in trying to flush relation files which do not exist. So disable the optimization so as commands are able to work the same way with any configuration of wal_level. A test is added to cover this case, which needs to have wal_level set to minimal to allow the problem to show up, and that is not the default configuration. Reported-by: Etsuro Fujita Author: Michael Paquier Reviewed-by: Etsuro Fujita Discussion: https://postgr.es/m/15552-c64aa14c5c22f63c@postgresql.org Backpatch-through: 10, where support for COPY on views has been added, while v11 has added support for COPY on foreign tables.
* Fix ancient compiler warnings and typos in !HAVE_SYMLINK codePeter Eisentraut2018-12-22
| | | | This has never been correct since this code was introduced.
* Check for conflicting queries during replay of gistvacuumpage()Alexander Korotkov2018-12-21
| | | | | | | | | | | | | | | | | | | | | | | 013ebc0a7b implements so-called GiST microvacuum. That is gistgettuple() marks index tuples as dead when kill_prior_tuple is set. Later, when new tuple insertion claims page space, those dead index tuples are physically deleted from page. When this deletion is replayed on standby, it might conflict with read-only queries. But 013ebc0a7b doesn't handle this. That may lead to disappearance of some tuples from read-only snapshots on standby. This commit implements resolving of conflicts between replay of GiST microvacuum and standby queries. On the master we implement new WAL record type XLOG_GIST_DELETE, which comprises necessary information. On stable releases we've to be tricky to keep WAL compatibility. Information required for conflict processing is just appended to data of XLOG_GIST_PAGE_UPDATE record. So, PostgreSQL version, which doesn't know about conflict processing, will just ignore that. Reported-by: Andres Freund Diagnosed-by: Andres Freund Discussion: https://postgr.es/m/20181212224524.scafnlyjindmrbe6%40alap3.anarazel.de Author: Alexander Korotkov Backpatch-through: 9.6
* Fix lock level used for partition when detaching itAlvaro Herrera2018-12-20
| | | | | | | | | | | | | | For probably bogus reasons, we acquire only AccessShareLock on the partition when we try to detach it from its parent partitioned table. This can cause ugly things to happen if another transaction is doing any sort of DDL to the partition concurrently. Upgrade that lock to ShareUpdateExclusiveLock, which per discussion seems to be the minimum needed. Reported by Robert Haas. Discussion: https://postgr.es/m/CA+TgmoYruJQ+2qnFLtF1xQtr71pdwgfxy3Ziy-TxV28M6pEmyA@mail.gmail.com
* Fix ADD IF NOT EXISTS used in conjunction with ALTER TABLE ONLYGreg Stark2018-12-19
| | | | | | The flag for IF NOT EXISTS was only being passed down in the normal recursing case. It's been this way since originally added in 9.6 in commit 2cd40adb85 so backpatch back to 9.6.
* Fix ancient thinko in mergejoin cost estimation.Tom Lane2018-12-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | "rescanratio" was computed as 1 + rescanned-tuples / total-inner-tuples, which is sensible if it's to be multiplied by total-inner-tuples or a cost value corresponding to scanning all the inner tuples. But in reality it was (mostly) multiplied by inner_rows or a related cost, numbers that take into account the possibility of stopping short of scanning the whole inner relation thanks to a limited key range in the outer relation. This'd still make sense if we could expect that stopping short would result in a proportional decrease in the number of tuples that have to be rescanned. It does not, however. The argument that establishes the validity of our estimate for that number is independent of whether we scan all of the inner relation or stop short, and experimentation also shows that stopping short doesn't reduce the number of rescanned tuples. So the correct calculation is 1 + rescanned-tuples / inner_rows, and we should be sure to multiply that by inner_rows or a corresponding cost value. Most of the time this doesn't make much difference, but if we have both a high rescan rate (due to lots of duplicate values) and an outer key range much smaller than the inner key range, then the error can be significant, leading to a large underestimate of the cost associated with rescanning. Per report from Vijaykumar Jain. This thinko appears to go all the way back to the introduction of the rescan estimation logic in commit 70fba7043, so back-patch to all supported branches. Discussion: https://postgr.es/m/CAE7uO5hMb_TZYJcZmLAgO6iD68AkEK6qCe7i=vZUkCpoKns+EQ@mail.gmail.com
* Remove extra semicolons.Amit Kapila2018-12-17
| | | | | | | | Reported-by: David Rowley Author: David Rowley Reviewed-by: Amit Kapila Backpatch-through: 10 Discussion: https://postgr.es/m/CAKJS1f8EneeYyzzvdjahVZ6gbAHFkHbSFB5m_C0Y6TUJs9Dgdg@mail.gmail.com
* Fix use-after-free bug when renaming constraintsMichael Paquier2018-12-17
| | | | | | | | | This is an oversight from recent commit b13fd344. While on it, tweak the previous test with a better name for the renamed primary key. Detected by buildfarm member prion which forces relation cache release with -DRELCACHE_FORCE_RELEASE. Back-patch down to 9.4 as the previous commit.
* Make constraint rename issue relcache invalidation on target relationMichael Paquier2018-12-17
| | | | | | | | | | | | | | | When a constraint gets renamed, it may have associated with it a target relation (for example domain constraints don't have one). Not invalidating the target relation cache when issuing the renaming can result in issues with subsequent commands that refer to the old constraint name using the relation cache, causing various failures. One pattern spotted was using CREATE TABLE LIKE after a constraint renaming. Reported-by: Stuart <sfbarbee@gmail.com> Author: Amit Langote Reviewed-by: Michael Paquier Discussion: https://postgr.es/m/2047094.V130LYfLq4@station53.ousa.org
* Make error handling in parallel pg_upgrade less bogus.Tom Lane2018-12-16
| | | | | | | | | | | | reap_child() basically ignored the possibility of either an error in waitpid() itself or a child process failure on signal. We don't really need to do more than report and crash hard, but proceeding as though nothing is wrong is definitely Not Acceptable. The error report for nonzero child exit status was pretty off-point, as well. Noted while fooling around with child-process failure detection logic elsewhere. It's been like this a long time, so back-patch to all supported branches.
* Improve detection of child-process SIGPIPE failures.Tom Lane2018-12-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit ffa4cbd62 added logic to detect SIGPIPE failure of a COPY child process, but it only worked correctly if the SIGPIPE occurred in the immediate child process. Depending on the shell in use and the complexity of the shell command string, we might instead get back an exit code of 128 + SIGPIPE, representing a shell error exit reporting SIGPIPE in the child process. We could just hack up ClosePipeToProgram() to add the extra case, but it seems like this is a fairly general issue deserving a more general and better-documented solution. I chose to add a couple of functions in src/common/wait_error.c, which is a natural place to know about wait-result encodings, that will test for either a specific child-process signal type or any child-process signal failure. Then, adjust other places that were doing ad-hoc tests of this type to use the common functions. In RestoreArchivedFile, this fixes a race condition affecting whether the process will report an error or just silently proc_exit(1): before, that depended on whether the intermediate shell got SIGTERM'd itself or reported a child process failing on SIGTERM. Like the previous patch, back-patch to v10; we could go further but there seems no real need to. Per report from Erik Rijkers. Discussion: https://postgr.es/m/f3683f87ab1701bea5d86a7742b22432@xs4all.nl
* Fix bogus logic for skipping unnecessary partcollation dependencies.Tom Lane2018-12-13
| | | | | | | | | The idea here is to not call recordDependencyOn for the default collation, since we know that's pinned. But what the code actually did was to record the partition key's dependency on the opclass twice, instead. Evidently introduced by sloppy coding in commit 2186b608b. Back-patch to v10 where that came in.
* Prevent GIN deleted pages from being reclaimed too earlyAlexander Korotkov2018-12-13
| | | | | | | | | | | | | | | | | | When GIN vacuum deletes a posting tree page, it assumes that no concurrent searchers can access it, thanks to ginStepRight() locking two pages at once. However, since 9.4 searches can skip parts of posting trees descending from the root. That leads to the risk that page is deleted and reclaimed before concurrent search can access it. This commit prevents the risk of above by waiting for every transaction, which might wait to reference this page, to finish. Due to binary compatibility we can't change GinPageOpaqueData to store corresponding transaction id. Instead we reuse page header pd_prune_xid field, which is unused in index pages. Discussion: https://postgr.es/m/31a702a.14dd.166c1366ac1.Coremail.chjischj%40163.com Author: Andrey Borodin, Alexander Korotkov Reviewed-by: Alexander Korotkov Backpatch-through: 9.4
* Prevent deadlock in ginRedoDeletePage()Alexander Korotkov2018-12-13
| | | | | | | | | | | | | | | | | | | | On standby ginRedoDeletePage() can work concurrently with read-only queries. Those queries can traverse posting tree in two ways. 1) Using rightlinks by ginStepRight(), which locks the next page before unlocking its left sibling. 2) Using downlinks by ginFindLeafPage(), which locks at most one page at time. Original lock order was: page, parent, left sibling. That lock order can deadlock with ginStepRight(). In order to prevent deadlock this commit changes lock order to: left sibling, page, parent. Note, that position of parent in locking order seems insignificant, because we only lock one page at time while traversing downlinks. Reported-by: Chen Huajun Diagnosed-by: Chen Huajun, Peter Geoghegan, Andrey Borodin Discussion: https://postgr.es/m/31a702a.14dd.166c1366ac1.Coremail.chjischj%40163.com Author: Alexander Korotkov Backpatch-through: 9.4
* Fix deadlock in GIN vacuum introduced by 218f51584d5Alexander Korotkov2018-12-13
| | | | | | | | | | | | | | | | | | | | | Before 218f51584d5 if posting tree page is about to be deleted, then the whole posting tree is locked by LockBufferForCleanup() on root preventing all the concurrent inserts. 218f51584d5 reduced locking to the subtree containing page to be deleted. However, due to concurrent parent split, inserter doesn't always holds pins on all the pages constituting path from root to the target leaf page. That could cause a deadlock between GIN vacuum process and GIN inserter. And we didn't find non-invasive way to fix this. This commit reverts VACUUM behavior to lock the whole posting tree before delete any page. However, we keep another useful change by 218f51584d5: the tree is locked only if there are pages to be deleted. Reported-by: Chen Huajun Diagnosed-by: Chen Huajun, Andrey Borodin, Peter Geoghegan Discussion: https://postgr.es/m/31a702a.14dd.166c1366ac1.Coremail.chjischj%40163.com Author: Alexander Korotkov, based on ideas from Andrey Borodin and Peter Geoghegan Reviewed-by: Andrey Borodin Backpatch-through: 10
* Repair bogus EPQ plans generated for postgres_fdw foreign joins.Tom Lane2018-12-12
| | | | | | | | | | | | | | | | | | | | | | | | | postgres_fdw's postgresGetForeignPlan() assumes without checking that the outer_plan it's given for a join relation must have a NestLoop, MergeJoin, or HashJoin node at the top. That's been wrong at least since commit 4bbf6edfb (which could cause insertion of a Sort node on top) and it seems like a pretty unsafe thing to Just Assume even without that. Through blind good fortune, this doesn't seem to have any worse consequences today than strange EXPLAIN output, but it's clearly trouble waiting to happen. To fix, test the node type explicitly before touching Join-specific fields, and avoid jamming the new tlist into a node type that can't do projection. Export a new support function from createplan.c to avoid building low-level knowledge about the latter into FDWs. Back-patch to 9.6 where the faulty coding was added. Note that the associated regression test cases don't show any changes before v11, apparently because the tests back-patched with 4bbf6edfb don't actually exercise the problem case before then (there's no top-level Sort in those plans). Discussion: https://postgr.es/m/8946.1544644803@sss.pgh.pa.us
* Repair bogus handling of multi-assignment Params in upper plan levels.Tom Lane2018-12-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Our support for multiple-set-clauses in UPDATE assumes that the Params referencing a MULTIEXPR_SUBLINK SubPlan will appear before that SubPlan in the targetlist of the plan node that calculates the updated row. (Yeah, it's a hack...) In some PG branches it's possible that a Result node gets inserted between the primary calculation of the update tlist and the ModifyTable node. setrefs.c did the wrong thing in this case and left the upper-level Params as Params, causing a crash at runtime. What it should do is replace them with "outer" Vars referencing the child plan node's output. That's a result of careless ordering of operations in fix_upper_expr_mutator, so we can fix it just by reordering the code. Fix fix_join_expr_mutator similarly for consistency, even though join nodes could never appear in such a context. (In general, it seems likely to be a bit cheaper to use Vars than Params in such situations anyway, so this patch might offer a tiny performance improvement.) The hazard extends back to 9.5 where the MULTIEXPR_SUBLINK stuff was introduced, so back-patch that far. However, this may be a live bug only in 9.6.x and 10.x, as the other branches don't seem to want to calculate the final tlist below the Result node. (That plan shape change between branches might be a mini-bug in itself, but I'm not really interested in digging into the reasons for that right now. Still, add a regression test memorializing what we expect there, so we'll notice if it changes again.) Per bug report from Eduards Bezverhijs. Discussion: https://postgr.es/m/b6cd572a-3e44-8785-75e9-c512a5a17a73@tieto.com
* Fix test_rls_hooks to assign expression collations properly.Tom Lane2018-12-11
| | | | | | | | | | | | This module overlooked this necessary fixup step on the results of transformWhereClause(). It accidentally worked anyway, because the constructed expression involved type "name" which is not collatable, but it fell over while I was experimenting with changing "name" to be collatable. Back-patch, not because there's any live bug here in back branches, but because somebody might use this code as a model for some real application and then not understand why it doesn't work.
* Raise some timeouts to 180s, in test code.Noah Misch2018-12-10
| | | | | | | | | | | | Slow runs of buildfarm members chipmunk, hornet and mandrill saw the shorter timeouts expire. The 180s timeout in poll_query_until has been trouble-free since 2a0f89cd717ce6d49cdc47850577823682167e87 introduced it two years ago, so use 180s more widely. Back-patch to 9.6, where the first of these timeouts was introduced. Reviewed by Michael Paquier. Discussion: https://postgr.es/m/20181209001601.GC2973271@rfd.leadboat.com
* Add stack depth checks to key recursive functions in backend/nodes/*.c.Tom Lane2018-12-10
| | | | | | | | | | Although copyfuncs.c has a check_stack_depth call in its recursion, equalfuncs.c, outfuncs.c, and readfuncs.c lacked one. This seems unwise. Likewise fix planstate_tree_walker(), in branches where that exists. Discussion: https://postgr.es/m/30253.1544286631@sss.pgh.pa.us
* Make TupleDescInitBuiltinEntry throw error for unsupported types.Tom Lane2018-12-10
| | | | | | | | | Previously, it would just pass back a partially-uninitialized tupdesc, which doesn't seem like a safe or useful behavior. Backpatch to v10 where this code came in. Discussion: https://postgr.es/m/30830.1544384975@sss.pgh.pa.us
* Fix misapplication of pgstat_count_truncate to wrong relation.Tom Lane2018-12-07
| | | | | | | | | | | | | | | | | | | | | | | | The stanza of ExecuteTruncate[Guts] that truncates a target table's toast relation re-used the loop local variable "rel" to reference the toast rel. This was safe enough when written, but commit d42358efb added code below that that supposed "rel" still pointed to the parent table. Therefore, the stats counter update was applied to the wrong relcache entry (the toast rel not the user rel); and if we were unlucky and that relcache entry had been flushed during reindex_relation, very bad things could ensue. (I'm surprised that CLOBBER_CACHE_ALWAYS testing hasn't found this. I'm even more surprised that the problem wasn't detected during the development of d42358efb; it must not have been tested in any case with a toast table, as the incorrect stats counts are very obvious.) To fix, replace use of "rel" in that code branch with a more local variable. Adjust test cases added by d42358efb so that some of them use tables with toast tables. Per bug #15540 from Pan Bian. Back-patch to 9.5 where d42358efb came in. Discussion: https://postgr.es/m/15540-01078812338195c0@postgresql.org
* Clean up sloppy coding in publicationcmds.c's OpenTableList().Tom Lane2018-12-07
| | | | | | | | | | | | | | Remove dead code (which would be incorrect if it weren't dead), per report from Pan Bian. Add a CHECK_FOR_INTERRUPTS in the inner loop over child relations, because there's little point in having one in the outer loop if there's not one here too. Minor stylistic adjustments and comment improvements. Seems to be aboriginal to this code (cf commit 665d1fad9). Back-patch to v10 where that came in, not because any of this is significant, but just to keep the branches looking similar. Discussion: https://postgr.es/m/15539-06d00ef6b1e2e1bb@postgresql.org
* Improve our response to invalid format strings, and detect more cases.Tom Lane2018-12-06
| | | | | | | | | | | | | | | | | | | | | | | Places that are testing for *printf failure ought to include the format string in their error reports, since bad-format-string is one of the more likely causes of such failure. This both makes it easier to find and repair the mistake, and provides at least some useful info to the user who stumbles across such a problem. Also, tighten snprintf.c to report EINVAL for an invalid flag or final character in a format %-spec (including the case where the %-spec is missing a final character altogether). This seems like better project policy, and it also allows removing an instruction or two from the hot code path. Back-patch the error reporting change in pvsnprintf, since it should be harmless and may be helpful; but not the snprintf.c change. Per discussion of bug #15511 from Ertuğrul Kahveci, which reported an invalid translated format string. These changes don't fix that error, but they should improve matters next time we make such a mistake. Discussion: https://postgr.es/m/15511-1d8b6a0bc874112f@postgresql.org