aboutsummaryrefslogtreecommitdiff
path: root/src/backend
Commit message (Collapse)AuthorAge
* Create action triggers when partitions are detachedAlvaro Herrera2019-01-21
| | | | | | | | | | | | | | | | Detaching a partition from a partitioned table that's constrained by foreign keys requires additional action triggers on the referenced side; otherwise, DELETE/UPDATE actions there fail to notice rows in the table that was partition, and so are incorrectly allowed through. With this commit, those triggers are now created. Conversely, when a table that has a foreign key is attached as a partition to a table that also has the same foreign key, those action triggers are no longer needed, so we remove them. Add a minimal test case verifying (part of) this. Authors: Amit Langote, Álvaro Herrera Discussion: https://postgr.es/m/f2b8ead5-4131-d5a8-8016-2ea0a31250af@lab.ntt.co.jp
* Flush relcache entries when their FKs are meddled withAlvaro Herrera2019-01-21
| | | | | | | | | | | | | | | Back in commit 100340e2dcd0, we made relcache entries keep lists of the foreign keys applying to the relation -- but we forgot to update CacheInvalidateHeapTuple to flush those entries when new FKs got created or existing ones updated/deleted. No bugs appear to have been reported that would be explained by this ommission, but I noticed the problem while working on an unrelated bugfix which clearly showed it. Fix by adding relcache flush on relevant foreign key changes. Backpatch to 9.6, like the aforementioned commit. Discussion: https://postgr.es/m/201901211927.7mmhschxlejh@alvherre.pgsql Reviewed-by: Tom Lane
* Postpone generating tlists and EC members for inheritance dummy children.Etsuro Fujita2019-01-21
| | | | | | | | | | | | | | | | | | | | | Previously, in set_append_rel_size(), we generated tlists and EC members for dummy children for possible use by partition-wise join, even if partition-wise join was disabled or the top parent was not a partitioned table, but adding such EC members causes noticeable planning speed degradation for queries with certain kinds of join quals like "(foo.x + bar.y) = constant" where foo and bar are partitioned tables in cases where there are lots of dummy children, as the EC members lists grow huge, especially for the ECs derived from such join quals, which makes the search for the parent EC members in add_child_rel_equivalences() very time-consuming. Postpone the work until such children are actually involved in a partition-wise join. Reported-by: Sanyo Capobiango Analyzed-by: Justin Pryzby and Alvaro Herrera Author: Amit Langote, with a few additional changes by me Reviewed-by: Ashutosh Bapat Backpatch-through: v11 where partition-wise join was added Discussion: https://postgr.es/m/CAO698qZnrxoZu7MEtfiJmpmUtz3AVYFVnwzR%2BpqjF%3DrmKBTgpw%40mail.gmail.com
* Fix outdated commentPeter Eisentraut2019-01-19
| | | | | The issue the comment is referring to was fixed by 08859bb5c2cebc132629ca838113d27bb31b990c.
* Fix creation of duplicate foreign keys on partitionsAlvaro Herrera2019-01-18
| | | | | | | | | | | | | | | | | | When creating a foreign key in a partitioned table, if some partitions already have equivalent constraints, we wastefully create duplicates of the constraints instead of attaching to the existing ones. That's inconsistent with the de-duplication that is applied when a table is attached as a partition. To fix, reuse the FK-cloning code instead of having a separate code path. Backpatch to Postgres 11. This is a subtle behavior change, but surely a welcome one since there's no use in having duplicate foreign keys. Discovered by Álvaro Herrera while thinking about a different problem reported by Jesper Pedersen (bug #15587). Author: Álvaro Herrera Discussion: https://postgr.es/m/201901151935.zfadrzvyof4k@alvherre.pgsql
* Move CloneForeignKeyConstraints to tablecmds.cAlvaro Herrera2019-01-18
| | | | | | | | | | | | | | | My commit 3de241dba86f introduced some code to create a clone of a foreign key to a partition, but I put it in pg_constraint.c because it was too close to the contents of the pg_constraint row. With the previous commit that split out the constraint tuple deconstruction into its own routine, it makes more sense to have the FK-cloning function in tablecmds.c, mostly because its static subroutine can then be used by a future bugfix. My initial posting of this patch had this routine as static in tablecmds.c, but sadly this function is already part of the Postgres 11 ABI as exported from pg_constraint.c, so keep it as exported also just to avoid breaking any possible users of it.
* Refactor duplicate code into DeconstructFkConstraintRowAlvaro Herrera2019-01-18
| | | | | | | | | | | My commit 3de241dba86f introduced some code (in tablecmds.c) to obtain data from a pg_constraint row for a foreign key, that already existed in ri_triggers.c. Split it out into its own routine in pg_constraint.c, where it naturally belongs. No functional code changes, only code movement. Backpatch to pg11, because a future bugfix is simpler after this.
* Restrict the use of temporary namespace in two-phase transactionsMichael Paquier2019-01-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Attempting to use a temporary table within a two-phase transaction is forbidden for ages. However, there have been uncovered grounds for a couple of other object types and commands which work on temporary objects with two-phase commit. In short, trying to create, lock or drop an object on a temporary schema should not be authorized within a two-phase transaction, as it would cause its state to create dependencies with other sessions, causing all sorts of side effects with the existing session or other sessions spawned later on trying to use the same temporary schema name. Regression tests are added to cover all the grounds found, the original report mentioned function creation, but monitoring closer there are many other patterns with LOCK, DROP or CREATE EXTENSION which are involved. One of the symptoms resulting in combining both is that the session which used the temporary schema is not able to shut down completely, waiting for being able to drop the temporary schema, something that it cannot complete because of the two-phase transaction involved with temporary objects. In this case the client is able to disconnect but the session remains alive on the backend-side, potentially blocking connection backend slots from being used. Other problems reported could also involve server crashes. This is back-patched down to v10, which is where 9b013dc has introduced MyXactFlags, something that this patch relies on. Reported-by: Alexey Bashtanov Author: Michael Paquier Reviewed-by: Masahiko Sawada Discussion: https://postgr.es/m/5d910e2e-0db8-ec06-dd5f-baec420513c3@imap.cc Backpatch-through: 10
* Postpone aggregate checks until after collation is assigned.Andrew Gierth2019-01-17
| | | | | | | | | | | | | | | | | | | | | | | Previously, parseCheckAggregates was run before assign_query_collations, but this causes problems if any expression has already had a collation assigned by some transform function (e.g. transformCaseExpr) before parseCheckAggregates runs. The differing collations would cause expressions not to be recognized as equal to the ones in the GROUP BY clause, leading to spurious errors about unaggregated column references. The result was that CASE expr WHEN val ... would fail when "expr" contained a GROUPING() expression or matched one of the group by expressions, and where collatable types were involved; whereas the supposedly identical CASE WHEN expr = val ... would succeed. Backpatch all the way; this appears to have been wrong ever since collations were introduced. Per report from Guillaume Lelarge, analysis and patch by me. Discussion: https://postgr.es/m/CAECtzeVSO_US8C2Khgfv54ZMUOBR4sWq+6_bLrETnWExHT=rFg@mail.gmail.com Discussion: https://postgr.es/m/87muo0k0c7.fsf@news-spur.riddles.org.uk
* Fix typos in documentation and for one wait eventMichael Paquier2019-01-15
| | | | | | | | | These have been found while cross-checking for the use of unique words in the documentation, and a wait event was not getting generated in a way consistent to what the documentation provided. Author: Alexander Lakhin Discussion: https://postgr.es/m/9b5a3a85-899a-ae62-dbab-1e7943aa5ab1@gmail.com
* Fix unique INCLUDE indexes on partitioned tablesAlvaro Herrera2019-01-14
| | | | | | | | | | We were considering the INCLUDE columns as part of the key, allowing unicity-violating rows to be inserted in different partitions. Concurrent development conflict in eb7ed3f30634 and 8224de4f42cc. Reported-by: Justin Pryzby Discussion: https://postgr.es/m/20190109065109.GA4285@telsasoft.com
* Free pre-modification HeapTuple in ALTER TABLE ... TYPE ...Andrew Dunstan2019-01-11
| | | | | | | | This was an oversight in commit 3b174b1a3. Per offline gripe from Alvaro Herrera Backpatch to release 11.
* Avoid sharing PARAM_EXEC slots between different levels of NestLoop.Tom Lane2019-01-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Up to now, createplan.c attempted to share PARAM_EXEC slots for NestLoopParams across different plan levels, if the same underlying Var was being fed down to different righthand-side subplan trees by different NestLoops. This was, I think, more of an artifact of using subselect.c's PlannerParamItem infrastructure than an explicit design goal, but anyway that was the end result. This works well enough as long as the plan tree is executing synchronously, but the feature whereby Gather can execute the parallelized subplan locally breaks it. An upper NestLoop node might execute for a row retrieved from a parallel worker, and assign a value for a PARAM_EXEC slot from that row, while the leader's copy of the parallelized subplan is suspended with a different active value of the row the Var comes from. When control eventually returns to the leader's subplan, it gets the wrong answers if the same PARAM_EXEC slot is being used within the subplan, as reported in bug #15577 from Bartosz Polnik. This is pretty reminiscent of the problem fixed in commit 46c508fbc, and the proper fix seems to be the same: don't try to share PARAM_EXEC slots across different levels of controlling NestLoop nodes. This requires decoupling NestLoopParam handling from PlannerParamItem handling, although the logic remains somewhat similar. To avoid bizarre division of labor between subselect.c and createplan.c, I decided to move all the param-slot-assignment logic for both cases out of those files and put it into a new file paramassign.c. Hopefully it's a bit better documented now, too. A regression test case for this might be nice, but we don't know a test case that triggers the problem with a suitably small amount of data. Back-patch to 9.6 where we added Gather nodes. It's conceivable that related problems exist in older branches; but without some evidence for that, I'll leave the older branches alone. Discussion: https://postgr.es/m/15577-ca61ab18904af852@postgresql.org
* Fix missing values when doing ALTER TABLE ALTER COLUMN TYPEAndrew Dunstan2019-01-10
| | | | | | | | | | | | | | This was an oversight in commit 16828d5c. If the table is going to be rewritten, we simply clear all the missing values from all the table's attributes, since there will no longer be any rows with the attributes missing. Otherwise, we repackage the missing value in an array constructed with the new type specifications. Backpatch to release 11. This fixes bug #15446, reported by Dmitry Molotkov Reviewed by Dean Rasheed
* Improve ANALYZE's handling of concurrent-update scenarios.Tom Lane2019-01-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch changes the rule for whether or not a tuple seen by ANALYZE should be included in its sample. When we last touched this logic, in commit 51e1445f1, we weren't thinking very hard about tuples being UPDATEd by a long-running concurrent transaction. In such a case, we might see the pre-image as either LIVE or DELETE_IN_PROGRESS depending on timing; and we might see the post-image not at all, or as INSERT_IN_PROGRESS. Since the existing code will not sample either DELETE_IN_PROGRESS or INSERT_IN_PROGRESS tuples, this leads to concurrently-updated rows being omitted from the sample entirely. That's not very helpful, and it's especially the wrong thing if the concurrent transaction ends up rolling back. The right thing seems to be to sample DELETE_IN_PROGRESS rows just as if they were live. This makes the "sample it" and "count it" decisions the same, which seems good for consistency. It's clearly the right thing if the concurrent transaction ends up rolling back; in effect, we are sampling as though IN_PROGRESS transactions haven't happened yet. Also, this combination of choices ensures maximum robustness against the different combinations of whether and in which state we might see the pre- and post-images of an update. It's slightly annoying that we end up recording immediately-out-of-date stats in the case where the transaction does commit, but on the other hand the stats are fine for columns that didn't change in the update. And the alternative of sampling INSERT_IN_PROGRESS rows instead seems like a bad idea, because then the sampling would be inconsistent with the way rows are counted for the stats report. Per report from Mark Chambers; thanks to Jeff Janes for diagnosing what was happening. Back-patch to all supported versions. Discussion: https://postgr.es/m/CAFh58O_Myr6G3tcH3gcGrF-=OExB08PJdWZcSBcEcovaiPsrHA@mail.gmail.com
* Don't believe MinMaxExpr is leakproof without checking.Tom Lane2019-01-02
| | | | | | | | | | | | | | | | MinMaxExpr invokes the btree comparison function for its input datatype, so it's only leakproof if that function is. Many such functions are indeed leakproof, but others are not, and we should not just assume that they are. Hence, adjust contain_leaked_vars to verify the leakproofness of the referenced function explicitly. I didn't add a regression test because it would need to depend on some particular comparison function being leaky, and that's a moving target, per discussion. This has been wrong all along, so back-patch to supported branches. Discussion: https://postgr.es/m/31042.1546194242@sss.pgh.pa.us
* Remove entry tree root conflict checking from GIN predicate lockingAlexander Korotkov2018-12-27
| | | | | | | | | | | | | | According to README we acquire predicate locks on entry tree leafs and posting tree roots. However, when ginFindLeafPage() is going to lock leaf in exclusive mode, then it checks root for conflicts regardless whether it's a entry or posting tree. Assuming that we never place predicate lock on entry tree root (excluding corner case when root is leaf), this check is redundant. This commit removes this check. Now, root conflict checking is controlled by separate argument of ginFindLeafPage(). Discussion: https://postgr.es/m/CAPpHfdv7rrDyy%3DMgsaK-L9kk0AH7az0B-mdC3w3p0FSb9uoyEg%40mail.gmail.com Author: Alexander Korotkov Backpatch-through: 11
* Ignore inherited temp relations from other sessions when truncatingMichael Paquier2018-12-27
| | | | | | | | | | | | | | | | | | Inheritance trees can include temporary tables if the parent is permanent, which makes possible the presence of multiple temporary children from different sessions. Trying to issue a TRUNCATE on the parent in this scenario causes a failure, so similarly to any other queries just ignore such cases, which makes TRUNCATE work transparently. This makes truncation behave similarly to any other DML query working on the parent table with queries which need to be issues on children. A set of isolation tests is added to cover basic cases. Reported-by: Zhou Digoal Author: Amit Langote, Michael Paquier Discussion: https://postgr.es/m/15565-ce67a48d0244436a@postgresql.org Backpatch-through: 9.4
* Prioritize history files when archivingMichael Paquier2018-12-24
| | | | | | | | | | | | | | | | | | | | | | | | At the end of recovery for the post-promotion process, a new history file is created followed by the last partial segment of the previous timeline. Based on the timing, the archiver would first try to archive the last partial segment and then the history file. This can delay the detection of a new timeline taken, particularly depending on the time it takes to transfer the last partial segment as it delays the moment the history file of the new timeline gets archived. This can cause promoted standbys to use the same timeline as one already taken depending on the circumstances if multiple instances look at archives at the same location. This commit changes the order of archiving so as history files are archived in priority over other file types, which reduces the likelihood of the same timeline being taken (still not reducing the window to zero), and it makes the archiver behave more consistently with the startup process doing its post-promotion business. Author: David Steele Reviewed-by: Michael Paquier, Kyotaro Horiguchi Discussion: https://postgr.es/m/929068cf-69e1-bba2-9dc0-e05986aed471@pgmasters.net Backpatch-through: 9.5
* Disable WAL-skipping optimization for COPY on views and foreign tablesMichael Paquier2018-12-23
| | | | | | | | | | | | | | | | | | | | COPY can skip writing WAL when loading data on a table which has been created in the same transaction as the one loading the data, however this cannot work on views or foreign table as this would result in trying to flush relation files which do not exist. So disable the optimization so as commands are able to work the same way with any configuration of wal_level. Tests are added to cover the different cases, which need to have wal_level set to minimal to allow the problem to show up, and that is not the default configuration. Reported-by: Luis M. Carril, Etsuro Fujita Author: Amit Langote, Michael Paquier Reviewed-by: Etsuro Fujita Discussion: https://postgr.es/m/15552-c64aa14c5c22f63c@postgresql.org Backpatch-through: 10, where support for COPY on views has been added, while v11 has added support for COPY on foreign tables.
* Check for conflicting queries during replay of gistvacuumpage()Alexander Korotkov2018-12-21
| | | | | | | | | | | | | | | | | | | | | | | 013ebc0a7b implements so-called GiST microvacuum. That is gistgettuple() marks index tuples as dead when kill_prior_tuple is set. Later, when new tuple insertion claims page space, those dead index tuples are physically deleted from page. When this deletion is replayed on standby, it might conflict with read-only queries. But 013ebc0a7b doesn't handle this. That may lead to disappearance of some tuples from read-only snapshots on standby. This commit implements resolving of conflicts between replay of GiST microvacuum and standby queries. On the master we implement new WAL record type XLOG_GIST_DELETE, which comprises necessary information. On stable releases we've to be tricky to keep WAL compatibility. Information required for conflict processing is just appended to data of XLOG_GIST_PAGE_UPDATE record. So, PostgreSQL version, which doesn't know about conflict processing, will just ignore that. Reported-by: Andres Freund Diagnosed-by: Andres Freund Discussion: https://postgr.es/m/20181212224524.scafnlyjindmrbe6%40alap3.anarazel.de Author: Alexander Korotkov Backpatch-through: 9.6
* Fix lock level used for partition when detaching itAlvaro Herrera2018-12-20
| | | | | | | | | | | | | | For probably bogus reasons, we acquire only AccessShareLock on the partition when we try to detach it from its parent partitioned table. This can cause ugly things to happen if another transaction is doing any sort of DDL to the partition concurrently. Upgrade that lock to ShareUpdateExclusiveLock, which per discussion seems to be the minimum needed. Reported by Robert Haas. Discussion: https://postgr.es/m/CA+TgmoYruJQ+2qnFLtF1xQtr71pdwgfxy3Ziy-TxV28M6pEmyA@mail.gmail.com
* DETACH PARTITION: hold locks on indexes until end of transactionAlvaro Herrera2018-12-20
| | | | | | | | | | | | | | | | | | | | When a partition is detached from its parent, we acquire locks on all attached indexes to also detach them ... but we release those locks immediately. This is a violation of the policy of keeping locks on user objects to the end of the transaction. Bug introduced in 8b08f7d4820f. It's unclear that there are any ill effects possible, but it's clearly wrong nonetheless. It's likely that bad behavior *is* possible, but mostly because the relation that the index is for is only locked with AccessShareLock, which is an older bug that shall be fixed separately. While touching that line of code, close the index opened with index_open() using index_close() instead of relation_close(). No difference in practice, but let's be consistent. Unearthed by Robert Haas. Discussion: https://postgr.es/m/CA+TgmoYruJQ+2qnFLtF1xQtr71pdwgfxy3Ziy-TxV28M6pEmyA@mail.gmail.com
* Fix ADD IF NOT EXISTS used in conjunction with ALTER TABLE ONLYGreg Stark2018-12-19
| | | | | | The flag for IF NOT EXISTS was only being passed down in the normal recursing case. It's been this way since originally added in 9.6 in commit 2cd40adb85 so backpatch back to 9.6.
* Correct obsolete nbtree recovery comments.Peter Geoghegan2018-12-18
| | | | | | | | | | | | | | | Commit 40dae7ec537, which made the handling of interrupted nbtree page splits more robust, removed an nbtree-specific end-of-recovery cleanup step. This meant that it was no longer possible to complete an interrupted page split during recovery. However, a reference to recovery as a reason for using a NULL stack while inserting into a parent page was missed. Remove the reference. Remove a similar obsolete reference to recovery that was introduced much more recently, as part of the btree fastpath optimization enhancement that made it into Postgres 11 (commit 2b272734, and follow-up commits). Backpatch: 11-, where the fastpath optimization was introduced.
* Fix ancient thinko in mergejoin cost estimation.Tom Lane2018-12-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | "rescanratio" was computed as 1 + rescanned-tuples / total-inner-tuples, which is sensible if it's to be multiplied by total-inner-tuples or a cost value corresponding to scanning all the inner tuples. But in reality it was (mostly) multiplied by inner_rows or a related cost, numbers that take into account the possibility of stopping short of scanning the whole inner relation thanks to a limited key range in the outer relation. This'd still make sense if we could expect that stopping short would result in a proportional decrease in the number of tuples that have to be rescanned. It does not, however. The argument that establishes the validity of our estimate for that number is independent of whether we scan all of the inner relation or stop short, and experimentation also shows that stopping short doesn't reduce the number of rescanned tuples. So the correct calculation is 1 + rescanned-tuples / inner_rows, and we should be sure to multiply that by inner_rows or a corresponding cost value. Most of the time this doesn't make much difference, but if we have both a high rescan rate (due to lots of duplicate values) and an outer key range much smaller than the inner key range, then the error can be significant, leading to a large underestimate of the cost associated with rescanning. Per report from Vijaykumar Jain. This thinko appears to go all the way back to the introduction of the rescan estimation logic in commit 70fba7043, so back-patch to all supported branches. Discussion: https://postgr.es/m/CAE7uO5hMb_TZYJcZmLAgO6iD68AkEK6qCe7i=vZUkCpoKns+EQ@mail.gmail.com
* Remove extra semicolons.Amit Kapila2018-12-17
| | | | | | | | Reported-by: David Rowley Author: David Rowley Reviewed-by: Amit Kapila Backpatch-through: 10 Discussion: https://postgr.es/m/CAKJS1f8EneeYyzzvdjahVZ6gbAHFkHbSFB5m_C0Y6TUJs9Dgdg@mail.gmail.com
* Fix use-after-free bug when renaming constraintsMichael Paquier2018-12-17
| | | | | | | | | This is an oversight from recent commit b13fd344. While on it, tweak the previous test with a better name for the renamed primary key. Detected by buildfarm member prion which forces relation cache release with -DRELCACHE_FORCE_RELEASE. Back-patch down to 9.4 as the previous commit.
* Make constraint rename issue relcache invalidation on target relationMichael Paquier2018-12-17
| | | | | | | | | | | | | | | When a constraint gets renamed, it may have associated with it a target relation (for example domain constraints don't have one). Not invalidating the target relation cache when issuing the renaming can result in issues with subsequent commands that refer to the old constraint name using the relation cache, causing various failures. One pattern spotted was using CREATE TABLE LIKE after a constraint renaming. Reported-by: Stuart <sfbarbee@gmail.com> Author: Amit Langote Reviewed-by: Michael Paquier Discussion: https://postgr.es/m/2047094.V130LYfLq4@station53.ousa.org
* Improve detection of child-process SIGPIPE failures.Tom Lane2018-12-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit ffa4cbd62 added logic to detect SIGPIPE failure of a COPY child process, but it only worked correctly if the SIGPIPE occurred in the immediate child process. Depending on the shell in use and the complexity of the shell command string, we might instead get back an exit code of 128 + SIGPIPE, representing a shell error exit reporting SIGPIPE in the child process. We could just hack up ClosePipeToProgram() to add the extra case, but it seems like this is a fairly general issue deserving a more general and better-documented solution. I chose to add a couple of functions in src/common/wait_error.c, which is a natural place to know about wait-result encodings, that will test for either a specific child-process signal type or any child-process signal failure. Then, adjust other places that were doing ad-hoc tests of this type to use the common functions. In RestoreArchivedFile, this fixes a race condition affecting whether the process will report an error or just silently proc_exit(1): before, that depended on whether the intermediate shell got SIGTERM'd itself or reported a child process failing on SIGTERM. Like the previous patch, back-patch to v10; we could go further but there seems no real need to. Per report from Erik Rijkers. Discussion: https://postgr.es/m/f3683f87ab1701bea5d86a7742b22432@xs4all.nl
* Fix bogus logic for skipping unnecessary partcollation dependencies.Tom Lane2018-12-13
| | | | | | | | | The idea here is to not call recordDependencyOn for the default collation, since we know that's pinned. But what the code actually did was to record the partition key's dependency on the opclass twice, instead. Evidently introduced by sloppy coding in commit 2186b608b. Back-patch to v10 where that came in.
* Prevent GIN deleted pages from being reclaimed too earlyAlexander Korotkov2018-12-13
| | | | | | | | | | | | | | | | | | When GIN vacuum deletes a posting tree page, it assumes that no concurrent searchers can access it, thanks to ginStepRight() locking two pages at once. However, since 9.4 searches can skip parts of posting trees descending from the root. That leads to the risk that page is deleted and reclaimed before concurrent search can access it. This commit prevents the risk of above by waiting for every transaction, which might wait to reference this page, to finish. Due to binary compatibility we can't change GinPageOpaqueData to store corresponding transaction id. Instead we reuse page header pd_prune_xid field, which is unused in index pages. Discussion: https://postgr.es/m/31a702a.14dd.166c1366ac1.Coremail.chjischj%40163.com Author: Andrey Borodin, Alexander Korotkov Reviewed-by: Alexander Korotkov Backpatch-through: 9.4
* Prevent deadlock in ginRedoDeletePage()Alexander Korotkov2018-12-13
| | | | | | | | | | | | | | | | | | | | On standby ginRedoDeletePage() can work concurrently with read-only queries. Those queries can traverse posting tree in two ways. 1) Using rightlinks by ginStepRight(), which locks the next page before unlocking its left sibling. 2) Using downlinks by ginFindLeafPage(), which locks at most one page at time. Original lock order was: page, parent, left sibling. That lock order can deadlock with ginStepRight(). In order to prevent deadlock this commit changes lock order to: left sibling, page, parent. Note, that position of parent in locking order seems insignificant, because we only lock one page at time while traversing downlinks. Reported-by: Chen Huajun Diagnosed-by: Chen Huajun, Peter Geoghegan, Andrey Borodin Discussion: https://postgr.es/m/31a702a.14dd.166c1366ac1.Coremail.chjischj%40163.com Author: Alexander Korotkov Backpatch-through: 9.4
* Fix deadlock in GIN vacuum introduced by 218f51584d5Alexander Korotkov2018-12-13
| | | | | | | | | | | | | | | | | | | | | Before 218f51584d5 if posting tree page is about to be deleted, then the whole posting tree is locked by LockBufferForCleanup() on root preventing all the concurrent inserts. 218f51584d5 reduced locking to the subtree containing page to be deleted. However, due to concurrent parent split, inserter doesn't always holds pins on all the pages constituting path from root to the target leaf page. That could cause a deadlock between GIN vacuum process and GIN inserter. And we didn't find non-invasive way to fix this. This commit reverts VACUUM behavior to lock the whole posting tree before delete any page. However, we keep another useful change by 218f51584d5: the tree is locked only if there are pages to be deleted. Reported-by: Chen Huajun Diagnosed-by: Chen Huajun, Andrey Borodin, Peter Geoghegan Discussion: https://postgr.es/m/31a702a.14dd.166c1366ac1.Coremail.chjischj%40163.com Author: Alexander Korotkov, based on ideas from Andrey Borodin and Peter Geoghegan Reviewed-by: Andrey Borodin Backpatch-through: 10
* Repair bogus EPQ plans generated for postgres_fdw foreign joins.Tom Lane2018-12-12
| | | | | | | | | | | | | | | | | | | | | | | | | postgres_fdw's postgresGetForeignPlan() assumes without checking that the outer_plan it's given for a join relation must have a NestLoop, MergeJoin, or HashJoin node at the top. That's been wrong at least since commit 4bbf6edfb (which could cause insertion of a Sort node on top) and it seems like a pretty unsafe thing to Just Assume even without that. Through blind good fortune, this doesn't seem to have any worse consequences today than strange EXPLAIN output, but it's clearly trouble waiting to happen. To fix, test the node type explicitly before touching Join-specific fields, and avoid jamming the new tlist into a node type that can't do projection. Export a new support function from createplan.c to avoid building low-level knowledge about the latter into FDWs. Back-patch to 9.6 where the faulty coding was added. Note that the associated regression test cases don't show any changes before v11, apparently because the tests back-patched with 4bbf6edfb don't actually exercise the problem case before then (there's no top-level Sort in those plans). Discussion: https://postgr.es/m/8946.1544644803@sss.pgh.pa.us
* Repair bogus handling of multi-assignment Params in upper plan levels.Tom Lane2018-12-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Our support for multiple-set-clauses in UPDATE assumes that the Params referencing a MULTIEXPR_SUBLINK SubPlan will appear before that SubPlan in the targetlist of the plan node that calculates the updated row. (Yeah, it's a hack...) In some PG branches it's possible that a Result node gets inserted between the primary calculation of the update tlist and the ModifyTable node. setrefs.c did the wrong thing in this case and left the upper-level Params as Params, causing a crash at runtime. What it should do is replace them with "outer" Vars referencing the child plan node's output. That's a result of careless ordering of operations in fix_upper_expr_mutator, so we can fix it just by reordering the code. Fix fix_join_expr_mutator similarly for consistency, even though join nodes could never appear in such a context. (In general, it seems likely to be a bit cheaper to use Vars than Params in such situations anyway, so this patch might offer a tiny performance improvement.) The hazard extends back to 9.5 where the MULTIEXPR_SUBLINK stuff was introduced, so back-patch that far. However, this may be a live bug only in 9.6.x and 10.x, as the other branches don't seem to want to calculate the final tlist below the Result node. (That plan shape change between branches might be a mini-bug in itself, but I'm not really interested in digging into the reasons for that right now. Still, add a regression test memorializing what we expect there, so we'll notice if it changes again.) Per bug report from Eduards Bezverhijs. Discussion: https://postgr.es/m/b6cd572a-3e44-8785-75e9-c512a5a17a73@tieto.com
* Add stack depth checks to key recursive functions in backend/nodes/*.c.Tom Lane2018-12-10
| | | | | | | | | | Although copyfuncs.c has a check_stack_depth call in its recursion, equalfuncs.c, outfuncs.c, and readfuncs.c lacked one. This seems unwise. Likewise fix planstate_tree_walker(), in branches where that exists. Discussion: https://postgr.es/m/30253.1544286631@sss.pgh.pa.us
* Make TupleDescInitBuiltinEntry throw error for unsupported types.Tom Lane2018-12-10
| | | | | | | | | Previously, it would just pass back a partially-uninitialized tupdesc, which doesn't seem like a safe or useful behavior. Backpatch to v10 where this code came in. Discussion: https://postgr.es/m/30830.1544384975@sss.pgh.pa.us
* Fix misapplication of pgstat_count_truncate to wrong relation.Tom Lane2018-12-07
| | | | | | | | | | | | | | | | | | | | | | | | The stanza of ExecuteTruncate[Guts] that truncates a target table's toast relation re-used the loop local variable "rel" to reference the toast rel. This was safe enough when written, but commit d42358efb added code below that that supposed "rel" still pointed to the parent table. Therefore, the stats counter update was applied to the wrong relcache entry (the toast rel not the user rel); and if we were unlucky and that relcache entry had been flushed during reindex_relation, very bad things could ensue. (I'm surprised that CLOBBER_CACHE_ALWAYS testing hasn't found this. I'm even more surprised that the problem wasn't detected during the development of d42358efb; it must not have been tested in any case with a toast table, as the incorrect stats counts are very obvious.) To fix, replace use of "rel" in that code branch with a more local variable. Adjust test cases added by d42358efb so that some of them use tables with toast tables. Per bug #15540 from Pan Bian. Back-patch to 9.5 where d42358efb came in. Discussion: https://postgr.es/m/15540-01078812338195c0@postgresql.org
* Clean up sloppy coding in publicationcmds.c's OpenTableList().Tom Lane2018-12-07
| | | | | | | | | | | | | | Remove dead code (which would be incorrect if it weren't dead), per report from Pan Bian. Add a CHECK_FOR_INTERRUPTS in the inner loop over child relations, because there's little point in having one in the outer loop if there's not one here too. Minor stylistic adjustments and comment improvements. Seems to be aboriginal to this code (cf commit 665d1fad9). Back-patch to v10 where that came in, not because any of this is significant, but just to keep the branches looking similar. Discussion: https://postgr.es/m/15539-06d00ef6b1e2e1bb@postgresql.org
* Don't mark partitioned indexes invalid unnecessarilyAlvaro Herrera2018-12-05
| | | | | | | | | | | | | | | | | | | | | | | | | | When an indexes is created on a partitioned table using ONLY (don't recurse to partitions), it gets marked invalid until index partitions are attached for each table partition. But there's no reason to do this if there are no partitions ... and moreover, there's no way to get the index to become valid afterwards, because all partitions that get created/attached get their own index partition already attached to the parent index, so there's no chance to do ALTER INDEX ... ATTACH PARTITION that would make the parent index valid. Fix by not marking the index as invalid to begin with. This is very similar to 9139aa19423b, but the pg_dump aspect does not appear to be relevant until we add FKs that can point to PKs on partitioned tables. (I tried to cause the pg_upgrade test to break by leaving some of these bogus tables around, but wasn't able to.) Making this change means that an index that was supposed to be invalid in the insert_conflict regression test is no longer invalid; reorder the DDL so that the test continues to verify the behavior we want it to. Author: Álvaro Herrera Reviewed-by: Amit Langote Discussion: https://postgr.es/m/20181203225019.2vvdef2ybnkxt364@alvherre.pgsql
* Fix various checksum check problems for pg_verify_checksums and base backupsMichael Paquier2018-11-30
| | | | | | | | | | | | | | | | | | | | | Three issues are fixed in this patch: - Base backups forgot to ignore files specific to EXEC_BACKEND, leading to spurious warnings when checksums are enabled, per analysis from me. - pg_verify_checksums forgot about files specific to EXEC_BACKEND, leading to failures of the tool on any such build, particularly Windows. This error was originally found by newly-introduced TAP tests in various buildfarm members using EXEC_BACKEND. - pg_verify_checksums forgot to count for temporary files and temporary paths, which could be valid relation files, without checksums, per report from Andres Freund. More tests are added to cover this case. A new test case which emulates corruption for a file in a different tablespace is added, coming from from Michael Banck, while I have coded the main code and refactored the test code. Author: Michael Banck, Michael Paquier Reviewed-by: Stephen Frost, David Steele Discussion: https://postgr.es/m/20181021134206.GA14282@paquier.xyz
* Fix minor typo in dsa.c.Thomas Munro2018-11-29
| | | | | Author: Takeshi Ideriha Discussion: https://postgr.es/m/4E72940DA2BF16479384A86D54D0988A6F3BF22D%40G01JPEXMBKW04
* Fix handling of synchronous replication for stopping WAL sendersMichael Paquier2018-11-29
| | | | | | | | | | | | | | | | | | | This fixes an oversight from c6c3334 which forgot that if a subset of WAL senders are stopping and in a sync state, other WAL senders could still be waiting for a WAL position to be synced while committing a transaction. However the subset of stopping senders would not release waiters, potentially breaking synchronous replication guarantees. This commit makes sure that even WAL senders stopping are able to release waiters and are tracked properly. On 9.4, this can also trigger an assertion failure when setting for example max_wal_senders to 1 where a WAL sender is not able to find itself as in synchronous state when the instance stops. Reported-by: Paul Guo Author: Paul Guo, Michael Paquier Discussion: https://postgr.es/m/CAEET0ZEv8VFqT3C-cQm6byOB4r4VYWcef1J21dOX-gcVhCSpmA@mail.gmail.com Backpatch-through: 9.4
* Have BufFileSize() ereport() on FileSize() failure.Peter Geoghegan2018-11-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the responsibility for checking for and reporting a failure from the only current BufFileSize() caller, logtape.c, to BufFileSize() itself. Code within buffile.c is generally responsible for interfacing with fd.c to report irrecoverable failures. This seems like a convention that's worth sticking to. Reorganizing things this way makes it easy to make the error message raised in the event of BufFileSize() failure descriptive of the underlying problem. We're now clear on the distinction between temporary file name and BufFile name, and can show errno, confident that its value actually relates to the error being reported. In passing, an existing, similar buffile.c ereport() + errcode_for_file_access() site is changed to follow the same conventions. The API of the function BufFileSize() is changed by this commit, despite already being in a stable release (Postgres 11). This seems acceptable, since the BufFileSize() ABI was changed by commit aa551830421, which hasn't made it into a point release yet. Besides, it's difficult to imagine a third party BufFileSize() caller not just raising an error anyway, since BufFile state should be considered corrupt when BufFileSize() fails. Per complaint from Tom Lane. Discussion: https://postgr.es/m/26974.1540826748@sss.pgh.pa.us Backpatch: 11-, where shared BufFiles were introduced.
* Don't set PAM_RHOST for Unix sockets.Thomas Munro2018-11-28
| | | | | | | | | | | | | | | Since commit 2f1d2b7a we have set PAM_RHOST to "[local]" for Unix sockets. This caused Linux PAM's libaudit integration to make DNS requests for that name. It's not exactly clear what value PAM_RHOST should have in that case, but it seems clear that we shouldn't set it to an unresolvable name, so don't do that. Back-patch to 9.6. Bug #15520. Author: Thomas Munro Reviewed-by: Peter Eisentraut Reported-by: Albert Schabhuetl Discussion: https://postgr.es/m/15520-4c266f986998e1c5%40postgresql.org
* Do not decode TOAST data for table rewritesTomas Vondra2018-11-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During table rewrites (VACUUM FULL and CLUSTER), the main heap is logged using XLOG / FPI records, and thus (correctly) ignored in decoding. But the associated TOAST table is WAL-logged as plain INSERT records, and so was logically decoded and passed to reorder buffer. That has severe consequences with TOAST tables of non-trivial size. Firstly, reorder buffer has to keep all those changes, possibly spilling them to a file, incurring I/O costs and disk space. Secondly, ReoderBufferCommit() was stashing all those TOAST chunks into a hash table, which got discarded only after processing the row from the main heap. But as the main heap is not decoded for rewrites, this never happened, so all the TOAST data accumulated in memory, resulting either in excessive memory consumption or OOM. The fix is simple, as commit e9edc1ba already introduced infrastructure (namely HEAP_INSERT_NO_LOGICAL flag) to skip logical decoding of TOAST tables, but it only applied it to system tables. So simply use it for all TOAST data in raw_heap_insert(). That would however solve only the memory consumption issue - the TOAST changes would still be decoded and added to the reorder buffer, and spilled to disk (although without TOAST tuple data, so much smaller). But we can solve that by tweaking DecodeInsert() to just ignore such INSERT records altogether, using XLH_INSERT_CONTAINS_NEW_TUPLE flag, instead of skipping them later in ReorderBufferCommit(). Review: Masahiko Sawada Discussion: https://www.postgresql.org/message-id/flat/1a17c643-e9af-3dba-486b-fbe31bc1823a%402ndquadrant.com Backpatch: 9.4-, where logical decoding was introduced
* Fix jit compilation bug on wide tables.Andres Freund2018-11-27
| | | | | | | | | | | | | | | | | | | | The function generated to perform JIT compiled tuple deforming failed when HeapTupleHeader's t_hoff was bigger than a signed int8. I'd failed to realize that LLVM's getelementptr would treat an int8 index argument as signed, rather than unsigned. That means that a hoff larger than 127 would result in a negative offset being applied. Fix that by widening the index to 32bit. Add a testcase with a wide table. Don't drop it, as it seems useful to verify other tools deal properly with wide tables. Thanks to Justin Pryzby for both reporting a bug and then reducing it to a reproducible testcase! Reported-By: Justin Pryzby Author: Andres Freund Discussion: https://postgr.es/m/20181115223959.GB10913@telsasoft.com Backpatch: 11, just as jit compilation was
* Fix float-to-integer coercions to handle edge cases correctly.Tom Lane2018-11-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | ftoi4 and its sibling coercion functions did their overflow checks in a way that looked superficially plausible, but actually depended on an assumption that the MIN and MAX comparison constants can be represented exactly in the float4 or float8 domain. That fails in ftoi4, ftoi8, and dtoi8, resulting in a possibility that values near the MAX limit will be wrongly converted (to negative values) when they need to be rejected. Also, because we compared before rounding off the fractional part, the other three functions threw errors for values that really ought to get rounded to the min or max integer value. Fix by doing rint() first (requiring an assumption that it handles NaN and Inf correctly; but dtoi8 and ftoi8 were assuming that already), and by comparing to values that should coerce to float exactly, namely INTxx_MIN and -INTxx_MIN. Also remove some random cosmetic discrepancies between these six functions. This back-patches commits cbdb8b4c0 and 452b637d4. In the 9.4 branch, also back-patch the portion of 62e2a8dc2 that added PG_INTnn_MIN and related constants to c.h, so that these functions can rely on them. Per bug #15519 from Victor Petrovykh. Patch by me; thanks to Andrew Gierth for analysis and discussion. Discussion: https://postgr.es/m/15519-4fc785b483201ff1@postgresql.org
* Don't allow partitioned indexes in pg_global tablespaceAlvaro Herrera2018-11-23
| | | | | | | Missing in dfa608141982. Author: David Rowley Discussion: https://postgr.es/m/CAKJS1f-M3NMTCpv=vDfkoqHbMPFf=3-Z1ud=+1DHH00tC+zLaQ@mail.gmail.com