path: root/src/backend
Commit message (Author, Date)
* Accept XML documents when xmloption = content, as required by SQL:2006+. (Tom Lane, 2019-03-23)

  Previously we were using the SQL:2003 definition, which doesn't allow this, but that creates a serious dump/restore gotcha: there is no setting of xmloption that will allow all valid XML data. Hence, switch to the 2006 definition.

  Since libxml doesn't accept <!DOCTYPE> directives in the mode we use for CONTENT parsing, the implementation is to detect <!DOCTYPE> in the input and switch to DOCUMENT parsing mode. This should not cost much, because <!DOCTYPE> should be close to the front of the input if it's there at all. It's possible that this causes the error messages for malformed input to be slightly different than they were before, if said input includes <!DOCTYPE>; but that does not seem like a big problem.

  In passing, buy back a few cycles in parsing of large XML documents by not doing strlen() of the whole input in parse_xml_decl().

  Back-patch because dump/restore failures are not nice. This change shouldn't break any cases that worked before, so it seems safe to back-patch.

  Chapman Flack (revised a bit by me)

  Discussion: https://postgr.es/m/CAN-V+g-6JqUQEQZ55Q3toXEN6d5Ez5uvzL4VR+8KtvJKj31taw@mail.gmail.com
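  A minimal sketch of the user-visible effect (hypothetical input, assuming a libxml-enabled build):

      SET xmloption = content;
      -- Formerly rejected in CONTENT mode; the <!DOCTYPE> near the front of
      -- the input now makes the parser switch to DOCUMENT mode.
      SELECT '<!DOCTYPE greeting [<!ELEMENT greeting (#PCDATA)>]><greeting>hi</greeting>'::xml;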
* Make checkpoint requests more robust. (Tom Lane, 2019-03-19)

  Commit 6f6a6d8b1 introduced a delay of up to 2 seconds if we're trying to request a checkpoint but the checkpointer hasn't started yet (or, much less likely, our kill() call fails). However buildfarm experience shows that that's not quite enough for slow or heavily-loaded machines. There's no good reason to assume that the checkpointer won't start eventually, so we may as well make the timeout much longer, say 60 sec.

  However, if the caller didn't say CHECKPOINT_WAIT, it seems like a bad idea to be waiting at all, much less for as long as 60 sec. We can remove the need for that, and make this whole thing more robust, by adjusting the code so that the existence of a pending checkpoint request is clear from the contents of shared memory, and making sure that the checkpointer process will notice it at startup even if it did not get a signal. In this way there's no need for a non-CHECKPOINT_WAIT call to wait at all; if it can't send the signal, it can nonetheless assume that the checkpointer will eventually service the request.

  A potential downside of this change is that "kill -INT" on the checkpointer process is no longer enough to trigger a checkpoint, should anyone be relying on something so hacky. But there's no obvious reason to do it like that rather than issuing a plain old CHECKPOINT command, so we'll assume that nobody is. There doesn't seem to be a way to preserve this undocumented quasi-feature without introducing race conditions.

  Since a principal reason for messing with this is to prevent intermittent buildfarm failures, back-patch to all supported branches.

  Discussion: https://postgr.es/m/27830.1552752475@sss.pgh.pa.us
* Ensure dummy paths have correct required_outer if rel is parameterized. (Tom Lane, 2019-03-14)

  The assertions added by commits 34ea1ab7f et al found another problem: set_dummy_rel_pathlist and mark_dummy_rel were failing to label the dummy paths they create with the correct outer_relids, in case the relation is necessarily parameterized due to having lateral references in its tlist. It's likely that this has no user-visible consequences in production builds, at the moment; but still an assertion failure is a bad thing, so back-patch the fix.

  Per bug #15694 from Roman Zharkov (via Alexander Lakhin) and an independent report by Tushar Ahuja.

  Discussion: https://postgr.es/m/15694-74f2ca97e7044f7f@postgresql.org
  Discussion: https://postgr.es/m/7d72ab20-c725-3ce2-f99d-4e64dd8a0de6@enterprisedb.com
* Disallow NaN as a value for floating-point GUCs. (Tom Lane, 2019-03-10)

  None of the code that uses GUC values is really prepared for them to hold NaN, but parse_real() didn't have any defense against accepting such a value. Treat it the same as a syntax error.

  I haven't attempted to analyze the exact consequences of setting any of the float GUCs to NaN, but since they're quite unlikely to be good, this seems like a back-patchable bug fix.

  Note: we don't need an explicit test for +-Infinity because those will be rejected by existing range checks. I added a regression test for that in HEAD, but not older branches because the spelling of the value in the error message will be platform-dependent in branches where we don't always use port/snprintf.c.

  Discussion: https://postgr.es/m/1798.1552165479@sss.pgh.pa.us
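  For illustration, a hedged example of the new behavior (the exact error text may vary):

      SET cpu_tuple_cost = 'NaN';
      -- ERROR:  invalid value for parameter "cpu_tuple_cost": "NaN"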
* Further fixing for multi-row VALUES lists for updatable views. (Dean Rasheed, 2019-03-03)

  Previously, rewriteTargetListIU() generated a list of attribute numbers from the targetlist, which were passed to rewriteValuesRTE(), which expected them to contain the same number of entries as there are columns in the VALUES RTE, and to be in the same order. That was fine when the target relation was a table, but for an updatable view it could be broken in at least three different ways --- rewriteTargetListIU() could insert additional targetlist entries for view columns with defaults, the view columns could be in a different order from the columns of the underlying base relation, and targetlist entries could be merged together when assigning to elements of an array or composite type. As a result, when recursing to the base relation, the list of attribute numbers generated from the rewritten targetlist could no longer be relied upon to match the columns of the VALUES RTE. We got away with that prior to 41531e42d3 because it used to always be the case that rewriteValuesRTE() did nothing for the underlying base relation, since all DEFAULTS had already been replaced when it was initially invoked for the view, but that was incorrect because it failed to apply defaults from the base relation.

  Fix this by examining the targetlist entries more carefully and picking out just those that are simple Vars referencing the VALUES RTE. That's sufficient for the purposes of rewriteValuesRTE(), which is only responsible for dealing with DEFAULT items in the VALUES RTE. Any DEFAULT item in the VALUES RTE that doesn't have a matching simple-Var-assignment in the targetlist is an error which we complain about, but in theory that ought to be impossible.

  Additionally, move this code into rewriteValuesRTE() to give a clearer separation of concerns between the 2 functions. There is no need for rewriteTargetListIU() to know about the details of the VALUES RTE.

  While at it, fix the comment for rewriteValuesRTE() which claimed that it doesn't support array element and field assignments --- that hasn't been true since a3c7a993d5 (9.6 and later).

  Back-patch to all supported versions, with minor differences for the pre-9.6 branches, which don't support array element and field assignments to the same column in multi-row VALUES lists.

  Reviewed by Amit Langote.

  Discussion: https://postgr.es/m/15623-5d67a46788ec8b7f@postgresql.org
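  A sketch of one of the problem shapes described above, using hypothetical names (array-element assignments merged in the targetlist):

      CREATE TABLE base_tbl (id int, arr int[] DEFAULT '{0,0}');
      CREATE VIEW rw_view AS SELECT id, arr FROM base_tbl;
      -- The two targetlist entries for "arr" are merged into one, so attribute
      -- numbers derived from the rewritten targetlist no longer line up with
      -- the columns of the VALUES RTE.
      INSERT INTO rw_view (id, arr[1], arr[2]) VALUES (1, 10, 20), (2, 30, 40);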
* Improve documentation of data_sync_retry. (Michael Paquier, 2019-02-28)

  An updated value for this parameter takes effect only at server restart, which was mentioned neither in the documentation nor in postgresql.conf.sample.

  Reported-by: Thomas Poty
  Discussion: https://postgr.es/m/15659-0cd812f13027a2d8@postgresql.org
* Fix ecpg bugs caused by missing semicolons in the backend grammar. (Tom Lane, 2019-02-24)

  The Bison documentation clearly states that a semicolon is required after every grammar rule, and our scripts that generate ecpg's grammar from the backend's implicitly assumed this is true. But it turns out that only ancient versions of Bison actually enforce that. There have been a couple of rules without trailing semicolons in gram.y for some time, and as a consequence, ecpg's grammar was faulty and produced wrong output for the affected statements.

  To fix, add the missing semis, and add some cross-checks to ecpg's scripts so that they'll bleat if we mess this up again.

  The cases that were broken were:
  * "SET variable = DEFAULT" (but not "SET variable TO DEFAULT"), as well as allied syntaxes such as ALTER SYSTEM SET ... DEFAULT. These produced syntactically invalid output that the server would reject.
  * Multiple type names in DROP TYPE/DOMAIN commands. Only the first type name would be listed in the emitted command.

  Per report from Daisuke Higuchi. Back-patch to all supported versions.

  Discussion: https://postgr.es/m/1803D792815FC24D871C00D17AE95905DB51CE@g01jpexmbkw24
* Tolerate EINVAL when calling fsync() on a directory. (Thomas Munro, 2019-02-25)

  Previously, we tolerated EBADF as a way for the operating system to indicate that it doesn't support fsync() on a directory. Tolerate EINVAL too, for older versions of Linux CIFS.

  Bug #15636. Back-patch all the way.

  Reported-by: John Klann
  Discussion: https://postgr.es/m/15636-d380890dafd78fc6@postgresql.org
* Fix plan created for inherited UPDATE/DELETE with all tables excluded. (Tom Lane, 2019-02-22)

  In the case where inheritance_planner() finds that every table has been excluded by constraints, it thought it could get away with making a plan consisting of just a dummy Result node. While certainly there's no updating or deleting to be done, this had two user-visible problems: the plan did not report the correct set of output columns when a RETURNING clause was present, and if there were any statement-level triggers that should be fired, it didn't fire them.

  Hence, rather than only generating the dummy Result, we need to stick a valid ModifyTable node on top, which requires a tad more effort here.

  It's been broken this way for as long as inheritance_planner() has known about deleting excluded subplans at all (cf commit 635d42e9c), so back-patch to all supported branches.

  Amit Langote and Tom Lane, per a report from Petr Fedorov.

  Discussion: https://postgr.es/m/5da6f0f0-1364-1876-6978-907678f89a3e@phystech.edu
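  A hypothetical reproducer for the symptoms described above:

      CREATE TABLE parent (k int);
      CREATE TABLE child (CHECK (k > 0)) INHERITS (parent);
      -- Every target table can be excluded given the contradictory quals;
      -- the plan is now a ModifyTable over a dummy Result, so RETURNING
      -- reports the right columns and statement-level triggers still fire.
      DELETE FROM parent WHERE k = 0 AND k = 1 RETURNING *;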
* Make object address handling more robust. (Alvaro Herrera, 2019-02-20)

  pg_identify_object_as_address crashes when passed certain tuples from inconsistent system catalogs. Make it more defensive.

  Author: Álvaro Herrera
  Reviewed-by: Michaël Paquier
  Discussion: https://postgr.es/m/20190218202743.GA12392@alvherre.pgsql
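  For reference, a typical well-formed call, which the function must now also survive when fed inconsistent catalog data:

      SELECT * FROM pg_identify_object_as_address('pg_class'::regclass, 'pg_am'::regclass, 0);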
* Fix DEFAULT-handling in multi-row VALUES lists for updatable views. (Dean Rasheed, 2019-02-20)

  INSERT ... VALUES for a single VALUES row is implemented differently from a multi-row VALUES list, which causes inconsistent behaviour in the way that DEFAULT items are handled. In particular, when inserting into an auto-updatable view on top of a table with a column default, a DEFAULT item in a single VALUES row gets correctly replaced with the table column's default, but for a multi-row VALUES list it is replaced with NULL.

  Fix this by allowing rewriteValuesRTE() to leave DEFAULT items in the VALUES list untouched if the target relation is an auto-updatable view and has no column default, deferring DEFAULT-expansion until the query against the base relation is rewritten. For all other types of target relation, including tables and trigger- and rule-updatable views, we must continue to replace DEFAULT items with NULL in the absence of a column default.

  This is somewhat complicated by the fact that if an auto-updatable view has DO ALSO rules attached, the VALUES lists for the product queries need to be handled differently from the original query, since the product queries need to act like rule-updatable views whereas the original query has auto-updatable view semantics.

  Back-patch to all supported versions.

  Reported by Roger Curley (bug #15623). Patch by Amit Langote and me.

  Discussion: https://postgr.es/m/15623-5d67a46788ec8b7f@postgresql.org
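  A short sketch of the inconsistency being fixed, with hypothetical names:

      CREATE TABLE base_tbl (a int, b int DEFAULT 100);
      CREATE VIEW auto_view AS SELECT a, b FROM base_tbl;
      INSERT INTO auto_view VALUES (1, DEFAULT);                 -- b = 100 (already correct)
      INSERT INTO auto_view VALUES (2, DEFAULT), (3, DEFAULT);   -- b was NULL before the fix, 100 after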
* Mark initial slot snapshots correctly with MVCC type when built. (Michael Paquier, 2019-02-20)

  When building an initial slot snapshot, the snapshot is marked as a historic MVCC snapshot: the type marker is set in SnapBuildBuildSnapshot() but was not overridden in SnapBuildExportSnapshot(). Existing callers of SnapBuildBuildSnapshot() do not care about the type of snapshot used, but extensions calling it actually may, as reported.

  Author: Antonin Houska
  Reviewed-by: Álvaro Herrera, Michael Paquier
  Discussion: https://postgr.es/m/23215.1527665193@localhost
  Backpatch-through: 9.4
* Fix CREATE VIEW to allow zero-column views. (Tom Lane, 2019-02-17)

  We should logically have allowed this case when we allowed zero-column tables, but it was overlooked.

  Although this might be thought a feature addition, it's really a bug fix, because it was possible to create a zero-column view via the convert-table-to-view code path, and then you'd have a situation where dump/reload would fail. Hence, back-patch to all supported branches.

  Arrange the added test cases to provide coverage of the related pg_dump code paths (since these views will be dumped and reloaded during the pg_upgrade regression test). I also made them test the case where pg_dump has to postpone the view rule into post-data, which disturbingly had no regression coverage before.

  Report and patch by Ashutosh Sharma (test case by me)

  Discussion: https://postgr.es/m/CAE9k0PkmHdeSaeZt2ujnb_cKucmK3sDDceDzw7+d5UZoNJPYOg@mail.gmail.com
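  The now-accepted case, as a sketch:

      CREATE TABLE zero_cols ();
      -- Formerly rejected, even though an equivalent view was reachable via
      -- the convert-table-to-view path.
      CREATE VIEW zero_cols_v AS SELECT FROM zero_cols;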
* Fix support for CREATE TABLE IF NOT EXISTS AS EXECUTE. (Michael Paquier, 2019-02-15)

  The grammar IF NOT EXISTS for CTAS has been supported since 9.5 and is documented as such; however, the case of using EXECUTE as the query has never been covered, as EXECUTE CTAS statements and normal CTAS statements are parsed separately.

  Author: Andreas Karlsson
  Discussion: https://postgr.es/m/2ddcc188-e37c-a0be-32bf-a56b07c3559e@proxel.se
  Backpatch-through: 9.5
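  The combination that now works, sketched with hypothetical names:

      PREPARE q AS SELECT 1 AS x;
      CREATE TABLE IF NOT EXISTS ctas_tbl AS EXECUTE q;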
* Fix race in dsm_attach() when handles are reused. (Thomas Munro, 2019-02-15)

  DSM handle values can be reused as soon as the underlying shared memory object has been destroyed. That means that for a brief moment we might have two DSM slots with the same handle. While trying to attach, if we encounter a slot with refcnt == 1, meaning that it is currently being destroyed, we should continue our search in case the same handle exists in another slot.

  The race manifested as a rare "dsa_area could not attach to segment" error, and was more likely in 10 and 11 due to the lack of distinct seed for random() in parallel workers. It was made very unlikely in master by commit 197e4af9, and older releases don't usually create new DSM segments in background workers so it was also unlikely there.

  This fixes the root cause of bug report #15585, in which the error could also sometimes result in a self-deadlock in the error path. It's not yet clear if further changes are needed to avoid that failure mode.

  Back-patch to 9.4, where dsm.c arrived.

  Author: Thomas Munro
  Reported-by: Justin Pryzby, Sergei Kornilov
  Discussion: https://postgr.es/m/20190207014719.GJ29720@telsasoft.com
  Discussion: https://postgr.es/m/15585-324ff6a93a18da46@postgresql.org
* Relax overly strict assertion. (Alvaro Herrera, 2019-02-12)

  Ever since its birth, ReorderBufferBuildTupleCidHash() has contained an assertion that a catalog tuple cannot change Cmax after acquiring one. But that's wrong: if a subtransaction executes DDL that affects that catalog tuple, and later aborts and another DDL affects the same tuple, it will change Cmax. Relax the assertion to merely verify that the Cmax remains valid and monotonically increasing, instead.

  Add a test that tickles the relevant code.

  Diagnosed by, and initial patch submitted by: Arseny Sher
  Co-authored-by: Arseny Sher
  Discussion: https://postgr.es/m/874l9p8hyw.fsf@ars-thinkpad
* Fix erroneous error reports in snapbuild.c. (Tom Lane, 2019-02-12)

  It's pretty unhelpful to report the wrong file name in a complaint about syscall failure, but SnapBuildSerialize managed to do that twice in a span of 50 lines. Also fix half a dozen missing or poorly-chosen errcode assignments; that's mostly cosmetic, but still wrong.

  Noted while studying recent failures on buildfarm member nightjar. I'm not sure whether those reports are actually giving the wrong filename, because there are two places here with identically spelled error messages. The other one is specifically coded not to report ENOENT, but if it's this one, how could we be getting ENOENT from open() with O_CREAT? Need to sit back and await results.

  However, these ereports are clearly broken from birth, so back-patch.
* Translation updates. (Peter Eisentraut, 2019-02-11)

  Source-Git-URL: https://git.postgresql.org/git/pgtranslation/messages.git
  Source-Git-Hash: 82b08ac4d1dfb5febd26bb493d0055cc5d71d513
* Defend against null error message reported by libxml2. (Tom Lane, 2019-02-08)

  While this isn't really supposed to happen, it can occur in OOM situations and perhaps others. Instead of crashing, substitute "(no message provided)".

  I didn't worry about localizing this text, since we aren't localizing anything else here; besides, if we're on the edge of OOM, it's unlikely gettext() would work.

  Report and fix by Sergio Conde Gómez in bug #15624.

  Discussion: https://postgr.es/m/15624-4dea54091a2864e6@postgresql.org
* Ensure that foreign scans with lateral refs are planned correctly. (Tom Lane, 2019-02-07)

  As reported in bug #15613 from Srinivasan S A, file_fdw and postgres_fdw neglected to mark plain baserel foreign paths as parameterized when the relation has lateral_relids. Other FDWs have surely copied this mistake, so rather than just patching those two modules, install a band-aid fix in create_foreignscan_path to rectify the mistake centrally.

  Although the band-aid is enough to fix the visible symptom, correct the calls in file_fdw and postgres_fdw anyway, so that they are valid examples for external FDWs.

  Also, since the band-aid isn't enough to make this work for parameterized foreign joins, throw an elog(ERROR) if such a case is passed to create_foreignscan_path. This shouldn't pose much of a problem for existing external FDWs, since it's likely they aren't trying to make such paths anyway (though some of them may need a defense against joins with lateral_relids, similar to the one this patch installs into postgres_fdw).

  Add some assertions in relnode.c to catch future occurrences of the same error --- in particular, as backstop against core-code mistakes like the one fixed by commit bdd9a99aa.

  Discussion: https://postgr.es/m/15613-092be1be9576c728@postgresql.org
* Propagate lateral-reference information to indirect descendant relations. (Tom Lane, 2019-02-06)

  create_lateral_join_info() computes a bunch of information about lateral references between base relations, and then attempts to propagate those markings to appendrel children of the original base relations. But the original coding neglected the possibility of indirect descendants (grandchildren etc). During v11 development we noticed that this was wrong for partitioned-table cases, but failed to realize that it was just as wrong for any appendrel. While the case can't arise for appendrels derived from traditional table inheritance (because we make a flat appendrel for that), nested appendrels can arise from nested UNION ALL subqueries. Failure to mark the lower-level relations as having lateral references leads to confusion in add_paths_to_append_rel about whether unparameterized paths can be built. It's not very clear whether that leads to any user-visible misbehavior; the lack of field reports suggests that it may cause nothing worse than minor cost misestimation. Still, it's a bug, and it leads to failures of Asserts that I intend to add later.

  To fix, we need to propagate information from all appendrel parents, not just those that are RELOPT_BASERELs. We can still do it in one pass, if we rely on the append_rel_list to be ordered with ancestor relationships before descendant ones; add assertions checking that. While fixing this, we can make a small performance improvement by traversing the append_rel_list just once instead of separately for each appendrel parent relation.

  Noted while investigating bug #15613, though this patch does not fix that (which is why I'm not committing the related Asserts yet).

  Discussion: https://postgr.es/m/3951.1549403812@sss.pgh.pa.us
* Keep perl style checker happy. (Andrew Dunstan, 2019-02-05)

  It doesn't like code before "use strict;".
* Fix searchpath for modern Perl for genbki.pl. (Andrew Dunstan, 2019-02-05)

  This was fixed for MSVC tools by commit 1df92eeafefac4, but per buildfarm member bowerbird genbki.pl needs the same treatment.

  Backpatch to all live branches.
* Avoid possible deadlock while locking multiple heap pages. (Amit Kapila, 2019-02-02)

  To avoid deadlock, a backend acquires locks on heap pages in block number order. In certain cases, the locks on heap pages are dropped and reacquired; they are dropped in order to read the corresponding visibility map page(s). The issue is that we re-acquired the locks in buffer ID order, whereas the intention was to acquire them in block number order. This commit ensures that we will always acquire locks on heap pages in block number order.

  Reported-by: Nishant Fnu
  Author: Nishant Fnu
  Reviewed-by: Amit Kapila and Robert Haas
  Backpatch-through: 9.4
  Discussion: https://postgr.es/m/5883C831-2ED1-47C8-BFAC-2D5BAE5A8CAE@amazon.com
* Fix use of dangling pointer in heap_delete() when logging replica identity. (Michael Paquier, 2019-02-01)

  When logging the replica identity of a deleted tuple, XLOG_HEAP_DELETE records include references to the old tuple. Its data is stored in an intermediate variable used to register this information for the WAL record, but this variable goes out of scope before the record actually gets inserted.

  Spotted by clang's AddressSanitizer.

  Author: Stas Kelvich
  Discussion: https://postgr.es/m/085C8825-AD86-4E93-AF80-E26CDF03D1EA@postgrespro.ru
  Backpatch-through: 9.4
* Fix a crash in logical replication. (Peter Eisentraut, 2019-01-30)

  The bug was that determining which columns are part of the replica identity index using RelationGetIndexAttrBitmap() would run eval_const_expressions() on index expressions and predicates across all indexes of the table, which in turn might require a snapshot, but there wasn't one set, so it crashes. There were actually two separate bugs, one on the publisher and one on the subscriber.

  To trigger the bug, a table that is part of a publication or subscription needs to have an index with a predicate or expression that lends itself to constant expressions simplification.

  The fix is to avoid the constant expressions simplification in RelationGetIndexAttrBitmap(), so that it becomes safe to call in these contexts. The constant expressions simplification comes from the calls to RelationGetIndexExpressions()/RelationGetIndexPredicate() via BuildIndexInfo(). But RelationGetIndexAttrBitmap() calling BuildIndexInfo() is overkill. The latter just takes pg_index catalog information and packs it into the IndexInfo structure, which the former then just unpacks again and throws away. We can just do this directly with less overhead and skip the troublesome calls to eval_const_expressions(). This also removes the awkward cross-dependency between relcache.c and index.c.

  Bug: #15114
  Reported-by: Петър Славов <pet.slavov@gmail.com>
  Reviewed-by: Noah Misch <noah@leadboat.com>
  Reviewed-by: Michael Paquier <michael@paquier.xyz>
  Discussion: https://www.postgresql.org/message-id/flat/152110589574.1223.17983600132321618383@wrigleys.postgresql.org/
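  A hypothetical reproducer along the lines described above (an index predicate that invites constant folding):

      CREATE TABLE t (id int PRIMARY KEY, val text);
      CREATE INDEX t_partial_idx ON t (val) WHERE id > 1 + 1;
      CREATE PUBLICATION pub FOR TABLE t;
      -- Building the replica-identity column bitmap used to run
      -- eval_const_expressions() over that predicate with no snapshot set.
      UPDATE t SET val = 'x' WHERE id = 1;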
* Allow UNLISTEN in hot-standby mode. (Tom Lane, 2019-01-25)

  Since LISTEN is (still) disallowed, UNLISTEN must be a no-op in a hot-standby session, and so there's no harm in allowing it. This change allows client code to not worry about whether it's connected to a primary or standby server when performing session-state-reset type activities. (Note that DISCARD ALL, which includes UNLISTEN, was already allowed, making it inconsistent to reject UNLISTEN.)

  Per discussion, back-patch to all supported versions.

  Shay Rojansky, reviewed by Mi Tar

  Discussion: https://postgr.es/m/CADT4RqCf2gA_TJtPAjnGzkC3ZiexfBZiLmA-mV66e4UyuVv8bA@mail.gmail.com
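  On a hot-standby session, for example:

      UNLISTEN *;   -- now a no-op instead of an error
      LISTEN ch;    -- still rejected during recovery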
* Postpone aggregate checks until after collation is assigned. (Andrew Gierth, 2019-01-17)

  Previously, parseCheckAggregates was run before assign_query_collations, but this causes problems if any expression has already had a collation assigned by some transform function (e.g. transformCaseExpr) before parseCheckAggregates runs. The differing collations would cause expressions not to be recognized as equal to the ones in the GROUP BY clause, leading to spurious errors about unaggregated column references.

  The result was that CASE expr WHEN val ... would fail when "expr" contained a GROUPING() expression or matched one of the group by expressions, and where collatable types were involved; whereas the supposedly identical CASE WHEN expr = val ... would succeed.

  Backpatch all the way; this appears to have been wrong ever since collations were introduced.

  Per report from Guillaume Lelarge, analysis and patch by me.

  Discussion: https://postgr.es/m/CAECtzeVSO_US8C2Khgfv54ZMUOBR4sWq+6_bLrETnWExHT=rFg@mail.gmail.com
  Discussion: https://postgr.es/m/87muo0k0c7.fsf@news-spur.riddles.org.uk
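  A sketch of the asymmetry described above, assuming a collatable type such as text:

      -- Formerly drew a spurious "must appear in the GROUP BY clause" error:
      SELECT CASE x WHEN 'a' THEN 1 ELSE 2 END
        FROM (VALUES ('a'::text), ('b')) AS v(x)
       GROUP BY x;
      -- ...while the supposedly identical form succeeded:
      SELECT CASE WHEN x = 'a' THEN 1 ELSE 2 END
        FROM (VALUES ('a'::text), ('b')) AS v(x)
       GROUP BY x;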
* Improve ANALYZE's handling of concurrent-update scenarios. (Tom Lane, 2019-01-03)

  This patch changes the rule for whether or not a tuple seen by ANALYZE should be included in its sample.

  When we last touched this logic, in commit 51e1445f1, we weren't thinking very hard about tuples being UPDATEd by a long-running concurrent transaction. In such a case, we might see the pre-image as either LIVE or DELETE_IN_PROGRESS depending on timing; and we might see the post-image not at all, or as INSERT_IN_PROGRESS. Since the existing code will not sample either DELETE_IN_PROGRESS or INSERT_IN_PROGRESS tuples, this leads to concurrently-updated rows being omitted from the sample entirely. That's not very helpful, and it's especially the wrong thing if the concurrent transaction ends up rolling back.

  The right thing seems to be to sample DELETE_IN_PROGRESS rows just as if they were live. This makes the "sample it" and "count it" decisions the same, which seems good for consistency. It's clearly the right thing if the concurrent transaction ends up rolling back; in effect, we are sampling as though IN_PROGRESS transactions haven't happened yet. Also, this combination of choices ensures maximum robustness against the different combinations of whether and in which state we might see the pre- and post-images of an update.

  It's slightly annoying that we end up recording immediately-out-of-date stats in the case where the transaction does commit, but on the other hand the stats are fine for columns that didn't change in the update. And the alternative of sampling INSERT_IN_PROGRESS rows instead seems like a bad idea, because then the sampling would be inconsistent with the way rows are counted for the stats report.

  Per report from Mark Chambers; thanks to Jeff Janes for diagnosing what was happening. Back-patch to all supported versions.

  Discussion: https://postgr.es/m/CAFh58O_Myr6G3tcH3gcGrF-=OExB08PJdWZcSBcEcovaiPsrHA@mail.gmail.com
* Don't believe MinMaxExpr is leakproof without checking. (Tom Lane, 2019-01-02)

  MinMaxExpr invokes the btree comparison function for its input datatype, so it's only leakproof if that function is. Many such functions are indeed leakproof, but others are not, and we should not just assume that they are. Hence, adjust contain_leaked_vars to verify the leakproofness of the referenced function explicitly.

  I didn't add a regression test because it would need to depend on some particular comparison function being leaky, and that's a moving target, per discussion.

  This has been wrong all along, so back-patch to supported branches.

  Discussion: https://postgr.es/m/31042.1546194242@sss.pgh.pa.us
* Have DISCARD ALL/TEMP remove leftover temp tables. (Alvaro Herrera, 2018-12-27)

  Previously, it would only remove temp tables created in the same session; but if the session uses the BackendId of a previously crashed backend that left temp tables around, those would not get removed. Since autovacuum would not drop them either (because it sees that the BackendId is in use by the current session) these can cause annoying xid-wraparound warnings.

  Apply to branches 9.4 to 10. This is not a problem since version 11, because commit 943576bddcb5 added state tracking that makes autovacuum realize that those temp tables are not ours, so it removes them.

  This is useful to handle in DISCARD, because even though it does not handle all situations, it does handle the common one where a connection pooler keeps the same session open for an indefinitely long time.

  Discussion: https://postgr.es/m/20181226190834.wsk2wzott5yzrjiq@alvherre.pgsql
  Reviewed-by: Takayuki Tsunakawa, Michaël Paquier
* Make autovacuum more selective about temp tables to keep. (Alvaro Herrera, 2018-12-27)

  When temp tables are in danger of XID wraparound, autovacuum drops them; however, it preserves those that are owned by a working session. This is desirable, except when the session is connected to a different database (because the temp tables cannot be from that session), so make it keep the temp tables only if the backend is in the same database as the temp tables.

  This is not bulletproof: it fails to detect temp tables left by a session whose backend ID is reused in the same database but the new session does not use temp tables. Commit 943576bddcb5 fixes that case too, for branches 11 and up (which is why we don't apply this fix to those branches), but back-patching that one is not universally agreed on.

  Discussion: https://postgr.es/m/20181214162843.37g6h3txto43akrb@alvherre.pgsql
  Reviewed-by: Takayuki Tsunakawa, Michaël Paquier
* Ignore inherited temp relations from other sessions when truncating. (Michael Paquier, 2018-12-27)

  Inheritance trees can include temporary tables if the parent is permanent, which makes possible the presence of multiple temporary children from different sessions. Trying to issue a TRUNCATE on the parent in this scenario causes a failure, so similarly to any other queries just ignore such cases, which makes TRUNCATE work transparently.

  This makes truncation behave similarly to any other DML query working on the parent table with queries which need to be issued on children. A set of isolation tests is added to cover basic cases.

  Reported-by: Zhou Digoal
  Author: Amit Langote, Michael Paquier
  Discussion: https://postgr.es/m/15565-ce67a48d0244436a@postgresql.org
  Backpatch-through: 9.4
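  A sketch of the scenario, with two sessions:

      -- Session 1:
      CREATE TABLE parent_tbl (a int);
      -- Session 2:
      CREATE TEMP TABLE temp_child () INHERITS (parent_tbl);
      -- Session 1 again: previously failed; temp children belonging to
      -- other sessions are now simply skipped.
      TRUNCATE parent_tbl;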
* Prioritize history files when archiving. (Michael Paquier, 2018-12-24)

  At the end of recovery for the post-promotion process, a new history file is created followed by the last partial segment of the previous timeline. Based on the timing, the archiver would first try to archive the last partial segment and then the history file. This can delay the detection of a newly taken timeline, particularly depending on the time it takes to transfer the last partial segment, as it delays the moment the history file of the new timeline gets archived. This can cause promoted standbys to use the same timeline as one already taken, depending on the circumstances, if multiple instances look at archives at the same location.

  This commit changes the order of archiving so that history files are archived with priority over other file types, which reduces the likelihood of the same timeline being taken (still not reducing the window to zero), and it makes the archiver behave more consistently with the startup process doing its post-promotion business.

  Author: David Steele
  Reviewed-by: Michael Paquier, Kyotaro Horiguchi
  Discussion: https://postgr.es/m/929068cf-69e1-bba2-9dc0-e05986aed471@pgmasters.net
  Backpatch-through: 9.5
* Fix ancient thinko in mergejoin cost estimation. (Tom Lane, 2018-12-18)

  "rescanratio" was computed as 1 + rescanned-tuples / total-inner-tuples, which is sensible if it's to be multiplied by total-inner-tuples or a cost value corresponding to scanning all the inner tuples. But in reality it was (mostly) multiplied by inner_rows or a related cost, numbers that take into account the possibility of stopping short of scanning the whole inner relation thanks to a limited key range in the outer relation.

  This'd still make sense if we could expect that stopping short would result in a proportional decrease in the number of tuples that have to be rescanned. It does not, however. The argument that establishes the validity of our estimate for that number is independent of whether we scan all of the inner relation or stop short, and experimentation also shows that stopping short doesn't reduce the number of rescanned tuples. So the correct calculation is 1 + rescanned-tuples / inner_rows, and we should be sure to multiply that by inner_rows or a corresponding cost value.

  Most of the time this doesn't make much difference, but if we have both a high rescan rate (due to lots of duplicate values) and an outer key range much smaller than the inner key range, then the error can be significant, leading to a large underestimate of the cost associated with rescanning.

  Per report from Vijaykumar Jain. This thinko appears to go all the way back to the introduction of the rescan estimation logic in commit 70fba7043, so back-patch to all supported branches.

  Discussion: https://postgr.es/m/CAE7uO5hMb_TZYJcZmLAgO6iD68AkEK6qCe7i=vZUkCpoKns+EQ@mail.gmail.com
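  Restating the change in symbols:

      before:  rescanratio = 1 + rescanned_tuples / total_inner_tuples
      after:   rescanratio = 1 + rescanned_tuples / inner_rows

  with the ratio multiplied by inner_rows (or a corresponding cost value) in both cases.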
* Fix use-after-free bug when renaming constraints. (Michael Paquier, 2018-12-17)

  This is an oversight from recent commit b13fd344. While on it, tweak the previous test with a better name for the renamed primary key.

  Detected by buildfarm member prion which forces relation cache release with -DRELCACHE_FORCE_RELEASE. Back-patch down to 9.4 as the previous commit.
* Make constraint rename issue relcache invalidation on target relation. (Michael Paquier, 2018-12-17)

  When a constraint gets renamed, it may have associated with it a target relation (for example domain constraints don't have one). Not invalidating the target relation cache when issuing the renaming can result in issues with subsequent commands that refer to the old constraint name using the relation cache, causing various failures. One pattern spotted was using CREATE TABLE LIKE after a constraint renaming.

  Reported-by: Stuart <sfbarbee@gmail.com>
  Author: Amit Langote
  Reviewed-by: Michael Paquier
  Discussion: https://postgr.es/m/2047094.V130LYfLq4@station53.ousa.org
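  The CREATE TABLE LIKE pattern mentioned above, as a sketch:

      CREATE TABLE src (a int CONSTRAINT src_chk CHECK (a > 0));
      ALTER TABLE src RENAME CONSTRAINT src_chk TO src_a_positive;
      -- Without the relcache invalidation, this could still see the stale
      -- constraint name and misbehave.
      CREATE TABLE src_copy (LIKE src INCLUDING CONSTRAINTS);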
* Prevent GIN deleted pages from being reclaimed too early. (Alexander Korotkov, 2018-12-13)

  When GIN vacuum deletes a posting tree page, it assumes that no concurrent searchers can access it, thanks to ginStepRight() locking two pages at once. However, since 9.4 searches can skip parts of posting trees descending from the root. That leads to the risk that a page is deleted and reclaimed before a concurrent search can access it.

  This commit prevents that risk by waiting for every transaction that might still need to reference the page to finish. Due to binary compatibility we can't change GinPageOpaqueData to store the corresponding transaction id. Instead we reuse the page header's pd_prune_xid field, which is unused in index pages.

  Discussion: https://postgr.es/m/31a702a.14dd.166c1366ac1.Coremail.chjischj%40163.com
  Author: Andrey Borodin, Alexander Korotkov
  Reviewed-by: Alexander Korotkov
  Backpatch-through: 9.4
* Prevent deadlock in ginRedoDeletePage(). (Alexander Korotkov, 2018-12-13)

  On standby, ginRedoDeletePage() can work concurrently with read-only queries. Those queries can traverse a posting tree in two ways: 1) using rightlinks via ginStepRight(), which locks the next page before unlocking its left sibling; 2) using downlinks via ginFindLeafPage(), which locks at most one page at a time.

  The original lock order was: page, parent, left sibling. That lock order can deadlock with ginStepRight(). In order to prevent deadlock, this commit changes the lock order to: left sibling, page, parent. Note that the position of the parent in the locking order seems insignificant, because we only lock one page at a time while traversing downlinks.

  Reported-by: Chen Huajun
  Diagnosed-by: Chen Huajun, Peter Geoghegan, Andrey Borodin
  Discussion: https://postgr.es/m/31a702a.14dd.166c1366ac1.Coremail.chjischj%40163.com
  Author: Alexander Korotkov
  Backpatch-through: 9.4
* Repair bogus handling of multi-assignment Params in upper plan levels. (Tom Lane, 2018-12-12)

  Our support for multiple-set-clauses in UPDATE assumes that the Params referencing a MULTIEXPR_SUBLINK SubPlan will appear before that SubPlan in the targetlist of the plan node that calculates the updated row. (Yeah, it's a hack...) In some PG branches it's possible that a Result node gets inserted between the primary calculation of the update tlist and the ModifyTable node. setrefs.c did the wrong thing in this case and left the upper-level Params as Params, causing a crash at runtime. What it should do is replace them with "outer" Vars referencing the child plan node's output. That's a result of careless ordering of operations in fix_upper_expr_mutator, so we can fix it just by reordering the code.

  Fix fix_join_expr_mutator similarly for consistency, even though join nodes could never appear in such a context. (In general, it seems likely to be a bit cheaper to use Vars than Params in such situations anyway, so this patch might offer a tiny performance improvement.)

  The hazard extends back to 9.5 where the MULTIEXPR_SUBLINK stuff was introduced, so back-patch that far. However, this may be a live bug only in 9.6.x and 10.x, as the other branches don't seem to want to calculate the final tlist below the Result node. (That plan shape change between branches might be a mini-bug in itself, but I'm not really interested in digging into the reasons for that right now. Still, add a regression test memorializing what we expect there, so we'll notice if it changes again.)

  Per bug report from Eduards Bezverhijs.

  Discussion: https://postgr.es/m/b6cd572a-3e44-8785-75e9-c512a5a17a73@tieto.com
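  The plan shape in question arises from the multiple-set-clause UPDATE form, e.g. (with hypothetical tables t and s):

      UPDATE t SET (a, b) = (SELECT s.x, s.y FROM s WHERE s.id = t.id);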
* Add stack depth checks to key recursive functions in backend/nodes/*.c. (Tom Lane, 2018-12-10)

  Although copyfuncs.c has a check_stack_depth call in its recursion, equalfuncs.c, outfuncs.c, and readfuncs.c lacked one. This seems unwise.

  Likewise fix planstate_tree_walker(), in branches where that exists.

  Discussion: https://postgr.es/m/30253.1544286631@sss.pgh.pa.us
* Fix misapplication of pgstat_count_truncate to wrong relation. (Tom Lane, 2018-12-07)

  The stanza of ExecuteTruncate[Guts] that truncates a target table's toast relation re-used the loop local variable "rel" to reference the toast rel. This was safe enough when written, but commit d42358efb added code below that that supposed "rel" still pointed to the parent table. Therefore, the stats counter update was applied to the wrong relcache entry (the toast rel not the user rel); and if we were unlucky and that relcache entry had been flushed during reindex_relation, very bad things could ensue.

  (I'm surprised that CLOBBER_CACHE_ALWAYS testing hasn't found this. I'm even more surprised that the problem wasn't detected during the development of d42358efb; it must not have been tested in any case with a toast table, as the incorrect stats counts are very obvious.)

  To fix, replace use of "rel" in that code branch with a more local variable. Adjust test cases added by d42358efb so that some of them use tables with toast tables.

  Per bug #15540 from Pan Bian. Back-patch to 9.5 where d42358efb came in.

  Discussion: https://postgr.es/m/15540-01078812338195c0@postgresql.org
* Fix handling of synchronous replication for stopping WAL senders. (Michael Paquier, 2018-11-29)

  This fixes an oversight from c6c3334, which forgot that if a subset of WAL senders are stopping and in a sync state, other WAL senders could still be waiting for a WAL position to be synced while committing a transaction. However, the subset of stopping senders would not release waiters, potentially breaking synchronous replication guarantees. This commit makes sure that even WAL senders that are stopping are able to release waiters and are tracked properly.

  On 9.4, this can also trigger an assertion failure when setting for example max_wal_senders to 1 where a WAL sender is not able to find itself as in synchronous state when the instance stops.

  Reported-by: Paul Guo
  Author: Paul Guo, Michael Paquier
  Discussion: https://postgr.es/m/CAEET0ZEv8VFqT3C-cQm6byOB4r4VYWcef1J21dOX-gcVhCSpmA@mail.gmail.com
  Backpatch-through: 9.4
* Do not decode TOAST data for table rewrites. (Tomas Vondra, 2018-11-28)

  During table rewrites (VACUUM FULL and CLUSTER), the main heap is logged using XLOG / FPI records, and thus (correctly) ignored in decoding. But the associated TOAST table is WAL-logged as plain INSERT records, and so was logically decoded and passed to reorder buffer.

  That has severe consequences with TOAST tables of non-trivial size. Firstly, reorder buffer has to keep all those changes, possibly spilling them to a file, incurring I/O costs and disk space. Secondly, ReorderBufferCommit() was stashing all those TOAST chunks into a hash table, which got discarded only after processing the row from the main heap. But as the main heap is not decoded for rewrites, this never happened, so all the TOAST data accumulated in memory, resulting either in excessive memory consumption or OOM.

  The fix is simple, as commit e9edc1ba already introduced infrastructure (namely HEAP_INSERT_NO_LOGICAL flag) to skip logical decoding of TOAST tables, but it only applied it to system tables. So simply use it for all TOAST data in raw_heap_insert().

  That would however solve only the memory consumption issue - the TOAST changes would still be decoded and added to the reorder buffer, and spilled to disk (although without TOAST tuple data, so much smaller). But we can solve that by tweaking DecodeInsert() to just ignore such INSERT records altogether, using XLH_INSERT_CONTAINS_NEW_TUPLE flag, instead of skipping them later in ReorderBufferCommit().

  Review: Masahiko Sawada
  Discussion: https://www.postgresql.org/message-id/flat/1a17c643-e9af-3dba-486b-fbe31bc1823a%402ndquadrant.com
  Backpatch: 9.4-, where logical decoding was introduced
* Fix float-to-integer coercions to handle edge cases correctly. (Tom Lane, 2018-11-24)

  ftoi4 and its sibling coercion functions did their overflow checks in a way that looked superficially plausible, but actually depended on an assumption that the MIN and MAX comparison constants can be represented exactly in the float4 or float8 domain. That fails in ftoi4, ftoi8, and dtoi8, resulting in a possibility that values near the MAX limit will be wrongly converted (to negative values) when they need to be rejected. Also, because we compared before rounding off the fractional part, the other three functions threw errors for values that really ought to get rounded to the min or max integer value.

  Fix by doing rint() first (requiring an assumption that it handles NaN and Inf correctly; but dtoi8 and ftoi8 were assuming that already), and by comparing to values that should coerce to float exactly, namely INTxx_MIN and -INTxx_MIN. Also remove some random cosmetic discrepancies between these six functions.

  This back-patches commits cbdb8b4c0 and 452b637d4. In the 9.4 branch, also back-patch the portion of 62e2a8dc2 that added PG_INTnn_MIN and related constants to c.h, so that these functions can rely on them.

  Per bug #15519 from Victor Petrovykh. Patch by me; thanks to Andrew Gierth for analysis and discussion.

  Discussion: https://postgr.es/m/15519-4fc785b483201ff1@postgresql.org
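  Two hedged illustrations of the edge cases, assuming IEEE floats:

      SELECT 2147483647::float4::int4;
      -- float4 rounds 2^31-1 up to 2^31, so this must fail with "integer out
      -- of range"; previously it could silently wrap to a negative value.
      SELECT 32767.4::float8::int2;
      -- Now rounds to 32767; previously the pre-rounding comparison raised
      -- an out-of-range error.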
* PANIC on fsync() failure. (Thomas Munro, 2018-11-19)

  On some operating systems, it doesn't make sense to retry fsync(), because dirty data cached by the kernel may have been dropped on write-back failure. In that case the only remaining copy of the data is in the WAL. A subsequent fsync() could appear to succeed, but not have flushed the data. That means that a future checkpoint could apparently complete successfully but have lost data.

  Therefore, violently prevent any future checkpoint attempts by panicking on the first fsync() failure. Note that we already did the same for WAL data; this change extends that behavior to non-temporary data files.

  Provide a GUC data_sync_retry to control this new behavior, for users of operating systems that don't eject dirty data, and possibly forensic/testing uses. If it is set to on and the write-back error was transient, a later checkpoint might genuinely succeed (on a system that does not throw away buffers on failure); if the error is permanent, later checkpoints will continue to fail. The GUC defaults to off, meaning that we panic.

  Back-patch to all supported releases.

  There is still a narrow window for error-loss on some operating systems: if the file is closed and later reopened and a write-back error occurs in the intervening time, but the inode has the bad luck to be evicted due to memory pressure before we reopen, we could miss the error. A later patch will address that with a scheme for keeping files with dirty data open at all times, but we judge that to be too complicated to back-patch.

  Author: Craig Ringer, with some adjustments by Thomas Munro
  Reported-by: Craig Ringer
  Reviewed-by: Robert Haas, Thomas Munro, Andres Freund
  Discussion: https://postgr.es/m/20180427222842.in2e4mibx45zdth5%40alap3.anarazel.de
* Don't forget about failed fsync() requests. (Thomas Munro, 2018-11-19)

  If fsync() fails, md.c must keep the request in its bitmap, so that future attempts will try again.

  Back-patch to all supported releases.

  Author: Thomas Munro
  Reviewed-by: Amit Kapila
  Reported-by: Andrew Gierth
  Discussion: https://postgr.es/m/87y3i1ia4w.fsf%40news-spur.riddles.org.uk
* Initialize TransactionState and user ID consistently at transaction start. (Michael Paquier, 2018-11-14)

  If a failure happens when a transaction is starting between the moment the transaction status is changed from TRANS_DEFAULT to TRANS_START and the moment the current user ID and security context flags are fetched via GetUserIdAndSecContext(), or before initializing its basic fields, then those may get reset to incorrect values when the transaction aborts, leaving the session in an inconsistent state.

  One problem reported is that failing a starting transaction at the first query of a session could cause several kinds of system crashes on the follow-up queries.

  In order to solve that, move the initialization of the transaction state fields and the call of GetUserIdAndSecContext() in charge of fetching the current user ID close to the point where the transaction status is switched to TRANS_START, where there cannot be any error triggered in-between, per an idea of Tom Lane. This properly ensures that the current user ID, the security context flags and the basic fields of TransactionState remain consistent even if the transaction fails while starting.

  Reported-by: Richard Guo
  Diagnosed-by: Richard Guo
  Author: Michael Paquier
  Reviewed-by: Tom Lane
  Discussion: https://postgr.es/m/CAN_9JTxECSb=pEPcb0a8d+6J+bDcOZ4=DgRo_B7Y5gRHJUM=Rw@mail.gmail.com
  Backpatch-through: 9.4
* Limit the number of index clauses considered in choose_bitmap_and(). (Tom Lane, 2018-11-12)

  classify_index_clause_usage() is O(N^2) in the number of distinct index qual clauses it considers, because of its use of a simple search list to store them. For nearly all queries, that's fine because only a few clauses will be considered. But Alexander Kuzmenkov reported a machine-generated query with 80000 (!) index qual clauses, which caused this code to take forever. Somewhat remarkably, this is the only O(N^2) behavior we now have for such a query, so let's fix it.

  We can get rid of the O(N^2) runtime for cases like this without much damage to the functionality of choose_bitmap_and() by separating out paths with "too many" qual or pred clauses, and deeming them to always be nonredundant with other paths. Then their clauses needn't go into the search list, so it doesn't get too long, but we don't lose the ability to consider bitmap AND plans altogether. I set the threshold for "too many" to be 100 clauses per path, which should be plenty to ensure no change in planning behavior for normal queries.

  There are other things we could do to make this go faster, but it's not clear that it's worth any additional effort. 80000 qual clauses require a whole lot of work in many other places, too.

  The code's been like this for a long time, so back-patch to all supported branches. The troublesome query only works back to 9.5 (in 9.4 it fails with stack overflow in the parser); so I'm not sure that fixing this in 9.4 has any real-world benefit, but perhaps it does.

  Discussion: https://postgr.es/m/90c5bdfa-d633-dabe-9889-3cf3e1acd443@postgrespro.ru
* Fix missing role dependencies for some schema and type ACLs. (Tom Lane, 2018-11-09)

  This patch fixes several related cases in which pg_shdepend entries were never made, or were lost, for references to roles appearing in the ACLs of schemas and/or types. While that did no immediate harm, if a referenced role were later dropped, the drop would be allowed and would leave a dangling reference in the object's ACL. That still wasn't a big problem for normal database usage, but it would cause obscure failures in subsequent dump/reload or pg_upgrade attempts, taking the form of attempts to grant privileges to all-numeric role names. (I think I've seen field reports matching that symptom, but can't find any right now.)

  Several cases are fixed here:

  1. ALTER DOMAIN SET/DROP DEFAULT would lose the dependencies for any existing ACL entries for the domain. This case is ancient, dating back as far as we've had pg_shdepend tracking at all.

  2. If a default type privilege applies, CREATE TYPE recorded the ACL properly but forgot to install dependency entries for it. This dates to the addition of default privileges for types in 9.2.

  3. If a default schema privilege applies, CREATE SCHEMA recorded the ACL properly but forgot to install dependency entries for it. This dates to the addition of default privileges for schemas in v10 (commit ab89e465c).

  Another somewhat-related problem is that when creating a relation rowtype or implicit array type, TypeCreate would apply any available default type privileges to that type, which we don't really want since such an object isn't supposed to have privileges of its own. (You can't, for example, drop such privileges once they've been added to an array type.)

  ab89e465c is also to blame for a race condition in the regression tests: privileges.sql transiently installed globally-applicable default privileges on schemas, which sometimes got absorbed into the ACLs of schemas created by concurrent test scripts. This should have resulted in failures when privileges.sql tried to drop the role holding such privileges; but thanks to the bug fixed here, it instead led to dangling ACLs in the final state of the regression database. We'd managed not to notice that, but it became obvious in the wake of commit da906766c, which allowed the race condition to occur in pg_upgrade tests.

  To fix, add a function recordDependencyOnNewAcl to encapsulate what callers of get_user_default_acl need to do; while the original call sites got that right via ad-hoc code, none of the later-added ones have. Also change GenerateTypeDependencies to generate these dependencies, which requires adding the typacl to its parameter list. (That might be annoying if there are any extensions calling that function directly; but if there are, they're most likely buggy in the same way as the core callers were, so they need work anyway.) While I was at it, I changed GenerateTypeDependencies to accept most of its parameters in the form of a Form_pg_type pointer, making its parameter list a bit less unwieldy and mistake-prone.

  The test race condition is fixed just by wrapping the addition and removal of default privileges into a single transaction, so that that state is never visible externally. We might eventually prefer to separate out tests of default privileges into a script that runs by itself, but that would be a bigger change and would make the tests run slower overall.

  Back-patch relevant parts to all supported branches.

  Discussion: https://postgr.es/m/15719.1541725287@sss.pgh.pa.us
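  A sketch of the default-privilege case (item 2 above), with hypothetical names:

      CREATE ROLE app_user;
      ALTER DEFAULT PRIVILEGES GRANT USAGE ON TYPES TO app_user;
      -- Pre-fix, the ACL was recorded but no pg_shdepend entry was made, so
      -- the DROP below was allowed and left a dangling ACL reference that
      -- could break later dump/reload or pg_upgrade.
      CREATE TYPE rgb AS ENUM ('red', 'green', 'blue');
      DROP ROLE app_user;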