path: root/src/backend

* Attach FPI to the first record after full_page_writes is turned on. (Amit Kapila, 2018-09-13)

XLogInsert fails to attach a required FPI to the first record after full_page_writes is turned on by the last checkpoint. This bug was introduced in 9.5 by code rearrangement in commits 2c03216d83 and 2076db2aea. Fix it by ensuring that XLogInsertRecord performs a recomputation when the given record was generated with FPW off but the flag is found to have been turned on while actually inserting the record.

Reported-by: Kyotaro Horiguchi
Author: Kyotaro Horiguchi
Reviewed-by: Amit Kapila
Backpatch-through: 9.5, where this problem was introduced
Discussion: https://postgr.es/m/20180420.151043.74298611.horiguchi.kyotaro@lab.ntt.co.jp

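A minimal standalone sketch of the recheck pattern described above, with hypothetical names (Record, assemble_record) standing in for the real XLogInsertRecord machinery:

    #include <stdbool.h>
    #include <stdio.h>

    static bool fullPageWrites = false;   /* shared flag; a checkpoint may flip it */

    typedef struct { bool built_with_fpw; } Record;

    /* Hypothetical stand-in for assembling a WAL record under a given FPW setting. */
    static Record
    assemble_record(bool fpw)
    {
        Record r = { .built_with_fpw = fpw };
        return r;
    }

    int
    main(void)
    {
        Record rec = assemble_record(fullPageWrites);

        fullPageWrites = true;            /* FPW turned on while we were assembling */

        /* The fix: if the flag no longer matches what the record was built
         * under, recompute so the required full-page image gets attached. */
        if (rec.built_with_fpw != fullPageWrites)
            rec = assemble_record(fullPageWrites);

        printf("record carries FPI: %s\n", rec.built_with_fpw ? "yes" : "no");
        return 0;
    }
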
* Simplify static function in extension.c (Michael Paquier, 2018-09-13)

An extra argument holding the filename of the extension script, intended for error reporting, was present but never used. It had been around since extensions were added in d9572c4.

Author: Yugo Nagata
Reviewed-by: Tatsuo Ishii
Discussion: https://postgr.es/m/20180907180504.1ff19e1675bb44a67e9c7ab1@sraoss.co.jp

* Simplify index tuple descriptor initialization (Peter Eisentraut, 2018-09-13)

We have two code paths for initializing the tuple descriptor for a new index: For a normal index, we copy the tuple descriptor from the table and reset a number of fields that are not applicable to indexes. For an expression index, we make a blank tuple descriptor and fill in the needed fields based on the provided expressions.

As pg_attribute has grown over time, the number of fields that we need to reset in the first case is now bigger than the number of fields we actually want to copy, so it's sensible to do it the other way around: make a blank descriptor and copy just the fields we need. This also allows more code sharing between the two branches, and it avoids having to touch this code for almost every unrelated change to the pg_attribute structure.

Reviewed-by: Arthur Zakirov <a.zakirov@postgrespro.ru>

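To illustrate the reversed approach, a sketch using a hypothetical, much-reduced stand-in for pg_attribute (the real struct has far more fields, which is exactly why copy-then-reset became the worse deal):

    #include <stdbool.h>
    #include <string.h>

    /* Hypothetical, much-reduced stand-in for pg_attribute. */
    typedef struct
    {
        char name[64];
        int  atttypid;
        int  attlen;
        bool attnotnull;        /* table-only fields an index never uses... */
        bool atthasdef;
    } Attr;

    /* New way: start from a blank (zeroed) entry and copy only what an index
     * needs, instead of copying everything and resetting the leftovers. */
    static Attr
    index_attr_from_table(const Attr *src)
    {
        Attr dst;

        memset(&dst, 0, sizeof(dst));
        memcpy(dst.name, src->name, sizeof(dst.name));
        dst.atttypid = src->atttypid;
        dst.attlen = src->attlen;
        return dst;
    }

    int
    main(void)
    {
        Attr t = { "id", 23, 4, true, false };
        Attr i = index_attr_from_table(&t);

        return i.attnotnull;    /* 0: table-only flag did not leak into the index */
    }
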
* Repair bug in regexp split performance improvements. (Andrew Gierth, 2018-09-12)

Commit c8ea87e4b introduced a temporary conversion buffer for substrings extracted during regexp splits. Unfortunately, the code that sized it failed to account for ignored degenerate regexp matches, so for regexp_split_* calls it could under-size the buffer in such cases.

Fix, and add some regression test cases (though those will only catch the bug if run in a multibyte encoding).

Backpatch to 9.3, as far back as the faulty code goes.

Thanks to the PostGIS project, Regina Obe and Paul Ramsey for the report (via IRC) and assistance in analysis. Patch by me.

* Remove ruleutils.c's special case for BIT [VARYING] literals. (Tom Lane, 2018-09-11)

Up to now, get_const_expr() insisted on prefixing BIT and VARBIT literals with 'B'. That's not really necessary, because we always append explicit-cast syntax to identify the constant's type. Moreover, it's subtly wrong for VARBIT, because the parser will interpret B'...' as '...'::"bit"; see make_const() which explicitly assigns type BITOID for a T_BitString literal. So what had been a simple VARBIT literal is reconstructed as ('...'::"bit")::varbit, which is not the same thing, at least not before constant folding.

This results in odd differences after dump/restore, as complained of by the patch submitter, and it could result in actual failures in partitioning or inheritance DDL operations (see commit 542320c2b, which repaired similar misbehaviors for some other data types).

Fixing it is pretty easy: just remove the special case and let the default code path handle these types. We could have kept the special case for BIT only, but there seems little point in that.

Like the previous patch, I judge that back-patching this into stable branches wouldn't be a good idea. However, it seems not quite too late for v11, so let's fix it there.

Paul Guo, reviewed by Davy Machado and John Naylor, minor adjustments by me

Discussion: https://postgr.es/m/CABQrizdTra=2JEqA6+Ms1D1k1Kqw+aiBBhC9TreuZRX2JzxLAA@mail.gmail.com

* Repair double-free in SP-GIST rescan (bug #15378) (Andrew Gierth, 2018-09-11)

spgrescan would first reset traversalCxt, and then traverse a potentially non-empty stack containing pointers to traversalValues which had been allocated in those contexts, freeing them a second time. This bug originates in commit ccd6eb49a where traversalValue was introduced.

Repair by traversing the stack before the context reset; this isn't ideal, since it means doing retail pfree in a context that's about to be reset, but the freeing of a stack entry is also done in other places in the code during the scan so it's not worth trying to refactor it further. Regression test added.

Backpatch to 9.6 where the problem was introduced.

Per bug #15378; analysis and patch by me, originally from a report on IRC by user velix; see also PostGIS ticket #4174; review by Alexander Korotkov.

Discussion: https://postgr.es/m/153663176628.23136.11901365223750051490@wrigleys.postgresql.org

* Improve behavior of to_timestamp()/to_date() functions (Alexander Korotkov, 2018-09-09)

The to_timestamp()/to_date() functions were introduced mainly for Oracle compatibility, and became very popular among PostgreSQL users. However, some of their behavior is both incompatible with Oracle and confusing for our users. This behavior is related to the handling of spaces and separators in non-FX (fixed format) mode. This commit reworks that behavior, making it less confusing, better documented, and more compatible with Oracle.

Nevertheless, the following incompatibilities with Oracle remain:

1) We don't insist that all format string patterns are matched by the input string.
2) In FX mode we don't insist that spaces and separators in the format string exactly match the input string.
3) When format string patterns are divided by a mix of spaces and separators, we don't distinguish them, while Oracle takes into account only the last group of spaces/separators.

Discussion: https://postgr.es/m/1873520224.1784572.1465833145330.JavaMail.yahoo%40mail.yahoo.com
Author: Artur Zakirov, Alexander Korotkov, Liudmila Mantrova
Review: Amul Sul, Robert Haas, Tom Lane, Dmitry Dolgov, David G. Johnston

* Fix past pd_upper write in ginRedoRecompress() (Alexander Korotkov, 2018-09-09)

ginRedoRecompress() replays actions over compressed segments of a posting list in place. However, this might lead to writes past pd_upper, because the intermediate state while replaying the changes can take more space than either the original state or the final state. This commit fixes that by refusing in-place modification: instead, the page tail is copied once modification starts, and is then used as the source of the original segments.

Backpatch to 9.4 where posting list compression was introduced.

Reported-by: Sivasubramanian Ramasubramanian
Discussion: https://postgr.es/m/1536091151804.6588%40amazon.com
Author: Alexander Korotkov, based on a patch from and ideas by Sivasubramanian Ramasubramanian
Review: Sivasubramanian Ramasubramanian
Backpatch-through: 9.4

* Work around stdbool problem in dfmgr.c. (Tom Lane, 2018-09-09)

Commit 842cb9fa6 refactored things so that dfmgr.c includes <dlfcn.h>, which before that had only been directly included in platform-specific stub files. It turns out that on macOS, <dlfcn.h> includes <stdbool.h>, and that causes problems on platforms where _Bool is not char-sized ... which happens to include the PPC versions of macOS.

Work around it much as we have in plperl.h, by #undef'ing bool after including the problematic file, but only if we're not using stdbool-style booleans.

Discussion: https://postgr.es/m/E1fxqjl-0003YS-NS@gemulon.postgresql.org

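The shape of the workaround, sketched after the plperl.h precedent; HAVE_STDBOOL_BOOLS is a hypothetical stand-in for however the build actually detects stdbool-style booleans:

    #include <dlfcn.h>      /* on macOS this drags in <stdbool.h> */

    /* HAVE_STDBOOL_BOOLS is a hypothetical guard, standing in for the real
     * configure knowledge; when the backend's bool is its own char-sized
     * typedef, undo the macros <stdbool.h> may have defined behind our back. */
    #ifndef HAVE_STDBOOL_BOOLS
    #ifdef bool
    #undef bool
    #undef true
    #undef false
    #endif
    #endif
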
* Install a check for mis-linking of src/port and src/common functions. (Tom Lane, 2018-09-09)

On ELF-based platforms (and maybe others?) it's possible for a shared library, when dynamically loaded into the backend, to call the backend versions of src/port and src/common functions rather than the frontend versions that are actually linked into the shlib. This is definitely not what we want, because the frontend versions often behave slightly differently. Up to now it's been "slight" enough that nobody noticed; but with the addition of SCRAM support functions in src/common, we're observing crashes due to the difference between palloc and malloc memory allocation rules, as reported in bug #15367 from Jeremy Evans.

The purpose of this patch is to create a direct test for this type of mis-linking, so that we know whether any given platform requires extra measures to prevent using the wrong functions. If the test fails, it will lead to connection failures in the contrib/postgres_fdw regression test. At the moment, *BSD platforms using ELF format are known to have the problem and can be expected to fail; but we need to know whether anything else does, and we need a reliable ongoing check for future platforms.

Actually fixing the problem will be the subject of later commit(s).

Discussion: https://postgr.es/m/153626613985.23143.4743626885618266803@wrigleys.postgresql.org

* Minor cleanup/future-proofing for pg_saslprep(). (Tom Lane, 2018-09-08)

Ensure that pg_saslprep() initializes its output argument to NULL in all failure paths, and then remove the redundant initialization that some (not all) of its callers did. This does not fix any live bug, but it reduces the odds of future bugs of omission.

Also add a comment about why the existing failure-path coding is adequate.

Back-patch so as to keep the function's API consistent across branches, again to forestall future bug introduction.

Patch by me, reviewed by Michael Paquier

Discussion: https://postgr.es/m/16558.1536407783@sss.pgh.pa.us

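The convention being enforced, in a simplified sketch; the result codes and the strdup call are stand-ins for the real SASLprep normalization:

    #include <stdlib.h>
    #include <string.h>

    typedef enum { SASLPREP_SUCCESS, SASLPREP_OOM, SASLPREP_INVALID } saslprep_rc;

    static saslprep_rc
    pg_saslprep_sketch(const char *input, char **output)
    {
        *output = NULL;             /* defined on *every* failure path, so
                                     * callers need no redundant pre-init */

        if (input == NULL || input[0] == '\0')
            return SASLPREP_INVALID;

        *output = strdup(input);    /* stand-in for the real normalization */
        if (*output == NULL)
            return SASLPREP_OOM;
        return SASLPREP_SUCCESS;
    }

    int
    main(void)
    {
        char *out;
        saslprep_rc rc = pg_saslprep_sketch("password", &out);

        free(out);                  /* safe: out is NULL on failure */
        return rc != SASLPREP_SUCCESS;
    }
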
* Remove duplicated words split across lines in comments (Michael Paquier, 2018-09-08)

This has been detected using some interesting tricks with sed; the method used is described in detail in the discussion below.

Author: Justin Pryzby
Discussion: https://postgr.es/m/20180908013109.GB15350@telsasoft.com

* Save/restore SPI's global variables in SPI_connect() and SPI_finish(). (Tom Lane, 2018-09-07)

This patch removes two sources of interference between nominally independent functions when one SPI-using function calls another, perhaps without knowing that it does so.

Chapman Flack pointed out that xml.c's query_to_xml_internal() expects SPI_tuptable and SPI_processed to stay valid across datatype output function calls; but it's possible that such a call could involve re-entrant use of SPI. It seems likely that there are similar hazards elsewhere, if not in the core code then in third-party SPI users. Previously SPI_finish() reset SPI's API globals to zeroes/nulls, which would typically make for a crash in such a situation. Restoring them to the values they had at SPI_connect() seems like a considerably more useful behavior, and it still meets the design goal of not leaving any dangling pointers to tuple tables of the function being exited.

Also, cause SPI_connect() to reset these variables to zeroes/nulls after saving them. This prevents interference in the opposite direction: it's possible that a SPI-using function that's only ever been tested standalone contains assumptions that these variables start out as zeroes. That was the case as long as you were the outermost SPI user, but not so much for an inner user. Now it's consistent.

Report and fix suggestion by Chapman Flack, actual patch by me. Back-patch to all supported branches.

Discussion: https://postgr.es/m/9fa25bef-2e4f-1c32-22a4-3ad0723c4a17@anastigmatix.net

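The save/restore discipline in miniature, with hypothetical simplified globals (the real patch stashes SPI_processed, SPI_tuptable, and friends in SPI's connection stack):

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical stand-ins for SPI's API globals. */
    static uint64_t SPI_processed;
    static void    *SPI_tuptable;

    typedef struct
    {
        uint64_t saved_processed;
        void    *saved_tuptable;
    } SPISaved;

    /* On connect: save the outer user's view, then zero for the inner user. */
    static SPISaved
    spi_connect_sketch(void)
    {
        SPISaved s = { SPI_processed, SPI_tuptable };

        SPI_processed = 0;
        SPI_tuptable = NULL;
        return s;
    }

    /* On finish: restore the outer view instead of leaving zeroes behind. */
    static void
    spi_finish_sketch(SPISaved s)
    {
        SPI_processed = s.saved_processed;
        SPI_tuptable = s.saved_tuptable;
    }

    int
    main(void)
    {
        SPI_processed = 42;                 /* outer SPI user's state */
        SPISaved s = spi_connect_sketch();  /* inner user starts from zeroes */

        spi_finish_sketch(s);               /* outer user sees 42 again */
        return SPI_processed != 42;
    }
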
* Limit depth of forced recursion for CLOBBER_CACHE_RECURSIVELY. (Tom Lane, 2018-09-07)

It's somewhat surprising that we got away with this before. (Actually, since nobody tests this routinely AFAIK, it might've been broken for awhile. But it's definitely broken in the wake of commit f868a8143.) It seems sufficient to limit the forced recursion to a small number of levels.

Back-patch to all supported branches, like the preceding patch.

Discussion: https://postgr.es/m/12259.1532117714@sss.pgh.pa.us

* Fix longstanding recursion hazard in sinval message processing. (Tom Lane, 2018-09-07)

LockRelationOid and sibling routines supposed that, if our session already holds the lock they were asked to acquire, they could skip calling AcceptInvalidationMessages on the grounds that we must have already read any remote sinval messages issued against the relation being locked. This is normally true, but there's a critical special case where it's not: processing inside AcceptInvalidationMessages might attempt to access system relations, resulting in a recursive call to acquire a relation lock. Hence, if the outer call had acquired that same system catalog lock, we'd fall through, despite the possibility that there's an as-yet-unread sinval message for that system catalog. This could, for example, result in failure to access a system catalog or index that had just been processed by VACUUM FULL. This is the explanation for buildfarm failures we've been seeing intermittently for the past three months. The bug is far older than that, but commits a54e1f158 et al added a new recursion case within AcceptInvalidationMessages that is apparently easier to hit than any previous case.

To fix this, we must not skip calling AcceptInvalidationMessages until we have *finished* a call to it since acquiring a relation lock, not merely acquired the lock. (There's already adequate logic inside AcceptInvalidationMessages to deal with being called recursively.) Fortunately, we can implement that at trivial cost, by adding a flag to LOCALLOCK hashtable entries that tracks whether we know we have completed such a call.

There is an API hazard added by this patch for external callers of LockAcquire: if anything is testing for LOCKACQUIRE_ALREADY_HELD, it might be fooled by the new return code LOCKACQUIRE_ALREADY_CLEAR into thinking the lock wasn't already held. This should be a fail-soft condition, though, unless something very bizarre is being done in response to the test. Also, I added an additional output argument to LockAcquireExtended, assuming that that probably isn't called by any outside code given the very limited usefulness of its additional functionality.

Back-patch to all supported branches.

Discussion: https://postgr.es/m/12259.1532117714@sss.pgh.pa.us

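The shape of the fix, heavily condensed; lockCleared is the flag added to LOCALLOCK, while the surrounding acquire logic here is a hypothetical reduction:

    #include <stdbool.h>

    typedef struct
    {
        int  nLocks;
        bool lockCleared;   /* completed a full AcceptInvalidationMessages
                             * pass since this lock was acquired? */
    } LocalLockSketch;

    static void
    accept_invalidation_messages(void)
    {
        /* ...may itself acquire catalog locks, i.e. recurse into the caller... */
    }

    static void
    lock_acquire_sketch(LocalLockSketch *ll)
    {
        /* The old code skipped the sinval read whenever nLocks > 0; that is
         * only safe once a full pass has *finished* after acquisition. */
        if (ll->nLocks > 0 && ll->lockCleared)
        {
            ll->nLocks++;
            return;
        }
        ll->nLocks++;
        accept_invalidation_messages();
        ll->lockCleared = true;     /* set only after the call completed */
    }

    int
    main(void)
    {
        LocalLockSketch ll = { 0, false };

        lock_acquire_sketch(&ll);   /* first acquisition: reads sinval queue */
        lock_acquire_sketch(&ll);   /* re-acquisition: now provably safe to skip */
        return 0;
    }
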
* Improve handling of corrupted two-phase state files at recovery (Michael Paquier, 2018-09-07)

When a corrupted two-phase state file is found by WAL replay, be it for crash recovery or archive recovery, the file is simply skipped and a WARNING is logged to the user, causing the transaction to be silently lost. An on-disk two-phase state file is as likely to become corrupted as what is stored in WAL records, but WAL records are already able to fail hard if there is a CRC mismatch; on-disk two-phase state files, on the contrary, are simply ignored if corrupted. Note that when restoring the initial two-phase data state at recovery, files newer than the horizon XID are discarded, hence no file present in pg_twophase/ should be torn, as all of them have been made durable by a previous checkpoint; so recovery should never see any corrupted two-phase state file by design.

The situation has improved since 978b2f6, which added two-phase state information directly in WAL instead of using on-disk files, so the risk is limited to two-phase transactions which live across at least one checkpoint for long periods. Backups having legit two-phase state files on disk could also silently lose transactions when restored if things get corrupted.

This behavior has existed since two-phase commit was introduced; no back-patch is done for now given the lack of complaints about this problem.

Author: Michael Paquier
Discussion: https://postgr.es/m/20180709050309.GM1467@paquier.xyz

* Use C99 designated initializers for some structs (Peter Eisentraut, 2018-09-07)

These are just a few particularly egregious cases that were hard to read and write, and error prone because of many similar adjacent types.

Discussion: https://www.postgresql.org/message-id/flat/4c9f01be-9245-2148-b569-61a8562ef190%402ndquadrant.com

* Refactor dlopen() support (Peter Eisentraut, 2018-09-06)

Nowadays, all platforms except Windows and older HP-UX have standard dlopen() support. So having a separate implementation per platform under src/backend/port/dynloader/ is a bit excessive. Instead, treat dlopen() like other library functions that happen to be missing sometimes and put a replacement implementation under src/port/.

Discussion: https://www.postgresql.org/message-id/flat/e11a49cb-570a-60b7-707d-7084c8de0e61%402ndquadrant.com#54e735ae37476a121abb4e33c2549b03

* Remove no-longer-used variable. (Tom Lane, 2018-09-05)

Oversight in 2fbdf1b38. Per buildfarm.

* Simplify partitioned table creation vs. relcache (Alvaro Herrera, 2018-09-05)

In the original code, we were storing the pg_inherits row for a partitioned table too early: enough that we had a hack for relcache to avoid falling flat on its face while reading such a partial entry. If we finish the pg_class creation first and *then* store the pg_inherits entry, we don't need that hack.

Also recognize that pg_class.relpartbound is not marked NOT NULL and therefore it's entirely possible to read null values, so having only Assert() protection isn't enough. Change those to if/elog tests instead. This qualifies as a robustness fix, so backpatch to pg11.

In passing, remove one access that wasn't actually needed, and reword one message to be like all the others that check for the same thing.

Reviewed-by: Amit Langote
Discussion: https://postgr.es/m/20180903213916.hh6wasnrdg6xv2ud@alvherre.pgsql

* Improve some error message strings and errcodes (Michael Paquier, 2018-09-04)

This makes a bit less work for translators by unifying error strings a bit more with what the rest of the code does, this time for three error strings in autoprewarm and one in the base backup code.

After some code review of slot.c, some file-access failures turned out to be reported with an incorrect internal errcode where a corrupted-data errcode makes the most sense, similarly to the previous work done in e41d0a1. Also, after calling rmtree(), a WARNING was reported that duplicates what the internal call already reports, so make the code more consistent with all other code paths calling this function.

Author: Michael Paquier
Discussion: https://postgr.es/m/20180902200747.GC1343@paquier.xyz

* Fully enforce uniqueness of constraint names. (Tom Lane, 2018-09-04)

It's been true for a long time that we expect names of table and domain constraints to be unique among the constraints of that table or domain. However, the enforcement of that has been pretty haphazard, and it missed some corner cases such as creating a CHECK constraint and then an index constraint of the same name (as per recent report from André Hänsel). Also, due to the lack of an actual unique index enforcing this, duplicates could be created through race conditions.

Moreover, the code that searches pg_constraint has been quite inconsistent about how to handle duplicate names if one did occur: some places checked and threw errors if there was more than one match, while others just processed the first match they came to.

To fix, create a unique index on (conrelid, contypid, conname). Since either conrelid or contypid is zero, this will separately enforce uniqueness of constraint names among constraints of any one table and any one domain. (If we ever implement SQL assertions, and put them into this catalog, more thought might be needed. But it'd be at least as reasonable to put them into a new catalog; having overloaded this one catalog with two kinds of constraints was a mistake already IMO.) This index can replace the existing non-unique index on conrelid, though we need to keep the one on contypid for query performance reasons.

Having done that, we can simplify the logic in various places that either coped with duplicates or neglected to, as well as potentially improve lookup performance when searching for a constraint by name. Also, as per our usual practice, install a preliminary check so that you get something more friendly than a unique-index violation report in the case complained of by André. And teach ChooseIndexName to avoid choosing autogenerated names that would draw such a failure.

While it's not possible to make such a change in the back branches, it doesn't seem quite too late to put this into v11, so do so.

Discussion: https://postgr.es/m/0c1001d4428f$0942b430$1bc81c90$@webkr.de

* Prohibit pushing subqueries containing window function calculation to workers. (Amit Kapila, 2018-09-04)

Allowing window function calculation in workers leads to inconsistent results, because if the input row ordering is not fully deterministic, the output of window functions might vary across workers. The fix is to treat them as parallel-restricted.

In passing, improve the coding pattern in max_parallel_hazard_walker so that it has a chain of mutually-exclusive if ... else if ... else if ... else if ... IsA tests.

Reported-by: Marko Tiikkaja
Bug: 15324
Author: Amit Kapila
Reviewed-by: Tom Lane
Backpatch-through: 9.6
Discussion: https://postgr.es/m/CAL9smLAnfPJCDUUG4ckX2iznj53V7VSMsYefzZieN93YxTNOcw@mail.gmail.com

* During the split, set checksum on an empty hash index page. (Amit Kapila, 2018-09-04)

On a split, we allocate a new splitpoint's worth of bucket pages, wherein we initialize the last page with zeros, which is fine; but we forgot to set the checksum for that last page.

We decided to back-patch this fix only to 10 because we don't have an easy way to test it in prior versions. Another reason is that the hash-index code was changed heavily in 10, so it is not advisable to push the fix without testing it in prior versions.

Author: Amit Kapila
Reviewed-by: Yugo Nagata
Backpatch-through: 10
Discussion: https://postgr.es/m/5d03686d-727c-dbf8-0064-bf8b97ffe850@2ndquadrant.com

* Remove pg_constraint.conincluding (Alvaro Herrera, 2018-09-03)

This column was added in commit 8224de4f42cc ("Indexes with INCLUDE columns and their support in B-tree") to ease writing the ruleutils.c supporting code for that feature, but it turns out to be unnecessary -- we can do the same thing with just one more syscache lookup.

Even the documentation for the new column being removed in this commit is awkward.

Discussion: https://postgr.es/m/20180902165018.33otxftp3olgtu4t@alvherre.pgsql

* Fix memory leak in TRUNCATE decoding (Tomas Vondra, 2018-09-03)

When decoding a TRUNCATE record, the relids array was being allocated in the main ReorderBuffer memory context, but not released with the change, resulting in a memory leak.

The array was also ignored when serializing/deserializing the change, on the assumption that all the information is stored in the change itself. So when spilling the change to disk, we serialized only the pointer to the relids array. Because the array was never released, the pointer happened to remain valid even after loading the change back to memory, which prevented an actual crash.

This fixes both the memory leak and the (de)serialization. The relids array is still allocated in the main ReorderBuffer memory context (none of the existing ones seems like a good match, and adding an extra context seems like overkill). The allocation is wrapped in a new ReorderBuffer API function, to keep the details within reorderbuffer.c, just like the other ReorderBufferGet methods do.

Author: Tomas Vondra
Discussion: https://www.postgresql.org/message-id/flat/66175a41-9342-2845-652f-1bd4c3ee50aa%402ndquadrant.com
Backpatch: 11, where decoding of TRUNCATE was introduced

* Fix initial sync of slot parent directory when restoring status (Michael Paquier, 2018-09-02)

At the beginning of recovery, information from replication slots is recovered from disk to memory. In order to ensure the durability of the information, the status file as well as its parent directory are synced. It happens that the sync on the parent directory was done directly using the status file path, which is logically incorrect, and the current code has been doing a sync on the same object twice in a row.

Reported-by: Konstantin Knizhnik
Diagnosed-by: Konstantin Knizhnik
Author: Michael Paquier
Discussion: https://postgr.es/m/9eb1a6d5-b66f-2640-598d-c5ea46b8f68a@postgrespro.ru
Backpatch-through: 9.4

* Avoid using potentially-under-aligned page buffers. (Tom Lane, 2018-09-01)

There's a project policy against using plain "char buf[BLCKSZ]" local or static variables as page buffers; preferred style is to palloc or malloc each buffer to ensure it is MAXALIGN'd. However, that policy's been ignored in an increasing number of places. We've apparently got away with it so far, probably because (a) relatively few people use platforms on which misalignment causes core dumps and/or (b) the variables chance to be sufficiently aligned anyway. But this is not something to rely on. Moreover, even if we don't get a core dump, we might be paying a lot of cycles for misaligned accesses.

To fix, invent new union types PGAlignedBlock and PGAlignedXLogBlock that the compiler must allocate with sufficient alignment, and use those in place of plain char arrays. I used these types even for variables where there's no risk of a misaligned access, since ensuring proper alignment should make kernel data transfers faster. I also changed some places where we had been palloc'ing short-lived buffers, for coding style uniformity and to save palloc/pfree overhead.

Since this seems to be a live portability hazard (despite the lack of field reports), back-patch to all supported versions.

Patch by me; thanks to Michael Paquier for review.

Discussion: https://postgr.es/m/1535618100.1286.3.camel@credativ.de

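The union trick, reproduced in minimal standalone form (in the tree these types live in c.h, spelled with PostgreSQL's own int64 typedef):

    #include <stdio.h>

    #define BLCKSZ 8192

    /* The double and 64-bit members force the compiler to align the whole
     * union, and hence the char buffer, suitably for any of them; a bare
     * "char buf[BLCKSZ]" carries no such guarantee. */
    typedef union PGAlignedBlock
    {
        char      data[BLCKSZ];
        double    force_align_d;
        long long force_align_i64;
    } PGAlignedBlock;

    int
    main(void)
    {
        PGAlignedBlock buf;

        /* _Alignof is C11; shown here just to make the guarantee visible. */
        printf("buffer at %p, alignment %zu\n",
               (void *) buf.data, _Alignof(PGAlignedBlock));
        return 0;
    }
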
* Implement "pg_ctl logrotate" commandAlexander Korotkov2018-09-01
| | | | | | | | | | | | Currently there are two ways to trigger log rotation in logging collector process: call pg_rotate_logfile() SQL-function or send SIGUSR1 signal directly to logging collector process. However, it's nice to have more suitable way for external tools to do that, which wouldn't require SQL connection or knowledge of logging collector pid. This commit implements triggering log rotation by "pg_ctl logrotate" command. Discussion: https://postgr.es/m/20180416.115435.28153375.horiguchi.kyotaro%40lab.ntt.co.jp Author: Kyotaro Horiguchi, Alexander Kuzmenkov, Alexander Korotkov
* Ignore server-side delays when enforcing wal_sender_timeout. (Noah Misch, 2018-08-31)

Healthy clients of servers having poor I/O performance, such as buildfarm members hamster and tern, saw unexpected timeouts. That disagreed with documentation. This fix adds one gettimeofday() call whenever ProcessRepliesIfAny() finds no client reply messages.

Back-patch to 9.4; the bug's symptom is rare and mild, and the code all moved between 9.3 and 9.4.

Discussion: https://postgr.es/m/20180826034600.GA1105084@rfd.leadboat.com

* Ensure correct minimum consistent point on standbys (Michael Paquier, 2018-08-31)

Commit 8d68ee6 improved the startup process's calculation of the minimum consistent point, ensuring that all available WAL gets replayed during crash recovery, but it introduced an incorrect calculation of the minimum recovery point for non-startup processes. This can cause incorrect page references on a standby when, for example, the background writer flushed a couple of pages to disk without updating the control file, so that a subsequent crash recovery would not replay to where it should have.

The only case where this has been reported to be a problem is when a standby needs to calculate the latest removed xid while replaying a btree deletion record, so one would need connections on a standby that happen just after recovery has thought it reached a consistent point. Using a background worker which is started after the consistent point is reached would be the easiest way to get into problems if it connects to a database. Having clients which attempt to connect periodically could also be a problem, but the odds of seeing this problem are much lower.

The fix is pretty simple: give non-startup processes access to the minimum recovery point written in the control file, to use as their reference, while the startup process still initializes its own reference to the minimum consistent point, so that the original problem with incorrect page references after a post-promotion crash does not show up.

Reported-by: Alexander Kukushkin
Diagnosed-by: Alexander Kukushkin
Author: Michael Paquier
Reviewed-by: Kyotaro Horiguchi, Alexander Kukushkin
Discussion: https://postgr.es/m/153492341830.1368.3936905691758473953@wrigleys.postgresql.org
Backpatch-through: 9.3

* Disable support for partitionwise joins in problematic cases. (Etsuro Fujita, 2018-08-31)

Commit f49842d, which added support for partitionwise joins, built the child's tlist by applying adjust_appendrel_attrs() to the parent's. So in the case where the parent's included a whole-row Var for the parent, the child's contained a ConvertRowtypeExpr. To cope with that, that commit added code to the planner, such as setrefs.c, but some code paths still assumed that the tlist for a scan (or join) rel would only include Vars and PlaceHolderVars, which was true before that commit, causing errors:

* When creating an explicit sort node for an input path for a mergejoin path for a child join, prepare_sort_from_pathkeys() threw the 'could not find pathkey item to sort' error.

* When deparsing a relation participating in a pushed down child join as a subquery in contrib/postgres_fdw, get_relation_column_alias_ids() threw the 'unexpected expression in subquery output' error.

* When performing set_plan_references() on a local join plan generated by contrib/postgres_fdw for EvalPlanQual support for a pushed down child join, fix_join_expr() threw the 'variable not found in subplan target lists' error.

To fix these, two approaches have been proposed: one by Ashutosh Bapat and one by me. While the former keeps building the child's tlist with a ConvertRowtypeExpr, the latter builds it with a whole-row Var for the child so as not to violate the planner assumption, and tries to fix it up later. But both approaches need more work, so instead refuse to generate partitionwise join paths when whole-row Vars are involved. We don't need to handle ConvertRowtypeExprs in the child's tlists for now, so this commit also removes the changes to the planner.

Previously, partitionwise join computed attr_needed data for each child separately, and built the child join's tlist using that data, which also required an extra step for adding PlaceHolderVars to that tlist; but it would be more efficient to build it from the parent join's tlist through the adjust_appendrel_attrs() transformation. So this commit builds that list that way, and simplifies build_joinrel_tlist() and placeholder.c as well as part of set_append_rel_size() to basically what they were before partitionwise join went in.

Back-patch to PG11 where partitionwise join was introduced.

Report by Rajkumar Raghuwanshi. Analysis by Ashutosh Bapat, who also provided some of the regression tests. Patch by me, reviewed by Robert Haas.

Discussion: https://postgr.es/m/CAKcux6ktu-8tefLWtQuuZBYFaZA83vUzuRd7c1YHC-yEWyYFpg@mail.gmail.com

* Remove extra word from src/backend/optimizer/README (Etsuro Fujita, 2018-08-31)

* Error position support for partition specifications (Peter Eisentraut, 2018-08-30)

Add support for error position reporting for partition specifications.

Reviewed-by: Fabien COELHO <coelho@cri.ensmp.fr>

* Error position support for defaults and check constraints (Peter Eisentraut, 2018-08-30)

Add support for error position reporting for the expressions contained in defaults and check constraint definitions. This currently works only for CREATE TABLE, not ALTER TABLE, because the latter is not set up to pass around the original query string.

Reviewed-by: Fabien COELHO <coelho@cri.ensmp.fr>

* Stop bgworkers during fast shutdown with postmaster in startup phase (Michael Paquier, 2018-08-29)

When the postmaster enters its PM_STARTUP phase, it immediately starts background workers that use BgWorkerStart_PostmasterStart mode. This caused problems for a fast shutdown, as the postmaster forgot to send SIGTERM to those already-started background workers. Smart and immediate shutdowns already handled this correctly; fast shutdown was the only mode missing the shot.

Author: Alexander Kukushkin
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/CAFh8B=mvnD8+DZUfzpi50DoaDfZRDfd7S=gwj5vU9GYn8UvHkA@mail.gmail.com
Backpatch-through: 9.5

* Avoid quadratic slowdown in regexp match/split functions. (Andrew Gierth, 2018-08-28)

regexp_matches, regexp_split_to_table and regexp_split_to_array all work by compiling a list of match positions as character offsets (NOT byte positions) in the source string. Formerly, they then used text_substr to extract the matched text; but in a multi-byte encoding, that counts the characters in the string, and the characters needed to reach the starting byte position, on every call. Accordingly, the performance degraded as the product of the input string length and the number of match positions, such that splitting a string of a few hundred kbytes could take many minutes.

Repair by keeping the wide-character copy of the input string available (only in the case where encoding_max_length is not 1) after performing the match operation, and extracting substrings from that instead. This reduces the complexity to being linear in the number of result bytes, discounting the actual regexp match itself (which is not affected by this patch).

In passing, remove cleanup using retail pfree() which was obsoleted by commit ff428cded (Feb 2008) which made cleanup of SRF multi-call contexts automatic. Also increase (to ~134 million) the maximum number of matches and provide an error message when it is reached.

Backpatch all the way because this has been wrong forever.

Analysis and patch by me; review by Kaiting Chen.

Discussion: https://postgr.es/m/87pnyn55qh.fsf@news-spur.riddles.org.uk
see also https://postgr.es/m/87lg996g4r.fsf@news-spur.riddles.org.uk

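The core idea, sketched with standard C wide-character calls rather than the backend's own encoding routines: pay the multibyte-to-wide conversion once, after which each match extraction is a direct slice:

    #include <locale.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <wchar.h>

    int
    main(void)
    {
        setlocale(LC_ALL, "");
        const char *src = "héllo,wörld";

        /* One up-front O(n) conversion of the whole source string... */
        size_t n = mbstowcs(NULL, src, 0);

        if (n == (size_t) -1)
            return 1;               /* not decodable in the current locale */

        wchar_t *wide = malloc((n + 1) * sizeof(wchar_t));
        mbstowcs(wide, src, n + 1);

        /* ...then a match at (start=6, len=5) is a direct slice, with no
         * re-walk of the bytes that precede it. */
        wprintf(L"%.*ls\n", 5, wide + 6);

        free(wide);
        return 0;
    }
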
* Fix snapshot leak warning for some procedures (Peter Eisentraut, 2018-08-27)

The problem arises with the combination of CALL with output parameters and doing a COMMIT inside the procedure. When a CALL has output parameters, the portal uses the strategy PORTAL_UTIL_SELECT instead of PORTAL_MULTI_QUERY. Using PORTAL_UTIL_SELECT causes the portal's snapshot to be registered with the current resource owner (portal->holdSnapshot); see 9ee1cf04ab6bcefe03a11837b53f29ca9dc24c7a for the reason.

Normally, PortalDrop() unregisters the snapshot. If not, then ResourceOwnerRelease() will print a warning about a snapshot leak on transaction commit. A transaction commit normally drops all portals (PreCommit_Portals()), except the active portal. So in case of the active portal, we need to manually release the snapshot to avoid the warning.

Reported-by: Prabhat Sahu <prabhat.sahu@enterprisedb.com>
Reviewed-by: Jonathan S. Katz <jkatz@postgresql.org>

* Improve VACUUM and ANALYZE by avoiding early lock queue (Michael Paquier, 2018-08-27)

A caller of VACUUM could obtain relation locks too early, causing other sessions to block on the request; this potentially enables denial-of-service attacks, since even a non-privileged user can attempt a vacuum full of a critical catalog table to block even all incoming connection attempts.

Unlike TRUNCATE, a client could attempt a system-wide VACUUM after building the list of relations to VACUUM, which could cause vacuum_rel() or analyze_rel() to try to lock the relation, only for the operation to block. When the client specifies a list of relations and a relation needs to be skipped, ownership checks are done when building the list of relations to work on, preventing a later lock attempt. vacuum_rel() already had the sanity checks needed, except that those were applied too late. This commit refactors the code so that relation skips are checked beforehand, making it safer to avoid too-early locks, for manual VACUUM both with and without a list of relations specified.

An isolation test is added emulating the fact that early locks do not happen anymore, issuing a WARNING message earlier if the user calling VACUUM is not a relation owner.

When a partitioned table is listed in a manual VACUUM or ANALYZE command, its full list of partitions is fetched, all partitions get added to the list to work on, and then each one of them is processed one by one, with ownership checks happening at the later phase of vacuum_rel() or analyze_rel(). Trying to do early ownership checks for each partition proved tedious, as it would result in deadlock risks with lock upgrades; and skipping all partitions when the listed partitioned table is not owned would change behavior compared to how Postgres 10 implemented vacuum for partitioned tables. The original problem reported, the early lock queue for critical relations, is fixed anyway, so priority is given to avoiding a backward-incompatible behavior change.

Reported-by: Lloyd Albin, Jeremy Schneider
Author: Michael Paquier
Reviewed by: Nathan Bossart, Kyotaro Horiguchi
Discussion: https://postgr.es/m/152512087100.19803.12733865831237526317@wrigleys.postgresql.org
Discussion: https://postgr.es/m/20180812222142.GA6097@paquier.xyz

* Fix typos. (Thomas Munro, 2018-08-27)

Author: David Rowley
Discussion: https://postgr.es/m/CAKJS1f8du35u5DprpykWvgNEScxapbWYJdHq%2Bz06Wj3Y2KFPbw%40mail.gmail.com

* Make syslogger more robust against failures in opening CSV log files. (Tom Lane, 2018-08-26)

The previous coding figured it'd be good enough to postpone opening the first CSV log file until we got a message we needed to write there. This is unsafe, though, because if the open fails we end up in infinite recursion trying to report the failure. Instead make the CSV log file management code look as nearly as possible like the longstanding logic for the stderr log file. In particular, open it immediately at postmaster startup (if enabled), or when we get a SIGHUP in which we find that log_destination has been changed to enable CSV logging.

It seems OK to fail if a postmaster-start-time open attempt fails, as we've long done for the stderr log file. But we can't die if we fail to open a CSV log file during SIGHUP, so we're still left with a problem. In that case, write any output meant for the CSV log file to the stderr log file. (This will also cover race-condition cases in which backends send CSV log data before or after we have the CSV log file open.)

This patch also fixes an ancient oversight that, if CSV logging was turned off during a SIGHUP, we never actually closed the last CSV log file.

In passing, remember to reset whereToSendOutput = DestNone during syslogger start, since (unlike all other postmaster children) it's forked before the postmaster has done that. This made for a platform-dependent difference in error reporting behavior between the syslogger and other children: except on Windows, it'd report problems to the original postmaster stderr as well as the normal error log file(s). It's barely possible that that was intentional at some point; but it doesn't seem likely to be desirable in production, and the platform dependency definitely isn't desirable.

Per report from Alexander Kukushkin. It's been like this for a long time, so back-patch to all supported branches.

Discussion: https://postgr.es/m/CAFh8B==iLUD_gqC-dAENS0V+kVrCeGiKujtKqSQ7++S-caaChw@mail.gmail.com

* Reconsider new file extension in commit 91f26d5f. (Jeff Davis, 2018-08-25)

Andres and Tom objected to the choice of the ".tmp" extension. Changing to Andres's suggestion of ".spill".

Discussion: https://postgr.es/m/88092095-3348-49D8-8746-EB574B1D30EA%40anarazel.de

* Change extension of spilled ReorderBufferChange data to ".tmp". (Jeff Davis, 2018-08-25)

The previous extension, ".snap", was chosen for historical reasons and became confusing.

Discussion: https://postgr.es/m/CAMp0ubd_P8vBGx8=MfDXQJZxHA5D_Zarw5cCkDxJ_63+pWRJ9w@mail.gmail.com

* LLVMJIT: LLVMGetHostCPUFeatures is now upstream; use LLVM's version if available. (Andres Freund, 2018-08-24)

Noticed thanks to buildfarm animal seawasp.

Author: Andres Freund
Backpatch: v11-, where LLVM based JIT compilation was introduced.

* Suppress uninitialized-variable warning in new SCRAM code. (Tom Lane, 2018-08-24)

While we generally don't sweat too much about "may be used uninitialized" warnings from older compilers, I noticed that there's a fair number of buildfarm animals that are producing such a warning *only* for this variable. So it seems worth silencing.

* Introduce minimal C99 usage to verify compiler support. (Andres Freund, 2018-08-23)

This just converts a few for loops in postgres.c to declare variables in the loop initializer, and uses designated initializers in smgr.c's definition of smgr callbacks.

Author: Andres Freund
Discussion: https://postgr.es/m/97d4b165-192d-3605-749c-f614a0c4e783@2ndquadrant.com

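Both C99 features in one compilable miniature; the callback struct is a hypothetical two-member reduction of smgr.c's real table:

    #include <stdio.h>

    /* Hypothetical two-member reduction of smgr.c's callback struct. */
    typedef struct f_smgr
    {
        void (*smgr_init) (void);
        void (*smgr_close) (void);
    } f_smgr;

    static void md_init(void)  { puts("md_init"); }
    static void md_close(void) { puts("md_close"); }

    /* Designated initializers: each field is named, so reordering or adding
     * struct members cannot silently mis-wire the table. */
    static const f_smgr smgrsw[] = {
        {
            .smgr_init = md_init,
            .smgr_close = md_close,
        },
    };

    int
    main(void)
    {
        /* C99 loop-scoped declaration, the other feature being introduced. */
        for (int i = 0; i < 1; i++)
            smgrsw[i].smgr_init();
        return 0;
    }
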
* Deduplicate code between slot_getallattrs() and slot_getsomeattrs(). (Andres Freund, 2018-08-23)

The code in slot_getallattrs() is the same as calling slot_getsomeattrs() with the number of attributes specified in the tuple descriptor. Implement it that way instead of duplicating the code between those two functions.

This is part of a patch series abstracting TupleTableSlots so they can store arbitrary forms of tuples, but it is a nice enough cleanup on its own.

Author: Ashutosh Bapat
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/20180220224318.gw4oe5jadhpmcdnm@alap3.anarazel.de

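The whole deduplication fits in a line; sketched here with hypothetical reduced types, since the real functions operate on TupleTableSlot in execTuples.c:

    /* Hypothetical reduction of a tuple table slot. */
    typedef struct
    {
        int natts_total;    /* attributes in the tuple descriptor */
        int natts_fetched;  /* attributes deformed so far */
    } SlotSketch;

    static void
    slot_getsomeattrs_sketch(SlotSketch *slot, int attnum)
    {
        while (slot->natts_fetched < attnum)
            slot->natts_fetched++;      /* stand-in for deforming one column */
    }

    /* "All" is just "some" up to the descriptor's attribute count. */
    static void
    slot_getallattrs_sketch(SlotSketch *slot)
    {
        slot_getsomeattrs_sketch(slot, slot->natts_total);
    }

    int
    main(void)
    {
        SlotSketch slot = { 8, 0 };

        slot_getallattrs_sketch(&slot);
        return slot.natts_fetched != 8;
    }
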
* Fix lexing of standard multi-character operators in edge cases. (Andrew Gierth, 2018-08-23)

Commits c6b3c939b (which fixed the precedence of >=, <=, <> operators) and 865f14a2d (which added support for the standard => notation for named arguments) created a class of lexer tokens which look like multi-character operators but which have their own token IDs distinct from Op. However, longest-match rules meant that following any of these tokens with another operator character, as in (1<>-1), would cause them to be incorrectly returned as Op.

The error here isn't immediately obvious, because the parser would usually still find the correct operator via the Op token, but there were more subtle problems:

1. If immediately followed by a comment or +-, >= <= <> would be given the old precedence of Op rather than the correct new precedence;

2. If followed by a comment, != would be returned as Op rather than as NOT_EQUAL, causing it not to be found at all;

3. If followed by a comment or +-, the => token for named arguments would be lexed as Op, causing the argument to be mis-parsed as a simple expression, usually causing an error.

Fix by explicitly checking for the operators in the {operator} code block in addition to all the existing special cases there.

Backpatch to 9.5 where the problem was introduced.

Analysis and patch by me; review by Tom Lane.

Discussion: https://postgr.es/m/87va851ppl.fsf@news-spur.riddles.org.uk

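A condensed sketch of the added check: after trailing characters are trimmed per the SQL lexing rules, the surviving text is matched against the fixed two-character tokens explicitly instead of falling through as a generic Op. The token names mirror the grammar's; the helper function itself is hypothetical (the real logic sits inline in scan.l's {operator} action):

    #include <string.h>

    enum sql_token { OP, EQUALS_GREATER, GREATER_EQUALS, LESS_EQUALS, NOT_EQUALS };

    static enum sql_token
    classify_trimmed_op(const char *text, int nchars)
    {
        if (nchars == 2)
        {
            if (memcmp(text, "=>", 2) == 0)
                return EQUALS_GREATER;
            if (memcmp(text, ">=", 2) == 0)
                return GREATER_EQUALS;
            if (memcmp(text, "<=", 2) == 0)
                return LESS_EQUALS;
            if (memcmp(text, "<>", 2) == 0 || memcmp(text, "!=", 2) == 0)
                return NOT_EQUALS;
        }
        return OP;                  /* genuinely user-defined operator */
    }

    int
    main(void)
    {
        /* "<>" followed by a comment once lexed as Op; with the explicit
         * re-check it classifies correctly after the trailing text is
         * trimmed away. */
        return classify_trimmed_op("<>", 2) != NOT_EQUALS;
    }
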
* Reduce an unnecessary O(N^3) loop in lexer. (Andrew Gierth, 2018-08-23)

The lexer's handling of operators contained an O(N^3) hazard when dealing with long strings of + or - characters; it seems hard to prevent this case from being O(N^2), but the additional N multiplier was not needed.

Backpatch all the way since this has been there since 7.x, and it presents at least a mild hazard in that trying to do Bind, PREPARE or EXPLAIN on a hostile query could take excessive time (without honouring cancels or timeouts) even if the query was never executed.

* Change PROCEDURE to FUNCTION in CREATE TRIGGER syntax (Peter Eisentraut, 2018-08-22)

Since procedures are now a different thing from functions, change the CREATE TRIGGER and CREATE EVENT TRIGGER syntax to use FUNCTION in the clause that specifies the function. PROCEDURE is still accepted for compatibility.

pg_dump and ruleutils.c output is not changed yet, because that would require a change in information_schema.sql and thus a catversion change.

Reported-by: Peter Geoghegan <pg@bowt.ie>
Reviewed-by: Jonathan S. Katz <jonathan.katz@excoventures.com>