aboutsummaryrefslogtreecommitdiff
Commit message (Collapse)AuthorAge
* Doc: clarify description of degenerate NATURAL joins.Tom Lane2017-07-20
| | | | | | | | | Claiming that NATURAL JOIN is equivalent to CROSS JOIN when there are no common column names is only strictly correct if it's an inner join; you can't say e.g. CROSS LEFT JOIN. Better to explain it as meaning JOIN ON TRUE, instead. Per a suggestion from David Johnston. Discussion: https://postgr.es/m/CAKFQuwb+mYszQhDS9f_dqRrk1=Pe-S6D=XMkAXcDf4ykKPmgKQ@mail.gmail.com
* Fix dumping of outer joins with empty qual lists.Tom Lane2017-07-20
| | | | | | | | | | | | | | | | | Normally, a JoinExpr would have empty "quals" only if it came from CROSS JOIN syntax. However, it's possible to get to this state by specifying NATURAL JOIN between two tables with no common column names, and there might be other ways too. The code previously printed no ON clause if "quals" was empty; that's right for CROSS JOIN but syntactically invalid if it's some type of outer join. Fix by printing ON TRUE in that case. This got broken by commit 2ffa740be, which stopped using NATURAL JOIN syntax in ruleutils output due to its brittleness in the face of column renamings. Back-patch to 9.3 where that commit appeared. Per report from Tushar Ahuja. Discussion: https://postgr.es/m/98b283cd-6dda-5d3f-f8ac-87db8c76a3da@enterprisedb.com
* Doc: add missing note about permissions needed to change log_lock_waits.Tom Lane2017-07-19
| | | | | | | log_lock_waits is PGC_SUSET, but config.sgml lacked the standard boilerplate sentence noting that. Report: https://postgr.es/m/20170719100838.19352.16320@wrigleys.postgresql.org
* Doc: explain dollar quoting in the intro part of the pl/pgsql chapter.Tom Lane2017-07-17
| | | | | | | | We're throwing people into the guts of the syntax with not much context; let's back up one step and point out that this goes inside a literal in a CREATE FUNCTION command. Per suggestion from Kurt Kartaltepe. Discussion: https://postgr.es/m/CACawnnyWAmH+au8nfZhLiFfWKjXy4d0kY+eZWfcxPRnjVfaa_Q@mail.gmail.com
* Merge large_object.sql test into largeobject.source.Tom Lane2017-07-17
| | | | | | | | | | | | | | | | | It seems pretty confusing to have tests named both largeobject and large_object. The latter is of very recent vintage (commit ff992c074), so get rid of it in favor of merging into the former. Also, enable the LO comment test that was added by commit 70ad7ed4e, since the later commit added the then-missing pg_upgrade functionality. The large_object.sql test case is almost completely redundant with that, but not quite: it seems like creating a user-defined LO with an OID in the system range might be an interesting case for pg_upgrade, so let's keep it. Like the earlier patch, back-patch to all supported branches. Discussion: https://postgr.es/m/18665.1500306372@sss.pgh.pa.us
* fix typoAndrew Dunstan2017-07-16
|
* Fix vcregress.pl PROVE_FLAGS bug in commit 93b7d9731fAndrew Dunstan2017-07-16
| | | | | | | This change didn't adjust the publicly visible taptest function, causing buildfarm failures on bowerbird. Backpatch to 9.4 like previous change.
* Fix pg_basebackup output to stdout on Windows.Heikki Linnakangas2017-07-14
| | | | | | | | | | | | | | | When writing a backup to stdout with pg_basebackup on Windows, put stdout to binary mode. Any CR bytes in the output will otherwise be output incorrectly as CR+LF. In the passing, standardize on using "_setmode" instead of "setmode", for the sake of consistency. They both do the same thing, but according to MSDN documentation, setmode is deprecated. Fixes bug #14634, reported by Henry Boehlert. Patch by Haribabu Kommi. Backpatch to all supported versions. Discussion: https://www.postgresql.org/message-id/20170428082818.24366.13134@wrigleys.postgresql.org
* Fix dumping of FUNCTION RTEs that contain non-function-call expressions.Tom Lane2017-07-13
| | | | | | | | | | | | | | | | | | | | | | | | | | The grammar will only accept something syntactically similar to a function call in a function-in-FROM expression. However, there are various ways to input something that ruleutils.c won't deparse that way, potentially leading to a view or rule that fails dump/reload. Fix by inserting a dummy CAST around anything that isn't going to deparse as a function (which is one of the ways to get something like that in there in the first place). In HEAD, also make use of the infrastructure added by this to avoid emitting unnecessary parentheses in CREATE INDEX deparsing. I did not change that in back branches, thinking that people might find it to be unexpected/unnecessary behavioral change. In HEAD, also fix incorrect logic for when to add extra parens to partition key expressions. Somebody apparently thought they could get away with simpler logic than pg_get_indexdef_worker has, but they were wrong --- a counterexample is PARTITION BY LIST ((a[1])). Ignoring the prettyprint flag for partition expressions isn't exactly a nice solution anyway. This has been broken all along, so back-patch to all supported branches. Discussion: https://postgr.es/m/10477.1499970459@sss.pgh.pa.us
* Fix race between GetNewTransactionId and GetOldestActiveTransactionId.Heikki Linnakangas2017-07-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The race condition goes like this: 1. GetNewTransactionId advances nextXid e.g. from 100 to 101 2. GetOldestActiveTransactionId reads the new nextXid, 101 3. GetOldestActiveTransactionId loops through the proc array. There are no active XIDs there, so it returns 101 as the oldest active XID. 4. GetNewTransactionid stores XID 100 to MyPgXact->xid So, GetOldestActiveTransactionId returned XID 101, even though 100 only just started and is surely still running. This would be hard to hit in practice, and even harder to spot any ill effect if it happens. GetOldestActiveTransactionId is only used when creating a checkpoint in a master server, and the race condition can only happen on an online checkpoint, as there are no backends running during a shutdown checkpoint. The oldestActiveXid value of an online checkpoint is only used when starting up a hot standby server, to determine the starting point where pg_subtrans is initialized from. For the race condition to happen, there must be no other XIDs in the proc array that would hold back the oldest-active XID value, which means that the missed XID must be a top transaction's XID. However, pg_subtrans is not used for top XIDs, so I believe an off-by-one error is in fact inconsequential. Nevertheless, let's fix it, as it's clearly wrong and the fix is simple. This has been wrong ever since hot standby was introduced, so backport to all supported versions. Discussion: https://www.postgresql.org/message-id/e7258662-82b6-7a45-56d4-99b337a32bf7@iki.fi
* Fix ruleutils.c for domain-over-array cases, too.Tom Lane2017-07-12
| | | | | | | | | | | | | Further investigation shows that ruleutils isn't quite up to speed either for cases where we have a domain-over-array: it needs to be prepared to look past a CoerceToDomain at the top level of field and element assignments, else it decompiles them incorrectly. Potentially this would result in failure to dump/reload a rule, if it looked like the one in the new test case. (I also added a test for EXPLAIN; that output isn't broken, but clearly we need more test coverage here.) Like commit b1cb32fb6, this bug is reachable in cases we already support, so back-patch all the way.
* Reduce memory usage of tsvector type analyze function.Heikki Linnakangas2017-07-12
| | | | | | | | | | | | | | | | | compute_tsvector_stats() detoasted and kept in memory every tsvector value in the sample, but that can be a lot of memory. The original bug report described a case using over 10 gigabytes, with statistics target of 10000 (the maximum). To fix, allocate a separate copy of just the lexemes that we keep around, and free the detoasted tsvector values as we go. This adds some palloc/pfree overhead, when you have a lot of distinct lexemes in the sample, but it's better than running out of memory. Fixes bug #14654 reported by James C. Reviewed by Tom Lane. Backport to all supported versions. Discussion: https://www.postgresql.org/message-id/20170514200602.1451.46797@wrigleys.postgresql.org
* Avoid integer overflow while sifting-up a heap in tuplesort.c.Tom Lane2017-07-12
| | | | | | | | | | | | | | | | If the number of tuples in the heap exceeds approximately INT_MAX/2, this loop's calculation "2*i+1" could overflow, resulting in a crash. Fix it by using unsigned int rather than int for the relevant local variables; that shouldn't cost anything extra on any popular hardware. Per bug #14722 from Sergey Koposov. Original patch by Sergey Koposov, modified by me per a suggestion from Heikki Linnakangas to use unsigned int not int64. Back-patch to 9.4, where tuplesort.c grew the ability to sort as many as INT_MAX tuples in-memory (commit 263865a48). Discussion: https://postgr.es/m/20170629161637.1478.93109@wrigleys.postgresql.org
* Fix variable and type name in comment.Heikki Linnakangas2017-07-12
| | | | | | Kyotaro Horiguchi Discussion: https://www.postgresql.org/message-id/20170711.163441.241981736.horiguchi.kyotaro@lab.ntt.co.jp
* Remove unnecessary braces, to match the surrounding style.Heikki Linnakangas2017-07-12
| | | | | | | | | Mostly in the new subscription-related commands. Backport the few that were also present in older versions. Thomas Munro Discussion: https://www.postgresql.org/message-id/CAEepm=3CyW1QmXcXJXmqiJXtXzFDc8SvSfnxkEGD3Bkv2SrkeQ@mail.gmail.com
* Fix multiple assignments to a column of a domain type.Tom Lane2017-07-11
| | | | | | | | | | | | | | | | | | | We allow INSERT and UPDATE commands to assign to the same column more than once, as long as the assignments are to subfields or elements rather than the whole column. However, this failed when the target column was a domain over array rather than plain array. Fix by teaching process_matched_tle() to look through CoerceToDomain nodes, and add relevant test cases. Also add a group of test cases exercising domains over array of composite. It's doubtless accidental that CREATE DOMAIN allows this case while not allowing straight domain over composite; but it does, so we'd better make sure we don't break it. (I could not find any documentation mentioning either side of that, so no doc changes.) It's been like this for a long time, so back-patch to all supported branches. Discussion: https://postgr.es/m/4206.1499798337@sss.pgh.pa.us
* On Windows, retry process creation if we fail to reserve shared memory.Tom Lane2017-07-10
| | | | | | | | | | | | | | | We've heard occasional reports of backend launch failing because pgwin32_ReserveSharedMemoryRegion() fails, indicating that something has already used that address space in the child process. It's not very clear what, given that we disable ASLR in Windows builds, but suspicion falls on antivirus products. It'd be better if we didn't have to disable ASLR, anyway. So let's try to ameliorate the problem by retrying the process launch after such a failure, up to 100 times. Patch by me, based on previous work by Amit Kapila and others. This is a longstanding issue, so back-patch to all supported branches. Discussion: https://postgr.es/m/CAA4eK1+R6hSx6t_yvwtx+NRzneVp+MRqXAdGJZChcau8Uij-8g@mail.gmail.com
* Doc: clarify wording about tool requirements in sourcerepo.sgml.Tom Lane2017-07-10
| | | | | | | | | Original wording had confusingly vague antecedent for "they", so replace that with a more repetitive but clearer formulation. In passing, make the link to the installation requirements section more specific. Per gripe from Martin Mai, though this is not the fix he initially proposed. Discussion: https://postgr.es/m/CAN_NWRu-cWuNaiXUjV3m4H-riWURuPW=j21bSaLADs6rjjzXgQ@mail.gmail.com
* Fix potential data corruption during freezeTeodor Sigaev2017-07-06
| | | | | | | Fix oversight in 3b97e6823b94 bug fix. Bitwise AND is used instead of OR and it cleans all bits in t_infomask heap tuple field. Backpatch to 9.3
* Treat clean shutdown of an SSL connection same as the non-SSL case.Heikki Linnakangas2017-07-03
| | | | | | | | | | If the client closes an SSL connection, treat it the same as EOF on a non-SSL connection. In particular, don't write a message in the log about that. Michael Paquier. Discussion: https://www.postgresql.org/message-id/CAB7nPqSfyVV42Q2acFo%3DvrvF2gxoZAMJLAPq3S3KkjhZAYi7aw@mail.gmail.com
* Fix walsender to exit promptly if client requests shutdown.Tom Lane2017-06-30
| | | | | | | | | | | | | | It's possible for WalSndWaitForWal to be asked to wait for WAL that doesn't exist yet. That's fine, in fact it's the normal situation if we're caught up; but when the client requests shutdown we should not keep waiting. The previous coding could wait indefinitely if the source server was idle. In passing, improve the rather weak comments in this area, and slightly rearrange some related code for better readability. Back-patch to 9.4 where this code was introduced. Discussion: https://postgr.es/m/14154.1498781234@sss.pgh.pa.us
* Second try at fixing tcp_keepalives_idle option on Solaris.Tom Lane2017-06-28
| | | | | | | | | | | | | | | | | | | | | | | Buildfarm evidence shows that TCP_KEEPALIVE_THRESHOLD doesn't exist after all on Solaris < 11. This means we need to take positive action to prevent the TCP_KEEPALIVE code path from being taken on that platform. I've chosen to limit it with "&& defined(__darwin__)", since it's unclear that anyone else would follow Apple's precedent of spelling the symbol that way. Also, follow a suggestion from Michael Paquier of eliminating code duplication by defining a couple of intermediate symbols for the socket option. In passing, make some effort to reduce the number of translatable messages by replacing "setsockopt(foo) failed" with "setsockopt(%s) failed", etc, throughout the affected files. And update relevant documentation so that it doesn't claim to provide an exhaustive list of the possible socket option names. Like the previous commit (f0256c774), back-patch to all supported branches. Discussion: https://postgr.es/m/20170627163757.25161.528@wrigleys.postgresql.org
* Support tcp_keepalives_idle option on Solaris.Tom Lane2017-06-27
| | | | | | | | | | | | | | | | Turns out that the socket option for this is named TCP_KEEPALIVE_THRESHOLD, at least according to the tcp(7P) man page for Solaris 11. (But since that text refers to "SunOS", it's likely pretty ancient.) It appears that the symbol TCP_KEEPALIVE does get defined on that platform, but it doesn't seem to represent a valid protocol-level socket option. This leads to bleats in the postmaster log, and no tcp_keepalives_idle functionality. Per bug #14720 from Andrey Lizenko, as well as an earlier report from Dhiraj Chawla that nobody had followed up on. The issue's been there since we added the TCP_KEEPALIVE code path in commit 5acd417c8, so back-patch to all supported branches. Discussion: https://postgr.es/m/20170627163757.25161.528@wrigleys.postgresql.org
* Re-allow SRFs and window functions within sub-selects within aggregates.Tom Lane2017-06-27
| | | | | | | | | | | check_agg_arguments_walker threw an error upon seeing a SRF or window function, but that is too aggressive: if the function is within a sub-select then it's perfectly fine. I broke the SRF case in commit 0436f6bde by copying the logic for window functions ... but that was broken too, and had been since commit eaccfded9. Repair both cases in HEAD, and the window function case back to 9.3. 9.2 gets this right.
* Don't lose walreceiver start requests due to race condition in postmaster.Tom Lane2017-06-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a walreceiver dies, the startup process will notice that and send a PMSIGNAL_START_WALRECEIVER signal to the postmaster, asking for a new walreceiver to be launched. There's a race condition, which at least in HEAD is very easy to hit, whereby the postmaster might see that signal before it processes the SIGCHLD from the walreceiver process. In that situation, sigusr1_handler() just dropped the start request on the floor, reasoning that it must be redundant. Eventually, after 10 seconds (WALRCV_STARTUP_TIMEOUT), the startup process would make a fresh request --- but that's a long time if the connection could have been re-established almost immediately. Fix it by setting a state flag inside the postmaster that we won't clear until we do launch a walreceiver. In cases where that results in an extra walreceiver launch, it's up to the walreceiver to realize it's unwanted and go away --- but we have, and need, that logic anyway for the opposite race case. I came across this through investigating unexpected delays in the src/test/recovery TAP tests: it manifests there in test cases where a master server is stopped and restarted while leaving streaming slaves active. This logic has been broken all along, so back-patch to all supported branches. Discussion: https://postgr.es/m/21344.1498494720@sss.pgh.pa.us
* Ignore old stats file timestamps when starting the stats collector.Tom Lane2017-06-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The stats collector disregards inquiry messages that bear a cutoff_time before when it last wrote the relevant stats file. That's fine, but at startup when it reads the "permanent" stats files, it absorbed their timestamps as if they were the times at which the corresponding temporary stats files had been written. In reality, of course, there's no data out there at all. This led to disregarding inquiry messages soon after startup if the postmaster had been shut down and restarted within less than PGSTAT_STAT_INTERVAL; which is a pretty common scenario, both for testing and in the field. Requesting backends would hang for 10 seconds and then report failure to read statistics, unless they got bailed out by some other backend coming along and making a newer request within that interval. I came across this through investigating unexpected delays in the src/test/recovery TAP tests: it manifests there because the autovacuum launcher hangs for 10 seconds when it can't get statistics at startup, thus preventing a second shutdown from occurring promptly. We might want to do some things in the autovac code to make it less prone to getting stuck that way, but this change is a good bug fix regardless. In passing, also fix pgstat_read_statsfiles() to ensure that it re-zeroes its global stats variables if they are corrupted by a short read from the stats file. (Other reads in that function go into temp variables, so that the issue doesn't arise.) This has been broken since we created the separation between permanent and temporary stats files in 8.4, so back-patch to all supported branches. Discussion: https://postgr.es/m/16860.1498442626@sss.pgh.pa.us
* Fix typo in commentAlvaro Herrera2017-06-22
| | | | | | | | Once upon a time, WAL pointers could be NULL, but no longer. We talk about "valid" now. Reported-by: Amit Langote Discussion: https://postgr.es/m/33e9617d-27f1-eee8-3311-e27af98eaf2b@lab.ntt.co.jp
* Fix possibility of creating a "phantom" segment after promotion.Andres Freund2017-06-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When promoting a standby just after a XLOG_SWITCH record was replayed, and next segment(s) are already are locally available (via walsender, restore_command + trigger/recovery target), that segment could accidentally be recycled onto the past of the new timeline. Later checkpointer would create a .ready file for it, assuming there was an error during creation, and it would get archived. That causes trouble if another standby is later brought up from a basebackup from before the timeline creation, because it would try to read the segment, because XLogFileReadAnyTLI just tries all possible timelines, which doesn't have valid contents. Thus replay would fail. The problem, if already occurred, can be fixed by removing the segment and/or having restore_command filter it out. The reason for the creation of such "phantom" segments was, that after an XLOG_SWITCH record the EndOfLog variable points to the beginning of the next segment, and RemoveXlogFile() used XLByteToPrevSeg(). Normally RemoveXlogFile() doing so is harmless, because the last segment will still exist preventing InstallXLogFileSegment() from causing harm, but just after promotion there's no previous segment on the new timeline. Fix that by using XLByteToSeg() instead of XLByteToPrevSeg(). Author: Andres Freund Reported-By: Greg Burek Discussion: https://postgr.es/m/20170619073026.zcwpe6mydsaz5ygd@alap3.anarazel.de Backpatch: 9.2-, bug older than all supported versions
* pg_upgrade: start/stop new server after pg_resetwalBruce Momjian2017-06-20
| | | | | | | | | | | | | | | | | | When commit 0f33a719fdbb5d8c43839ea0d2c90cd03e2af2d2 removed the instructions to start/stop the new cluster before running rsync, it was now possible for pg_resetwal/pg_resetxlog to leave the final WAL record at wal_level=minimum, preventing upgraded standby servers from reconnecting. This patch fixes that by having pg_upgrade unconditionally start/stop the new cluster after pg_resetwal/pg_resetxlog has run. Backpatch through 9.2 since, though the instructions were added in PG 9.5, they worked all the way back to 9.2. Discussion: https://postgr.es/m/20170620171844.GC24975@momjian.us Backpatch-through: 9.2
* Fix materialized-view documentation oversights.Tom Lane2017-06-19
| | | | | | | | | | When materialized views were added, psql's \d commands were made to treat them as a separate object category ... but not everyplace in the documentation or comments got the memo. Noted by David Johnston. Back-patch to 9.3 where matviews came in. Discussion: https://postgr.es/m/CAKFQuwb27M3VXRhHErjCpkWwN9eKThbqWb1=trtoXi9_ejqPXQ@mail.gmail.com
* On Windows, make pg_dump use binary mode for compressed plain text output.Tom Lane2017-06-19
| | | | | | | | | | | | | The combination of -Z -Fp and output to stdout resulted in corrupted output data, because we left stdout in text mode, resulting in newline conversion being done on the compressed stream. Switch stdout to binary mode for this case, at the same place where we do it for non-text output formats. Report and patch by Kuntal Ghosh, tested by Ashutosh Sharma and Neha Sharma. Back-patch to all supported branches. Discussion: https://postgr.es/m/CAGz5QCJPvbBjXAmJuGx1B_41yVCetAJhp7rtaDf7XQGWuB1GSw@mail.gmail.com
* Fix leaking of small spilled subtransactions during logical decoding.Andres Freund2017-06-18
| | | | | | | | | | | | | | | | | | When, during logical decoding, a transaction gets too big, it's contents get spilled to disk. Not just the top-transaction gets spilled, but *also* all of its subtransactions, even if they're not that large themselves. Unfortunately we didn't clean up such small spilled subtransactions from disk. Fix that, by keeping better track of whether a transaction has been spilled to disk. Author: Andres Freund Reported-By: Dmitriy Sarafannikov, Fabrízio de Royes Mello Discussion: https://postgr.es/m/1457621358.355011041@f382.i.mail.ru https://postgr.es/m/CAFcNs+qNMhNYii4nxpO6gqsndiyxNDYV0S=JNq0v_sEE+9PHXg@mail.gmail.com Backpatch: 9.4-, where logical decoding was introduced
* Fix dependency, when changing a function's argument/return type.Heikki Linnakangas2017-06-16
| | | | | | | | | | | | When a new base type is created using the old-style procedure of first creating the input/output functions with "opaque" in place of the base type, the "opaque" argument/return type is changed to the final base type, on CREATE TYPE. However, we did not create a pg_depend record when doing that, so the functions were left not depending on the type. Fixes bug #14706, reported by Karen Huddleston. Discussion: https://www.postgresql.org/message-id/20170614232259.1424.82774@wrigleys.postgresql.org
* Fix low-probability leaks of PGresult objects in the backend.Tom Lane2017-06-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We had three occurrences of essentially the same coding pattern wherein we tried to retrieve a query result from a libpq connection without blocking. In the case where PQconsumeInput failed (typically indicating a lost connection), all three loops simply gave up and returned, forgetting to clear any previously-collected PGresult object. Since those are malloc'd not palloc'd, the oversight results in a process-lifespan memory leak. One instance, in libpqwalreceiver, is of little significance because the walreceiver process would just quit anyway if its connection fails. But we might as well fix it. The other two instances, in postgres_fdw, are somewhat more worrisome because at least in principle the scenario could be repeated, allowing the amount of memory leaked to build up to something worth worrying about. Moreover, in these cases the loops contain CHECK_FOR_INTERRUPTS calls, as well as other calls that could potentially elog(ERROR), providing another way to exit without having cleared the PGresult. Here we need to add PG_TRY logic similar to what exists in quite a few other places in postgres_fdw. Coverity noted the libpqwalreceiver bug; I found the other two cases by checking all calls of PQconsumeInput. Back-patch to all supported versions as appropriate (9.2 lacks postgres_fdw, so this is really quite unexciting for that branch). Discussion: https://postgr.es/m/22620.1497486981@sss.pgh.pa.us
* doc: remove mention of Windows junction points by pg_upgradeBruce Momjian2017-06-15
| | | | | | | | | | | pg_upgrade never used Windows junction points but instead always used Windows hard links. Reported-by: Adrian Klaver Discussion: https://postgr.es/m/6a638c60-90bb-4921-8ee4-5fdad68f8b09@aklaver.com Backpatch-through: 9.3, where the mention first appeared
* Fix document bug regarding read only transactions.Tatsuo Ishii2017-06-15
| | | | | | | | It was explained that read only transactions (not in standby) allow to update sequences. This had been wrong since the commit: 05d8a561ff85db1545f5768fe8d8dc9d99ad2ef7 Discussion: https://www.postgresql.org/message-id/20170614.110826.425627939780392324.t-ishii%40sraoss.co.jp
* Take PROVE_FLAGS from the command line but not the environmentAndrew Dunstan2017-06-10
| | | | | | | | | | | This reverts commit 56b6ef893fee9e9bf47d927a02f4d1ea911f4d9c and instead makes vcregress.pl parse out PROVE_FLAGS from a command line argument when doing a TAP test, thus making it consistent with the makefile treatment. Discussion: https://postgr.es/m/c26a7416-2fb9-34ab-7991-618c922f896e%402ndquadrant.com Backpatch to 9.4 like previous patch.
* postgres_fdw: Allow cancellation of transaction control commands.Robert Haas2017-06-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit f039eaac7131ef2a4cf63a10cf98486f8bcd09d2, later back-patched with commit 1b812afb0eafe125b820cc3b95e7ca03821aa675, allowed many of the queries issued by postgres_fdw to fetch remote data to respond to cancel interrupts in a timely fashion. However, it didn't do anything about the transaction control commands, which remained noninterruptible. Improve the situation by changing do_sql_command() to retrieve query results using pgfdw_get_result(), which uses the asynchronous interface to libpq so that it can check for interrupts every time libpq returns control. Since this might result in a situation where we can no longer be sure that the remote transaction state matches the local transaction state, add a facility to force all levels of the local transaction to abort if we've lost track of the remote state; without this, an apparently-successful commit of the local transaction might fail to commit changes made on the remote side. Also, add a 60-second timeout for queries issue during transaction abort; if that expires, give up and mark the state of the connection as unknown. Drop all such connections when we exit the local transaction. Together, these changes mean that if we're aborting the local toplevel transaction anyway, we can just drop the remote connection in lieu of waiting (possibly for a very long time) for it to complete an abort. This still leaves quite a bit of room for improvement. PQcancel() has no asynchronous interface, so if we get stuck sending the cancel request we'll still hang. Also, PQsetnonblocking() is not used, which means we could block uninterruptibly when sending a query. There might be some other optimizations possible as well. Nonetheless, this allows us to escape a wait for an unresponsive remote server quickly in many more cases than previously. Report by Suraj Kharage. Patch by me and Rafia Sabih. Review and testing by Amit Kapila and Tushar Ahuja. Discussion: http://postgr.es/m/CAF1DzPU8Kx+fMXEbFoP289xtm3bz3t+ZfxhmKavr98Bh-C0TqQ@mail.gmail.com
* Fix docs to not claim ECPG's SET CONNECTION is not thread-aware.Michael Meskes2017-06-07
| | | | Changed by: Tsunakawa, Takayuki <tsunakawa.takay@jp.fujitsu.com>
* Clear auth context correctly when re-connecting after failed auth attempt.Heikki Linnakangas2017-06-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If authentication over an SSL connection fails, with sslmode=prefer, libpq will reconnect without SSL and retry. However, we did not clear the variables related to GSS, SSPI, and SASL authentication state, when reconnecting. Because of that, the second authentication attempt would always fail with a "duplicate GSS/SASL authentication request" error. pg_SSPI_startup did not check for duplicate authentication requests like the corresponding GSS and SASL functions, so with SSPI, you would leak some memory instead. Another way this could manifest itself, on version 10, is if you list multiple hostnames in the "host" parameter. If the first server requests Kerberos or SCRAM authentication, but it fails, the attempts to connect to the other servers will also fail with "duplicate authentication request" errors. To fix, move the clearing of authentication state from closePGconn to pgDropConnection, so that it is cleared also when re-connecting. Patch by Michael Paquier, with some kibitzing by me. Backpatch down to 9.3. 9.2 has the same bug, but the code around closing the connection is somewhat different, so that this patch doesn't apply. To fix this in 9.2, I think we would need to back-port commit 210eb9b743 first, and then apply this patch. However, given that we only bumped into this in our own testing, we haven't heard any reports from users about this, and that 9.2 will be end-of-lifed in a couple of months anyway, it doesn't seem worth the risk and trouble. Discussion: https://www.postgresql.org/message-id/CAB7nPqRuOUm0MyJaUy9L3eXYJU3AKCZ-0-03=-aDTZJGV4GyWw@mail.gmail.com
* Unify SIGHUP handling between normal and walsender backends.Andres Freund2017-06-05
| | | | | | | | | | | | | | | | | | | | Because walsender and normal backends share the same main loop it's problematic to have two different flag variables, set in signal handlers, indicating a pending configuration reload. Only certain walsender commands reach code paths checking for the variable (START_[LOGICAL_]REPLICATION, CREATE_REPLICATION_SLOT ... LOGICAL, notably not base backups). This is a bug present since the introduction of walsender, but has gotten worse in releases since then which allow walsender to do more. A later patch, not slated for v10, will similarly unify SIGHUP handling in other types of processes as well. Author: Petr Jelinek, Andres Freund Reviewed-By: Michael Paquier Discussion: https://postgr.es/m/20170423235941.qosiuoyqprq4nu7v@alap3.anarazel.de Backpatch: 9.2-, bug is present since 9.0
* Prevent possibility of panics during shutdown checkpoint.Andres Freund2017-06-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the checkpointer writes the shutdown checkpoint, it checks afterwards whether any WAL has been written since it started and throws a PANIC if so. At that point, only walsenders are still active, so one might think this could not happen, but walsenders can also generate WAL, for instance in BASE_BACKUP and logical decoding related commands (e.g. via hint bits). So they can trigger this panic if such a command is run while the shutdown checkpoint is being written. To fix this, divide the walsender shutdown into two phases. First, checkpointer, itself triggered by postmaster, sends a PROCSIG_WALSND_INIT_STOPPING signal to all walsenders. If the backend is idle or runs an SQL query this causes the backend to shutdown, if logical replication is in progress all existing WAL records are processed followed by a shutdown. Otherwise this causes the walsender to switch to the "stopping" state. In this state, the walsender will reject any further replication commands. The checkpointer begins the shutdown checkpoint once all walsenders are confirmed as stopping. When the shutdown checkpoint finishes, the postmaster sends us SIGUSR2. This instructs walsender to send any outstanding WAL, including the shutdown checkpoint record, wait for it to be replicated to the standby, and then exit. Author: Andres Freund, based on an earlier patch by Michael Paquier Reported-By: Fujii Masao, Andres Freund Reviewed-By: Michael Paquier Discussion: https://postgr.es/m/20170602002912.tqlwn4gymzlxpvs2@alap3.anarazel.de Backpatch: 9.4, where logical decoding was introduced
* Have walsenders participate in procsignal infrastructure.Andres Freund2017-06-05
| | | | | | | | | | | | | | | | | The non-participation in procsignal was a problem for both changes in master, e.g. parallelism not working for normal statements run in walsender backends, and older branches, e.g. recovery conflicts and catchup interrupts not working for logical decoding walsenders. This commit thus replaces the previous WalSndXLogSendHandler with procsignal_sigusr1_handler. In branches since db0f6cad48 that can lead to additional SetLatch calls, but that only rarely seems to make a difference. Author: Andres Freund Reviewed-By: Michael Paquier Discussion: https://postgr.es/m/20170421014030.fdzvvvbrz4nckrow@alap3.anarazel.de Backpatch: 9.4, earlier commits don't seem to benefit sufficiently
* Fix thinko in previous openssl changeAndrew Dunstan2017-06-05
|
* Find openssl lib files in right directory for MSVCAndrew Dunstan2017-06-05
| | | | | | | | | Some openssl builds put their lib files in a VC subdirectory, others do not. Cater for both cases. Backpatch to all live branches. From an offline discussion with Leonardo Cecchi.
* Always use -fPIC, not -fpic, when building shared libraries with gcc.Tom Lane2017-06-01
| | | | | | | | | | | | | | | | | On some platforms, -fpic fails for sufficiently large shared libraries. We've mostly not hit that boundary yet, but there are some extensions such as Citus and pglogical where it's becoming a problem. A bit of research suggests that the penalty for -fPIC is small, in the single-digit-percentage range --- and there's none at all on popular platforms such as x86_64. So let's just default to -fPIC everywhere and provide one less thing for extension developers to worry about. Per complaint from Christoph Berg. Back-patch to all supported branches. (I did not bother to touch the recently-removed Makefiles for sco and unixware in the back branches, though. We'd have no way to test that it doesn't break anything on those platforms.) Discussion: https://postgr.es/m/20170529155850.qojdfrwkkqnjb3ap@msg.df7cb.de
* Try to ensure that stats collector's receive buffer size is at least 100KB.Tom Lane2017-05-29
| | | | | | Back-patch of commit 8b0b6303e991079726e83d17401405e94da11564. Discussion: https://postgr.es/m/22173.1494788088@sss.pgh.pa.us
* Prevent running pg_resetwal/pg_resetxlog against wrong-version data dirs.Tom Lane2017-05-29
| | | | | | | | | | | | | | | | | | | | pg_resetwal (formerly pg_resetxlog) doesn't insist on finding a matching version number in pg_control, and that seems like an important thing to preserve since recovering from corrupt pg_control is a prime reason to need to run it. However, that means you can try to run it against a data directory of a different major version, which is at best useless and at worst disastrous. So as to provide some protection against that type of pilot error, inspect PG_VERSION at startup and refuse to do anything if it doesn't match. PG_VERSION is read-only after initdb, so it's unlikely to get corrupted, and even if it were corrupted it would be easy to fix by hand. This hazard has been there all along, so back-patch to all supported branches. Michael Paquier, with some kibitzing by me Discussion: https://postgr.es/m/f4b8eb91-b934-8a0d-b3cc-68f06e2279d1@enterprisedb.com
* Allow NumericOnly to be "+ FCONST".Tom Lane2017-05-29
| | | | | | | | | | | | | | | | | | | | | The NumericOnly grammar production accepted ICONST, + ICONST, - ICONST, FCONST, and - FCONST, but for some reason not + FCONST. This led to strange inconsistencies like regression=# set random_page_cost = +4; SET regression=# set random_page_cost = 4000000000; SET regression=# set random_page_cost = +4000000000; ERROR: syntax error at or near "4000000000" (because 4000000000 is too large to be an ICONST). While there's no actual functional reason to need to write a "+", if we allow it for integers it seems like we should allow it for numerics too. It's been like that forever, so back-patch to all supported branches. Discussion: https://postgr.es/m/30908.1496006184@sss.pgh.pa.us
* Move autogenerated array types out of the way during ALTER ... RENAME.Tom Lane2017-05-26
| | | | | | | | | | | | Commit 9aa3c782c added code to allow CREATE TABLE/CREATE TYPE to not fail when the desired type name conflicts with an autogenerated array type, by dint of renaming the array type out of the way. But I (tgl) overlooked that the same case arises in ALTER TABLE/TYPE RENAME. Fix that too. Back-patch to all supported branches. Report and patch by Vik Fearing, modified a bit by me Discussion: https://postgr.es/m/0f4ade49-4f0b-a9a3-c120-7589f01d1eb8@2ndquadrant.com