postgresql - postgresql mirror

	Commit message (Collapse)	Author	Age
*	Fix bogus time printout in walreceiver's debug log messages.	Tom Lane	2014-04-04
\| \| \| \| \| \| \| \| \| \| \| \| \|	The displayed sendtime and receipttime were always exactly equal, because somebody forgot that timestamptz_to_str returns a static buffer (thereby simplifying life for most callers, at the cost of complicating it for those who need two results concurrently). Apply the same pstrdup solution used by the other call sites with this issue. Back-patch to 9.2 where the faulty code was introduced. Per bug #9849 from Haruka Takatsuka, though this is not exactly his patch. Possibly we should change timestamptz_to_str's API, but I wouldn't want to do so in the back branches.
*	Avoid allocations in critical sections.	Heikki Linnakangas	2014-04-04
\| \| \| \|	If a palloc in a critical section fails, it becomes a PANIC.
*	Avoid palloc in critical section in GiST WAL-logging.	Heikki Linnakangas	2014-04-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Memory allocation can fail if you run out of memory, and inside a critical section that will lead to a PANIC. Use conservatively-sized arrays in stack instead. There was previously no explicit limit on the number of pages a GiST split can produce, it was only limited by the number of LWLocks that can be held simultaneously (100 at the moment). This patch adds an explicit limit of 75 pages. That should be plenty, a typical split shouldn't produce more than 2-3 page halves. The bug has been there forever, but only backpatch down to 9.1. The code was changed significantly in 9.1, and it doesn't seem worth the risk or trouble to adapt this for 9.0 and 8.4.
*	Fix assorted issues in client host name lookup.	Tom Lane	2014-04-02
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The code for matching clients to pg_hba.conf lines that specify host names (instead of IP address ranges) failed to complain if reverse DNS lookup failed; instead it silently didn't match, so that you might end up getting a surprising "no pg_hba.conf entry for ..." error, as seen in bug #9518 from Mike Blackwell. Since we don't want to make this a fatal error in situations where pg_hba.conf contains a mixture of host names and IP addresses (clients matching one of the numeric entries should not have to have rDNS data), remember the lookup failure and mention it as DETAIL if we get to "no pg_hba.conf entry". Apply the same approach to forward-DNS lookup failures, too, rather than treating them as immediate hard errors. Along the way, fix a couple of bugs that prevented us from detecting an rDNS lookup error reliably, and make sure that we make only one rDNS lookup attempt; formerly, if the lookup attempt failed, the code would try again for each host name entry in pg_hba.conf. Since more or less the whole point of this design is to ensure there's only one lookup attempt not one per entry, the latter point represents a performance bug that seems sufficient justification for back-patching. Also, adjust src/port/getaddrinfo.c so that it plays as well as it can with this code. Which is not all that well, since it does not have actual support for rDNS lookup, but at least it should return the expected (and required by spec) error codes so that the main code correctly perceives the lack of functionality as a lookup failure. It's unlikely that PG is still being used in production on any machines that require our getaddrinfo.c, so I'm not excited about working harder than this. To keep the code in the various branches similar, this includes back-patching commits c424d0d1052cb4053c8712ac44123f9b9a9aa3f2 and 1997f34db4687e671690ed054c8f30bb501b1168 into 9.2 and earlier. Back-patch to 9.1 where the facility for hostnames in pg_hba.conf was introduced.
*	Fix bugs in manipulation of PgBackendStatus.st_clienthostname.	Tom Lane	2014-04-01
\| \| \| \| \| \| \| \| \| \|	Initialization of this field was not being done according to the st_changecount protocol (it has to be done within the changecount increment range, not outside). And the test to see if the value should be reported as null was wrong. Noted while perusing uses of Port.remote_hostname. This was wrong from the introduction of this code (commit 4a25bc145), so back-patch to 9.1.
*	Mark FastPathStrongRelationLocks volatile.	Robert Haas	2014-03-31
\| \| \| \| \| \| \| \| \|	Otherwise, the compiler might decide to move modifications to data within this structure outside the enclosing SpinLockAcquire / SpinLockRelease pair, leading to shared memory corruption. This may or may not explain a recent lmgr-related buildfarm failure on prairiedog, but it needs to be fixed either way.
*	Count buffers dirtied due to hints in pgBufferUsage.shared_blks_dirtied.	Robert Haas	2014-03-31
\| \| \| \| \| \| \| \| \| \|	Previously, such buffers weren't counted, with the possible result that EXPLAIN (BUFFERS) and pg_stat_statements would understate the true number of blocks dirtied by an SQL statement. Back-patch to 9.2, where this counter was introduced. Amit Kapila
*	Don't forget to flush XLOG_PARAMETER_CHANGE record.	Fujii Masao	2014-03-26
\| \| \| \|	Backpatch to 9.0 where XLOG_PARAMETER_CHANGE record was instroduced.
*	Address ccvalid/ccnoinherit in TupleDesc support functions.	Noah Misch	2014-03-23
\| \| \| \| \| \| \| \| \|	equalTupleDescs() neglected both of these ConstrCheck fields, and CreateTupleDescCopyConstr() neglected ccnoinherit. At this time, the only known behavior defect resulting from these omissions is constraint exclusion disregarding a CHECK constraint validated by an ALTER TABLE VALIDATE CONSTRAINT statement issued earlier in the same transaction. Back-patch to 9.2, where these fields were introduced.
*	Properly check for readdir/closedir() failures	Bruce Momjian	2014-03-21
\| \| \| \| \| \| \|	Clear errno before calling readdir() and handle old MinGW errno bug while adding full test coverage for readdir/closedir failures. Backpatch through 8.4.
*	Fix memory leak during regular expression execution.	Tom Lane	2014-03-19
\| \| \| \| \| \| \| \|	For a regex containing backrefs, pg_regexec() might fail to free all the sub-DFAs that were created during execution, resulting in a permanent (session lifespan) memory leak. Problem was introduced by me in commit 587359479acbbdc95c8e37da40707e37097423f5. Per report from Sandro Santilli; diagnosis by Greg Stark.
*	Fix bug in clean shutdown of walsender that pg_receiving is connecting to.	Fujii Masao	2014-03-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	On clean shutdown, walsender waits for all WAL to be replicated to a standby, and exits. It determined whether that replication had been completed by checking whether its sent location had been equal to a standby's flush location. Unfortunately this condition never becomes true when the standby such as pg_receivexlog which always returns an invalid flush location is connecting to walsender, and then walsender waits forever. This commit changes walsender so that it just checks a standby's write location if a flush location is invalid. Back-patch to 9.1 where enough infrastructure for this exists.
*	Translation updates	Peter Eisentraut	2014-03-16
\|
*	Prevent interrupts while reporting non-ERROR elog messages.	Tom Lane	2014-03-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This should eliminate the risk of recursive entry to syslog(3), which appears to be the cause of the hang reported in bug #9551 from James Morton. Arguably, the real problem here is auth.c's willingness to turn on ImmediateInterruptOK while executing fairly wide swaths of backend code. We may well need to work at narrowing the code ranges in which the authentication_timeout interrupt is enabled. For the moment, though, this is a cheap and reasonably noninvasive fix for a field-reported failure; the other approach would be complex and not necessarily bug-free itself. Back-patch to all supported branches.
*	Avoid transaction-commit race condition while receiving a NOTIFY message.	Tom Lane	2014-03-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use TransactionIdIsInProgress, then TransactionIdDidCommit, to distinguish whether a NOTIFY message's originating transaction is in progress, committed, or aborted. The previous coding could accept a message from a transaction that was still in-progress according to the PGPROC array; if the client were fast enough at starting a new transaction, it might fail to see table rows added/updated by the message-sending transaction. Which of course would usually be the point of receiving the message. We noted this type of race condition long ago in tqual.c, but async.c overlooked it. The race condition probably cannot occur unless there are multiple NOTIFY senders in action, since an individual backend doesn't send NOTIFY signals until well after it's done committing. But if two senders commit in close succession, it's certainly possible that we could see the second sender's message within the race condition window while responding to the signal from the first one. Per bug #9557 from Marko Tiikkaja. This patch is slightly more invasive than what he proposed, since it removes the now-redundant TransactionIdDidAbort call. Back-patch to 9.0, where the current NOTIFY implementation was introduced.
*	In WAL replay, restore GIN metapage unconditionally to avoid torn page.	Heikki Linnakangas	2014-03-12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We don't take a full-page image of the GIN metapage; instead, the WAL record contains all the information required to reconstruct it from scratch. But to avoid torn page hazards, we must re-initialize it from the WAL record every time, even if it already has a greater LSN, similar to how normal full page images are restored. This was highly unlikely to cause any problems in practice, because the GIN metapage is small. We rely on an update smaller than a 512 byte disk sector to be atomic elsewhere, at least in pg_control. But better safe than sorry, and this would be easy to overlook if more fields are added to the metapage so that it's no longer small. Reported by Noah Misch. Backpatch to all supported versions.
*	Fix dangling smgr_owner pointer when a fake relcache entry is freed.	Heikki Linnakangas	2014-03-07
\| \| \| \| \| \| \| \| \| \| \| \|	A fake relcache entry can "own" a SmgrRelation object, like a regular relcache entry. But when it was free'd, the owner field in SmgrRelation was not cleared, so it was left pointing to free'd memory. Amazingly this apparently hasn't caused crashes in practice, or we would've heard about it earlier. Andres found this with Valgrind. Report and fix by Andres Freund, with minor modifications by me. Backpatch to all supported versions.
*	Avoid memcpy() with same source and destination address.	Heikki Linnakangas	2014-03-07
\| \| \| \| \| \| \|	The behavior of that is undefined, although unlikely to lead to problems in practice. Found by running regression tests with Valgrind.
*	Avoid getting more than AccessShareLock when deparsing a query.	Tom Lane	2014-03-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In make_ruledef and get_query_def, we have long used AcquireRewriteLocks to ensure that the querytree we are about to deparse is up-to-date and the schemas of the underlying relations aren't changing. Howwever, that function thinks the query is about to be executed, so it acquires locks that are stronger than necessary for the purpose of deparsing. Thus for example, if pg_dump asks to deparse a rule that includes "INSERT INTO t", we'd acquire RowExclusiveLock on t. That results in interference with concurrent transactions that might for example ask for ShareLock on t. Since pg_dump is documented as being purely read-only, this is unexpected. (Worse, it used to actually be read-only; this behavior dates back only to 8.1, cf commit ba4200246.) Fix this by adding a parameter to AcquireRewriteLocks to tell it whether we want the "real" execution locks or only AccessShareLock. Report, diagnosis, and patch by Dean Rasheed. Back-patch to all supported branches.
*	Do wal_level and hot standby checks when doing crash-then-archive recovery.	Heikki Linnakangas	2014-03-05
\| \| \| \| \| \| \| \|	CheckRequiredParameterValues() should perform the checks if archive recovery was requested, even if we are going to perform crash recovery first. Reported by Kyotaro HORIGUCHI. Backpatch to 9.2, like the crash-then-archive recovery mode.
*	Fix lastReplayedEndRecPtr calculation when starting from shutdown checkpoint.	Heikki Linnakangas	2014-03-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When entering crash recovery followed by archive recovery, and the latest checkpoint is a shutdown checkpoint, and there are no more WAL records to replay before transitioning from crash to archive recovery, we would not immediately allow read-only connections in hot standby mode even if we could. That's because when starting from a shutdown checkpoint, we set lastReplayedEndRecPtr incorrectly to the record before the checkpoint record, instead of the checkpoint record itself. We don't run the redo routine of the shutdown checkpoint record, but starting recovery from it goes through the same motions, so it should be considered as replayed. Reported by Kyotaro HORIGUCHI. All versions with hot standby are affected, so backpatch to 9.0.
*	Allow regex operations to be terminated early by query cancel requests.	Tom Lane	2014-03-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The regex code didn't have any provision for query cancel; which is unsurprising given its non-Postgres origin, but still problematic since some operations can take a long time. Introduce a callback function to check for a pending query cancel or session termination request, and call it in a couple of strategic spots where we can make the regex code exit with an error indicator. If we ever actually split out the regex code as a standalone library, some additional work will be needed to let the cancel callback function be specified externally to the library. But that's straightforward (certainly so by comparison to putting the locale-dependent character classification logic on a similar arms-length basis), and there seems no need to do it right now. A bigger issue is that there may be more places than these two where we need to check for cancels. We can always add more checks later, now that the infrastructure is in place. Since there are known examples of not-terribly-long regexes that can lock up a backend for a long time, back-patch to all supported branches. I have hopes of fixing the known performance problems later, but adding query cancel ability seems like a good idea even if they were all fixed.
*	Use SnapshotDirty rather than an active snapshot to probe index endpoints.	Tom Lane	2014-02-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If there are lots of uncommitted tuples at the end of the index range, get_actual_variable_range() ends up fetching each one and doing an MVCC visibility check on it, until it finally hits a visible tuple. This is bad enough in isolation, considering that we don't need an exact answer only an approximate one. But because the tuples are not yet committed, each visibility check does a TransactionIdIsInProgress() test, which involves scanning the ProcArray. When multiple sessions do this concurrently, the ensuing contention results in horrid performance loss. 20X overall throughput loss on not-too-complicated queries is easy to demonstrate in the back branches (though someone's made it noticeably less bad in HEAD). We can dodge the problem fairly effectively by using SnapshotDirty rather than a normal MVCC snapshot. This will cause the index probe to take uncommitted tuples as good, so that we incur only one tuple fetch and test even if there are many such tuples. The extent to which this degrades the estimate is debatable: it's possible the result is actually a more accurate prediction than before, if the endmost tuple has become committed by the time we actually execute the query being planned. In any case, it's not very likely that it makes the estimate a lot worse. SnapshotDirty will still reject tuples that are known committed dead, so we won't give bogus answers if an invalid outlier has been deleted but not yet vacuumed from the index. (Because btrees know how to mark such tuples dead in the index, we shouldn't have a big performance problem in the case that there are many of them at the end of the range.) This consideration motivates not using SnapshotAny, which was also considered as a fix. Note: the back branches were using SnapshotNow instead of an MVCC snapshot, but the problem and solution are the same. Per performance complaints from Bartlomiej Romanski, Josh Berkus, and others. Back-patch to 9.0, where the issue was introduced (by commit 40608e7f949fb7e4025c0ddd5be01939adc79eec).
*	Remove broken code that tried to handle OVERLAPS with a single argument.	Tom Lane	2014-02-18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The SQL standard says that OVERLAPS should have a two-element row constructor on each side. The original coding of OVERLAPS support in our grammar attempted to extend that by allowing a single-element row constructor, which it internally duplicated ... or tried to, anyway. But that code has certainly not worked since our List infrastructure was rewritten in 2004, and I'm none too sure it worked before that. As it stands, it ends up building a List that includes itself, leading to assorted undesirable behaviors later in the parser. Even if it worked as intended, it'd be a bit evil because of the possibility of duplicate evaluation of a volatile function that the user had written only once. Given the lack of documentation, test cases, or complaints, let's just get rid of the idea and only support the standard syntax. While we're at it, improve the error cursor positioning for the wrong-number-of-arguments errors, and inline the makeOverlaps() function since it's only called in one place anyway. Per bug #9227 from Joshua Yanovski. Initial patch by Joshua Yanovski, extended a bit by me.
*	Fix comment; checkpointer, not bgwriter, performs checkpoints since 9.2.	Heikki Linnakangas	2014-02-18
\| \| \| \|	Amit Langote
*	Translation updates	Peter Eisentraut	2014-02-17
\|
*	Prevent potential overruns of fixed-size buffers.	Tom Lane	2014-02-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Coverity identified a number of places in which it couldn't prove that a string being copied into a fixed-size buffer would fit. We believe that most, perhaps all of these are in fact safe, or are copying data that is coming from a trusted source so that any overrun is not really a security issue. Nonetheless it seems prudent to forestall any risk by using strlcpy() and similar functions. Fixes by Peter Eisentraut and Jozef Mlich based on Coverity reports. In addition, fix a potential null-pointer-dereference crash in contrib/chkpass. The crypt(3) function is defined to return NULL on failure, but chkpass.c didn't check for that before using the result. The main practical case in which this could be an issue is if libc is configured to refuse to execute unapproved hashing algorithms (e.g., "FIPS mode"). This ideally should've been a separate commit, but since it touches code adjacent to one of the buffer overrun changes, I included it in this commit to avoid last-minute merge issues. This issue was reported by Honza Horak. Security: CVE-2014-0065 for buffer overruns, CVE-2014-0066 for crypt()
*	Predict integer overflow to avoid buffer overruns.	Noah Misch	2014-02-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Several functions, mostly type input functions, calculated an allocation size such that the calculation wrapped to a small positive value when arguments implied a sufficiently-large requirement. Writes past the end of the inadvertent small allocation followed shortly thereafter. Coverity identified the path_in() vulnerability; code inspection led to the rest. In passing, add check_stack_depth() to prevent stack overflow in related functions. Back-patch to 8.4 (all supported versions). The non-comment hstore changes touch code that did not exist in 8.4, so that part stops at 9.0. Noah Misch and Heikki Linnakangas, reviewed by Tom Lane. Security: CVE-2014-0064
*	Avoid repeated name lookups during table and index DDL.	Robert Haas	2014-02-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If the name lookups come to different conclusions due to concurrent activity, we might perform some parts of the DDL on a different table than other parts. At least in the case of CREATE INDEX, this can be used to cause the permissions checks to be performed against a different table than the index creation, allowing for a privilege escalation attack. This changes the calling convention for DefineIndex, CreateTrigger, transformIndexStmt, transformAlterTableStmt, CheckIndexCompatible (in 9.2 and newer), and AlterTable (in 9.1 and older). In addition, CheckRelationOwnership is removed in 9.2 and newer and the calling convention is changed in older branches. A field has also been added to the Constraint node (FkConstraint in 8.4). Third-party code calling these functions or using the Constraint node will require updating. Report by Andres Freund. Patch by Robert Haas and Andres Freund, reviewed by Tom Lane. Security: CVE-2014-0062
*	Prevent privilege escalation in explicit calls to PL validators.	Noah Misch	2014-02-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The primary role of PL validators is to be called implicitly during CREATE FUNCTION, but they are also normal functions that a user can call explicitly. Add a permissions check to each validator to ensure that a user cannot use explicit validator calls to achieve things he could not otherwise achieve. Back-patch to 8.4 (all supported versions). Non-core procedural language extensions ought to make the same two-line change to their own validators. Andres Freund, reviewed by Tom Lane and Noah Misch. Security: CVE-2014-0061
*	Shore up ADMIN OPTION restrictions.	Noah Misch	2014-02-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Granting a role without ADMIN OPTION is supposed to prevent the grantee from adding or removing members from the granted role. Issuing SET ROLE before the GRANT bypassed that, because the role itself had an implicit right to add or remove members. Plug that hole by recognizing that implicit right only when the session user matches the current role. Additionally, do not recognize it during a security-restricted operation or during execution of a SECURITY DEFINER function. The restriction on SECURITY DEFINER is not security-critical. However, it seems best for a user testing his own SECURITY DEFINER function to see the same behavior others will see. Back-patch to 8.4 (all supported versions). The SQL standards do not conflate roles and users as PostgreSQL does; only SQL roles have members, and only SQL users initiate sessions. An application using PostgreSQL users and roles as SQL users and roles will never attempt to grant membership in the role that is the session user, so the implicit right to add or remove members will never arise. The security impact was mostly that a role member could revoke access from others, contrary to the wishes of his own grantor. Unapproved role member additions are less notable, because the member can still largely achieve that by creating a view or a SECURITY DEFINER function. Reviewed by Andres Freund and Tom Lane. Reported, independently, by Jonas Sundman and Noah Misch. Security: CVE-2014-0060
*	Fix length checking for Unicode identifiers containing escapes (U&"...").	Tom Lane	2014-02-13
\| \| \| \| \| \| \| \| \| \| \|	We used the length of the input string, not the de-escaped string, as the trigger for NAMEDATALEN truncation. AFAICS this would only result in sometimes printing a phony truncation warning; but it's just luck that there was no worse problem, since we were violating the API spec for truncate_identifier(). Per bug #9204 from Joshua Yanovski. This has been wrong since the Unicode-identifier support was added, so back-patch to all supported branches.
*	In XLogReadBufferExtended, don't assume P_NEW yields consecutive pages.	Tom Lane	2014-02-12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In a database that's not yet reached consistency, it's possible that some segments of a relation are not full-size but are not the last ones either. Because of the way smgrnblocks() works, asking for a new page with P_NEW will fill in the last not-full-size segment --- and if that makes it full size, the apparent EOF of the relation will increase by more than one page, so that the next P_NEW request will yield a page past the next consecutive one. This breaks the relation-extension logic in XLogReadBufferExtended, possibly allowing a page update to be applied to some page far past where it was intended to go. This appears to be the explanation for reports of table bloat on replication slaves compared to their masters, and probably explains some corrupted-slave reports as well. Fix the loop to check the page number it actually got, rather than merely Assert()'ing that dead reckoning got it to the desired place. AFAICT, there are no other places that make assumptions about exactly which page they'll get from P_NEW. Problem identified by Greg Stark, though this is not the same as his proposed patch. It's been like this for a long time, so back-patch to all supported branches.
*	Use memmove() instead of memcpy() for copying overlapping regions.	Heikki Linnakangas	2014-02-10
\| \| \| \| \|	In commit d2495f272cd164ff075bee5c4ce95aed11338a36, I fixed this bug in to_tsquery(), but missed the fact that plainto_tsquery() has the same bug.
*	Fix *-qualification of named parameters in SQL-language functions.	Tom Lane	2014-02-03
\| \| \| \| \| \| \|	Given a composite-type parameter named x, "$1." worked fine, but "x." not so much. This has been broken since named parameter references were added in commit 9bff0780cf5be2193a5bad0d3df2dbe143085264, so patch back to 9.2. Per bug #9085 from Hardy Falk.
*	Fix some wide-character bugs in the text-search parser.	Tom Lane	2014-02-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In p_isdigit and other character class test functions generated by the p_iswhat macro, the code path for non-C locales with multibyte encodings contained a bogus pointer cast that would accidentally fail to malfunction if types wchar_t and wint_t have the same width. Apparently that is true on most platforms, but not on recent Cygwin releases. Remove the cast, as it seems completely unnecessary (I think it arose from a false analogy to the need to cast to unsigned char when dealing with the <ctype.h> functions). Per bug #8970 from Marco Atzeri. In the same functions, the code path for C locale with a multibyte encoding simply ANDed each wide character with 0xFF before passing it to the corresponding <ctype.h> function. This could result in false positive answers for some non-ASCII characters, so use a range test instead. Noted by me while investigating Marco's complaint. Also, remove some useless though not actually buggy maskings and casts in the hand-coded p_isalnum and p_isalpha functions, which evidently got tested a bit more carefully than the macro-generated functions.
*	Fix some more bugs in signal handlers and process shutdown logic.	Tom Lane	2014-02-01
\| \| \| \| \| \| \| \| \| \| \| \| \|	WalSndKill was doing things exactly backwards: it should first clear MyWalSnd (to stop signal handlers from touching MyWalSnd->latch), then disown the latch, and only then mark the WalSnd struct unused by clearing its pid field. Also, WalRcvSigUsr1Handler and worker_spi_sighup failed to preserve errno, which is surely a requirement for any signal handler. Per discussion of recent buildfarm failures. Back-patch as far as the relevant code exists.
*	Clear MyProc and MyProcSignalState before they become invalid.	Robert Haas	2014-01-31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Evidence from buildfarm member crake suggests that the new test_shm_mq module is routinely crashing the server due to the arrival of a SIGUSR1 after the shared memory segment has been unmapped. Although processes using the new dynamic background worker facilities are more likely to receive a SIGUSR1 around this time, the problem is also possible on older branches, so I'm back-patching the parts of this change that apply to older branches as far as they apply. It's already generally the case that code checks whether these pointers are NULL before deferencing them, so the important thing is mostly to make sure that they do get set to NULL before they become invalid. But in master, there's one case in procsignal_sigusr1_handler that lacks a NULL guard, so add that. Patch by me; review by Tom Lane.
*	Fix unsafe references to errno within error messaging logic.	Tom Lane	2014-01-29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Various places were supposing that errno could be expected to hold still within an ereport() nest or similar contexts. This isn't true necessarily, though in some cases it accidentally failed to fail depending on how the compiler chanced to order the subexpressions. This class of thinko explains recent reports of odd failures on clang-built versions, typically missing or inappropriate HINT fields in messages. Problem identified by Christian Kruse, who also submitted the patch this commit is based on. (I fixed a few issues in his patch and found a couple of additional places with the same disease.) Back-patch as appropriate to all supported branches.
*	Allow type_func_name_keywords in even more places	Stephen Frost	2014-01-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A while back, 2c92edad48796119c83d7dbe6c33425d1924626d allowed type_func_name_keywords to be used in more places, including role identifiers. Unfortunately, that commit missed out on cases where name_list was used for lists-of-roles, eg: for DROP ROLE. This resulted in the unfortunate situation that you could CREATE a role with a type_func_name_keywords-allowed identifier, but not DROP it (directly- ALTER could be used to rename it to something which could be DROP'd). This extends allowing type_func_name_keywords to places where role lists can be used. Back-patch to 9.0, as 2c92edad48796119c83d7dbe6c33425d1924626d was.
*	Tweak parse location assignment for CURRENT_DATE and related constructs.	Tom Lane	2014-01-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	All these constructs generate parse trees consisting of a Const and a run-time type coercion (perhaps a FuncExpr or a CoerceViaIO). Modify the raw parse output so that we end up with the original token's location attached to the type coercion node while the Const has location -1; before, it was the other way around. This makes no difference in terms of what exprLocation() will say about the parse tree as a whole, so it should not have any user-visible impact. The point of changing it is that we do not want contrib/pg_stat_statements to treat these constructs as replaceable constants. It will do the right thing if the Const has location -1 rather than a valid location. This is a pretty ugly hack, but then this code is ugly already; we should someday replace this translation with special-purpose parse node(s) that would allow ruleutils.c to reconstruct the original query text. (See also commit 5d3fcc4c2e137417ef470d604fee5e452b22f6a7, which also hacked location assignment rules for the benefit of pg_stat_statements.) Back-patch to 9.2 where pg_stat_statements grew the ability to recognize replaceable constants. Kyotaro Horiguchi
*	Allow SET TABLESPACE to database default	Stephen Frost	2014-01-18
\| \| \| \| \| \| \| \| \| \| \| \| \|	We've always allowed CREATE TABLE to create tables in the database's default tablespace without checking for CREATE permissions on that tablespace. Unfortunately, the original implementation of ALTER TABLE ... SET TABLESPACE didn't pick up on that exception. This changes ALTER TABLE ... SET TABLESPACE to allow the database's default tablespace without checking for CREATE rights on that tablespace, just as CREATE TABLE works today. Users could always do this through a series of commands (CREATE TABLE ... AS SELECT * FROM ...; DROP TABLE ...; etc), so let's fix the oversight in SET TABLESPACE's original implementation.
*	Fix multiple bugs in index page locking during hot-standby WAL replay.	Tom Lane	2014-01-14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In ordinary operation, VACUUM must be careful to take a cleanup lock on each leaf page of a btree index; this ensures that no indexscans could still be "in flight" to heap tuples due to be deleted. (Because of possible index-tuple motion due to concurrent page splits, it's not enough to lock only the pages we're deleting index tuples from.) In Hot Standby, the WAL replay process must likewise lock every leaf page. There were several bugs in the code for that: * The replay scan might come across unused, all-zero pages in the index. While btree_xlog_vacuum itself did the right thing (ie, nothing) with such pages, xlogutils.c supposed that such pages must be corrupt and would throw an error. This accounts for various reports of replication failures with "PANIC: WAL contains references to invalid pages". To fix, add a ReadBufferMode value that instructs XLogReadBufferExtended not to complain when we're doing this. * btree_xlog_vacuum performed the extra locking if standbyState == STANDBY_SNAPSHOT_READY, but that's not the correct test: we won't open up for hot standby queries until the database has reached consistency, and we don't want to do the extra locking till then either, for fear of reading corrupted pages (which bufmgr.c would complain about). Fix by exporting a new function from xlog.c that will report whether we're actually in hot standby replay mode. * To ensure full coverage of the index in the replay scan, btvacuumscan would emit a dummy WAL record for the last page of the index, if no vacuuming work had been done on that page. However, if the last page of the index is all-zero, that would result in corruption of said page, since the functions called on it weren't prepared to handle that case. There's no need to lock any such pages, so change the logic to target the last normal leaf page instead. The first two of these bugs were diagnosed by Andres Freund, the other one by me. Fixes based on ideas from Heikki Linnakangas and myself. This has been wrong since Hot Standby was introduced, so back-patch to 9.0.
*	Fix possible crashes due to using elog/ereport too early in startup.	Tom Lane	2014-01-11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Per reports from Andres Freund and Luke Campbell, a server failure during set_pglocale_pgservice results in a segfault rather than a useful error message, because the infrastructure needed to use ereport hasn't been initialized; specifically, MemoryContextInit hasn't been called. One known cause of this is starting the server in a directory it doesn't have permission to read. We could try to prevent set_pglocale_pgservice from using anything that depends on palloc or elog, but that would be messy, and the odds of future breakage seem high. Moreover there are other things being called in main.c that look likely to use palloc or elog too --- perhaps those things shouldn't be there, but they are there today. The best solution seems to be to move the call of MemoryContextInit to very early in the backend's real main() function. I've verified that an elog or ereport occurring immediately after that is now capable of sending something useful to stderr. I also added code to elog.c to print something intelligible rather than just crashing if MemoryContextInit hasn't created the ErrorContext. This could happen if MemoryContextInit itself fails (due to malloc failure), and provides some future-proofing against someone trying to sneak in new code even earlier in server startup. Back-patch to all supported branches. Since we've only heard reports of this type of failure recently, it may be that some recent change has made it more likely to see a crash of this kind; but it sure looks like it's broken all the way back.
*	Fix compute_scalar_stats() for case that all values exceed WIDTH_THRESHOLD.	Tom Lane	2014-01-11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The standard typanalyze functions skip over values whose detoasted size exceeds WIDTH_THRESHOLD (1024 bytes), so as to limit memory bloat during ANALYZE. However, we (I think I, actually :-() failed to consider the possibility that every non-null value in a column is too wide. While compute_minimal_stats() seems to behave reasonably anyway in such a case, compute_scalar_stats() just fell through and generated no pg_statistic entry at all. That's unnecessarily pessimistic: we can still produce valid stanullfrac and stawidth values in such cases, since we do include too-wide values in the average-width calculation. Furthermore, since the general assumption in this code is that too-wide values are probably all distinct from each other, it seems reasonable to set stadistinct to -1 ("all distinct"). Per complaint from Kadri Raudsepp. This has been like this since roughly neolithic times, so back-patch to all supported branches.
*	Fix "cannot accept a set" error when only some arms of a CASE return a set.	Tom Lane	2014-01-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In commit c1352052ef1d4eeb2eb1d822a207ddc2d106cb13, I implemented an optimization that assumed that a function's argument expressions would either always return a set (ie multiple rows), or always not. This is wrong however: we allow CASE expressions in which some arms return a set of some type and others just return a scalar of that type. There may be other examples as well. To fix, replace the run-time test of whether an argument returned a set with a static precheck (expression_returns_set). This adds a little bit of query startup overhead, but it seems barely measurable. Per bug #8228 from David Johnston. This has been broken since 8.0, so patch all supported branches.
*	Fix pause_at_recovery_target + recovery_target_inclusive combination.	Heikki Linnakangas	2014-01-08
\| \| \| \| \| \| \| \| \| \| \| \|	If pause_at_recovery_target is set, recovery pauses before applying the target record, even if recovery_target_inclusive is set. If you then continue with pg_xlog_replay_resume(), it will apply the target record before ending recovery. In other words, if you log in while it's paused and verify that the database looks OK, ending recovery changes its state again, possibly destroying data that you were tring to salvage with PITR. Backpatch to 9.1, this has been broken since pause_at_recovery_target was added.
*	Fix bug in determining when recovery has reached consistency.	Heikki Linnakangas	2014-01-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When starting WAL replay from an online checkpoint, the last replayed WAL record variable was initialized using the checkpoint record's location, even though the records between the REDO location and the checkpoint record had not been replayed yet. That was noted as "slightly confusing" but harmless in the comment, but in some cases, it fooled CheckRecoveryConsistency to incorrectly conclude that we had already reached a consistent state immediately at the beginning of WAL replay. That caused the system to accept read-only connections in hot standby mode too early, and also PANICs with message "WAL contains references to invalid pages". Fix by initializing the variables to the REDO location instead. In 9.2 and above, change CheckRecoveryConsistency() to use lastReplayedEndRecPtr variable when checking if backup end location has been reached. It was inconsistently using EndRecPtr for that check, but lastReplayedEndRecPtr when checking min recovery point. It made no difference before this patch, because in all the places where CheckRecoveryConsistency was called the two variables were the same, but it was always an accident waiting to happen, and would have been wrong after this patch anyway. Report and analysis by Tomonari Katsumata, bug #8686. Backpatch to 9.0, where hot standby was introduced.
*	Move permissions check from do_pg_start_backup to pg_start_backup	Magnus Hagander	2014-01-07
\| \| \| \| \| \| \| \| \| \|	And the same for do_pg_stop_backup. The code in do_pg_* is not allowed to access the catalogs. For manual base backups, the permissions check can be handled in the calling function, and for streaming base backups only users with the required permissions can get past the authentication step in the first place. Reported by Antonin Houska, diagnosed by Andres Freund
*	Avoid including tablespaces inside PGDATA twice in base backups	Magnus Hagander	2014-01-07
\| \| \| \| \| \| \| \| \|	If a tablespace was crated inside PGDATA it was backed up both as part of the PGDATA backup and as the backup of the tablespace. Avoid this by skipping any directory inside PGDATA that contains one of the active tablespaces. Dimitri Fontaine and Magnus Hagander