postgresql - postgresql mirror

	Commit message (Collapse)	Author	Age
...
*	Don't assume that PageIsEmpty() returns true on an all-zeros page.	Heikki Linnakangas	2015-07-27
\| \| \| \| \| \| \| \|	It does currently, and I don't see us changing that any time soon, but we don't make that assumption anywhere else. Per Tom Lane's suggestion. Backpatch to 9.2, like the previous patch that added this assumption.
*	Fix memory leak in xlogreader facility.	Heikki Linnakangas	2015-07-27
\| \| \| \| \| \| \|	XLogReaderFree failed to free the per-block data buffers, when they happened to not be used by the latest read WAL record. Michael Paquier. Backpatch to 9.5, where the per-block buffers were added.
*	Reuse all-zero pages in GIN.	Heikki Linnakangas	2015-07-27
\| \| \| \| \| \| \| \| \| \|	In GIN, an all-zeros page would be leaked forever, and never reused. Just add them to the FSM in vacuum, and they will be reinitialized when grabbed from the FSM. On master and 9.5, attempting to access the page's opaque struct also caused an assertion failure, although that was otherwise harmless. Reported by Jeff Janes. Backpatch to all supported versions.
*	Fix handling of all-zero pages in SP-GiST vacuum.	Heikki Linnakangas	2015-07-27
\| \| \| \| \| \| \| \| \| \| \| \|	SP-GiST initialized an all-zeros page at vacuum, but that was not WAL-logged, which is not safe. You might get a torn page write, when it gets flushed to disk, and end-up with a half-initialized index page. To fix, leave it in the all-zeros state, and add it to the FSM. It will be initialized when reused. Also don't set the page-deleted flag when recycling an empty page. That was also not WAL-logged, and a torn write of that would cause the page to have an invalid checksum. Backpatch to 9.2, where SP-GiST indexes were added.
*	Avoid calling PageGetSpecialPointer() on an all-zeros page.	Heikki Linnakangas	2015-07-27
\| \| \| \| \| \| \|	That was otherwise harmless, but tripped the new assertion in PageGetSpecialPointer(). Reported by Amit Langote. Backpatch to 9.5, where the assertion was added.
*	Dodge portability issue (apparent compiler bug) in new tablesample code.	Tom Lane	2015-07-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some of the older OS X critters in the buildfarm are failing regression, with symptoms showing that a request for 100% sampling in BERNOULLI or SYSTEM methods actually gets only around 50% of the table. gdb revealed that the computation of the "cutoff" number was producing 0x7FFFFFFF rather than the expected 0x100000000. Inspecting the assembly code, it looks like gcc is trying to use lrint() instead of rint() and then fumbling the conversion from long double to uint64. This seems like a clear compiler bug, but assigning the intermediate result into a plain double variable works around it, so let's just do that. (Another idea would be to give up one bit of hash width so that we don't need to use a uint64 cutoff, but let's see if this is enough.)
*	Redesign tablesample method API, and do extensive code review.	Tom Lane	2015-07-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The original implementation of TABLESAMPLE modeled the tablesample method API on index access methods, which wasn't a good choice because, without specialized DDL commands, there's no way to build an extension that can implement a TSM. (Raw inserts into system catalogs are not an acceptable thing to do, because we can't undo them during DROP EXTENSION, nor will pg_upgrade behave sanely.) Instead adopt an API more like procedural language handlers or foreign data wrappers, wherein the only SQL-level support object needed is a single handler function identified by having a special return type. This lets us get rid of the supporting catalog altogether, so that no custom DDL support is needed for the feature. Adjust the API so that it can support non-constant tablesample arguments (the original coding assumed we could evaluate the argument expressions at ExecInitSampleScan time, which is undesirable even if it weren't outright unsafe), and discourage sampling methods from looking at invisible tuples. Make sure that the BERNOULLI and SYSTEM methods are genuinely repeatable within and across queries, as required by the SQL standard, and deal more honestly with methods that can't support that requirement. Make a full code-review pass over the tablesample additions, and fix assorted bugs, omissions, infelicities, and cosmetic issues (such as failure to put the added code stanzas in a consistent ordering). Improve EXPLAIN's output of tablesample plans, too. Back-patch to 9.5 so that we don't have to support the original API in production.
*	Fix off-by-one error in calculating subtrans/multixact truncation point.	Heikki Linnakangas	2015-07-23
\| \| \| \| \| \| \| \| \| \|	If there were no subtransactions (or multixacts) active, we would calculate the oldestxid == next xid. That's correct, but if next XID happens to be on the next pg_subtrans (pg_multixact) page, the page does not exist yet, and SimpleLruTruncate will produce an "apparent wraparound" warning. The warning is harmless in this case, but looks very alarming to users. Backpatch to all supported versions. Patch and analysis by Thomas Munro.
*	Fix some oversights in BRIN patch.	Tom Lane	2015-07-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Remove HeapScanDescData.rs_initblock, which wasn't being used for anything in the final version of the patch. Fix IndexBuildHeapScan so that it supports syncscan again; the patch broke synchronous scanning for index builds by forcing rs_startblk to zero even when the caller did not care about that and had asked for syncscan. Add some commentary and usage defenses to heap_setscanlimits(). Fix heapam so that asking for rs_numblocks == 0 does what you would reasonably expect. As coded it amounted to requesting a whole-table scan, because those "--x <= 0" tests on an unsigned variable would behave surprisingly.
*	Sanity-check that a page zeroed by redo routine is marked with WILL_INIT.	Heikki Linnakangas	2015-07-20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There was already a sanity-check in the other direction: if a page was marked with WILL_INIT, it had to be initialized by the redo routine. It's not strictly necessary for correctness that a page is marked with WILL_INIT if it's going to be initialized at redo, but it's a missed optimization if nothing else. Fix a few instances of this issue in SP-GiST, where a block in WAL record was not marked with WILL_INIT, but was in fact always initialized at redo. We were creating a full-page image of the page unnecessarily in those cases. Backpatch to 9.5, where the new WILL_INIT flag was added.
*	Improve BRIN documentation somewhat	Alvaro Herrera	2015-07-20
\| \| \| \| \| \| \| \| \| \| \| \|	This removes some info about support procedures being used, which was obsoleted by commit db5f98ab4f, as well as add some more documentation on how to create new opclasses using the Minmax infrastructure. (Hopefully we can get something similar for Inclusion as well.) In passing, fix some obsolete mentions of "mmtuples" in source code comments. Backpatch to 9.5, where BRIN was introduced.
*	Use appendStringInfoString/Char et al where appropriate.	Heikki Linnakangas	2015-07-02
\| \| \| \| \| \|	Patch by David Rowley. Backpatch to 9.5, as some of the calls were new in 9.5, and keeping the code in sync with master makes future backpatching easier.
*	Make XLogFileCopy() look the same as in 9.4.	Fujii Masao	2015-07-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	XLogFileCopy() was changed heavily in commit de76884. However it was partially reverted in commit 7abc685 and most of those changes to XLogFileCopy() were no longer needed. Then commit 7cbee7c removed those unnecessary code, but XLogFileCopy() looked different in master and 9.4 though the contents are almost the same. This patch makes XLogFileCopy() look the same in master and back-branches, which makes back-patching easier, per discussion on pgsql-hackers. Back-patch to 9.5. Discussion: 55760844.7090703@iki.fi Michael Paquier
*	Remove useless check for NULL subexpression.	Tom Lane	2015-06-30
\| \| \| \| \| \| \|	Coverity rightly gripes that it's silly to have a test here when the adjacent ExecEvalExpr() would choke on a NULL expression pointer. Petr Jelinek
*	Don't call PageGetSpecialPointer() on page until it's been initialized.	Heikki Linnakangas	2015-06-30
\| \| \| \| \| \| \| \| \| \|	After calling XLogInitBufferForRedo(), the page might be all-zeros if it was not in page cache already. btree_xlog_unlink_page initialized the page correctly, but it called PageGetSpecialPointer before initializing it, which would lead to a corrupt page at WAL replay, if the unlinked page is not in page cache. Backpatch to 9.4, the bug came with the rewrite of B-tree page deletion.
*	Initialize GIN metapage correctly when replaying metapage-update WAL record.	Heikki Linnakangas	2015-06-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I broke this with my WAL format refactoring patch. Before that, the metapage was read from disk, and modified in-place regardless of the LSN. That was always a bit silly, as there's no need to read the old page version from disk disk when we're overwriting it anyway. So that was changed in 9.5, but I failed to add a GinInitPage call to initialize the page-headers correctly. Usually you wouldn't notice, because the metapage is already in the page cache and is not zeroed. One way to reproduce this is to perform a VACUUM on an already vacuumed table (so that the vacuum has no real work to do), immediately after a checkpoint, and then perform an immediate shutdown. After recovery, the page headers of the metapage will be incorrectly all-zeroes. Reported by Jeff Janes
*	Also trigger restartpoints based on max_wal_size on standby.	Heikki Linnakangas	2015-06-29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When archive recovery and restartpoints were initially introduced, checkpoint_segments was ignored on the grounds that the files restored from archive don't consume any space in the recovery server. That was changed in later releases, but even then it was arguably a feature rather than a bug, as performing restartpoints as often as checkpoints during normal operation might be excessive, but you might nevertheless not want to waste a lot of space for pre-allocated WAL by setting checkpoint_segments to a high value. But now that we have separate min_wal_size and max_wal_size settings, you can bound WAL usage with max_wal_size, and still avoid consuming excessive space usage by setting min_wal_size to a lower value, so that argument is moot. There are still some issues with actually limiting the space usage to max_wal_size: restartpoints in recovery can only start after seeing the checkpoint record, while a checkpoint starts flushing buffers as soon as the redo-pointer is set. Restartpoint is paced to happen at the same leisurily speed, determined by checkpoint_completion_target, as checkpoints, but because they are started later, max_wal_size can be exceeded by upto one checkpoint cycle's worth of WAL, depending on checkpoint_completion_target. But that seems better than not trying at all, and max_wal_size is a soft limit anyway. The documentation already claimed that max_wal_size is obeyed in recovery, so this just fixes the behaviour to match the docs. However, add some weasel-words there to mention that max_wal_size may well be exceeded by some amount in recovery.
*	Promote the assertion that XLogBeginInsert() is not called twice into ERROR.	Heikki Linnakangas	2015-06-28
\| \| \| \| \| \| \| \| \|	Seems like cheap insurance for WAL bugs. A spurious call to XLogBeginInsert() in itself would be fairly harmless, but if there is any data registered and the insertion is not completed/cancelled properly, there is a risk that the data ends up in a wrong WAL record. Per Jeff Janes's suggestion.
*	Fix double-XLogBeginInsert call in GIN page splits.	Heikki Linnakangas	2015-06-28
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	If data checksums or wal_log_hints is on, and a GIN page is split, the code to find a new, empty, block was called after having already called XLogBeginInsert(). That causes an assertion failure or PANIC, if finding the new block involves updating a FSM page that had not been modified since last checkpoint, because that update is WAL-logged, which calls XLogBeginInsert again. Nested XLogBeginInsert calls are not supported. To fix, rearrange GIN code so that XLogBeginInsert is called later, after finding the victim buffers. Reported by Jeff Janes.
*	Avoid hot standby cancels from VAC FREEZE	Simon Riggs	2015-06-27
\| \| \| \| \| \| \| \| \| \| \| \|	VACUUM FREEZE generated false cancelations of standby queries on an otherwise idle master. Caused by an off-by-one error on cutoff_xid which goes back to original commit. Backpatch to all versions 9.0+ Analysis and report by Marco Nenciarini Bug fix by Simon Riggs
*	Fix BRIN xlog replay	Alvaro Herrera	2015-06-26
\| \| \| \| \| \| \| \|	There was a confusion about which block number to use when storing an item's pointer in the revmap -- the revmap page's blkno was being used, not the data page's blkno. Spotted-by: Jeff Janes
*	Be more conservative about removing tablespace "symlinks".	Robert Haas	2015-06-26
\| \| \| \| \| \| \| \| \| \|	Don't apply rmtree(), which will gleefully remove an entire subtree, and don't even apply unlink() unless it's symlink or a directory, the only things that we expect to find. Amit Kapila, with minor tweaks by me, per extensive discussions involving Andrew Dunstan, Fujii Masao, and Heikki Linnakangas, at least some of whom also reviewed the code.
*	Fix a couple of bugs with wal_log_hints.	Heikki Linnakangas	2015-06-26
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	1. Replay of the WAL record for setting a bit in the visibility map contained an assertion that a full-page image of that record type can only occur with checksums enabled. But it can also happen with wal_log_hints, so remove the assertion. Unlike checksums, wal_log_hints can be changed on the fly, so it would be complicated to figure out if it was enabled at the time that the WAL record was generated. 2. wal_log_hints has the same effect on the locking needed to read the LSN of a page as data checksums. BufferGetLSNAtomic() didn't get the memo. Backpatch to 9.4, where wal_log_hints was added.
*	Improve multixact emergency autovacuum logic.	Andres Freund	2015-06-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously autovacuum was not necessarily triggered if space in the members slru got tight. The first problem was that the signalling was tied to values in the offsets slru, but members can advance much faster. Thats especially a problem if old sessions had been around that previously prevented the multixact horizon to increase. Secondly the skipping logic doesn't work if the database was restarted after autovacuum was triggered - that knowledge is not preserved across restart. This is especially a problem because it's a common panic-reaction to restart the database if it gets slow to anti-wraparound vacuums. Fix the first problem by separating the logic for members from offsets. Trigger autovacuum whenever a multixact crosses a segment boundary, as the current member offset increases in irregular values, so we can't use a simple modulo logic as for offsets. Add a stopgap for the second problem, by signalling autovacuum whenver ERRORing out because of boundaries. Discussion: 20150608163707.GD20772@alap3.anarazel.de Backpatch into 9.3, where it became more likely that multixacts wrap around.
*	Add missing check for wal_debug GUC.	Andres Freund	2015-06-21
\| \| \| \| \| \| \| \| \| \|	9a20a9b2 added a new elog(), enabled when WAL_DEBUG is defined. The other WAL_DEBUG dependant messages check for the wal_debug GUC, but this one did not. While at it replace 'upto' with 'up to'. Discussion: 20150610110253.GF3832@alap3.anarazel.de Backpatch to 9.4, the first release containing 9a20a9b2.
*	Fix corner case in autovacuum-forcing logic for multixact wraparound.	Robert Haas	2015-06-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since find_multixact_start() relies on SimpleLruDoesPhysicalPageExist(), and that function looks only at the on-disk state, it's possible for it to fail to find a page that exists in the in-memory SLRU that has not been written yet. If that happens, SetOffsetVacuumLimit() will erroneously decide to force emergency autovacuuming immediately. We should probably fix find_multixact_start() to consider the data cached in memory as well as on the on-disk state, but that's no excuse for SetOffsetVacuumLimit() to be stupid about the case where it can no longer read the value after having previously succeeded in doing so. Report by Andres Freund.
*	Fix typos	Alvaro Herrera	2015-06-08
\| \| \| \| \| \| \|	tablesapce -> tablespace there -> their These were introduced in 72d422a52, so no need to backpatch.
*	Refactor WAL segment copying code.	Fujii Masao	2015-06-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Remove unused argument "dstfname" and related code from XLogFileCopy(). * Previously XLogFileCopy() returned a pstrdup'd string so that InstallXLogFileSegment() used it later. Since the pstrdup'd string was never free'd, there could be a risk of memory leak. It was almost harmless because the startup process exited just after calling XLogFileCopy(), it existed. This commit changes XLogFileCopy() so that it directly calls InstallXLogFileSegment() and doesn't call pstrdup() at all. Which fixes that memory leak problem. * Extend InstallXLogFileSegment() so that the caller can specify the log level. Which allows us to emit an error when InstallXLogFileSegment() fails a disk file access like link() and rename(). Previously it was always logged with LOG level and additionally needed to be logged with ERROR when we wanted to treat it as an error. Michael Paquier
*	Allow HotStandbyActiveInReplay() to be called in single user mode.	Andres Freund	2015-06-08
\| \| \| \| \| \| \| \| \| \| \| \|	HotStandbyActiveInReplay, introduced in 061b079f, only allowed WAL replay to happen in the startup process, missing the single user case. This buglet is fairly harmless as it only causes problems when single user mode in an assertion enabled build is used to replay a btree vacuum record. Backpatch to 9.2. 061b079f was backpatched further, but the assertion was not.
*	Cope with possible failure of the oldest MultiXact to exist.	Robert Haas	2015-06-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Recent commits, mainly b69bf30b9bfacafc733a9ba77c9587cf54d06c0c and 53bb309d2d5a9432d2602c93ed18e58bd2924e15, introduced mechanisms to protect against wraparound of the MultiXact member space: the number of multixacts that can exist at one time is limited to 2^32, but the total number of members in those multixacts is also limited to 2^32, and older code did not take care to enforce the second limit, potentially allowing old data to be overwritten while it was still needed. Unfortunately, these new mechanisms failed to account for the fact that the code paths in which they run might be executed during recovery or while the cluster was in an inconsistent state. Also, they failed to account for the fact that users who used pg_upgrade to upgrade a PostgreSQL version between 9.3.0 and 9.3.4 might have might oldestMultiXid = 1 in the control file despite the true value being larger. To fix these problems, first, avoid unnecessarily examining the mmembers of MultiXacts when the cluster is not known to be consistent. TruncateMultiXact has done this for a long time, and this patch does not fix that. But the new calls used to prevent member wraparound are not needed until we reach normal running, so avoid calling them earlier. (SetMultiXactIdLimit is actually called before InRecovery is set, so we can't rely on that; we invent our own multixact-specific flag instead.) Second, make failure to look up the members of a MultiXact a non-fatal error. Instead, if we're unable to determine the member offset at which wraparound would occur, postpone arming the member wraparound defenses until we are able to do so. If we're unable to determine the member offset that should force autovacuum, force it continuously until we are able to do so. If we're unable to deterine the member offset at which we should truncate the members SLRU, log a message and skip truncation. An important consequence of these changes is that anyone who does have a bogus oldestMultiXid = 1 value in pg_control will experience immediate emergency autovacuuming when upgrading to a release that contains this fix. The release notes should highlight this fact. If a user has no pg_multixact/offsets/0000 file, but has oldestMultiXid = 1 in the control file, they may wish to vacuum any tables with relminmxid = 1 prior to upgrading in order to avoid an immediate emergency autovacuum after the upgrade. This must be done with a PostgreSQL version 9.3.5 or newer and with vacuum_multixact_freeze_min_age and vacuum_multixact_freeze_table_age set to 0. This patch also adds an additional log message at each database server startup, indicating either that protections against member wraparound have been engaged, or that they have not. In the latter case, once autovacuum has advanced oldestMultiXid to a sane value, the message indicating that the guards have been engaged will appear at the next checkpoint. A few additional messages have also been added at the DEBUG1 level so that the correct operation of this code can be properly audited. Along the way, this patch fixes another, related bug in TruncateMultiXact that has existed since PostgreSQL 9.3.0: when no MultiXacts exist at all, the truncation code looks up NextMultiXactId, which doesn't exist yet. This can lead to TruncateMultiXact removing every file in pg_multixact/offsets instead of keeping one around, as it should. This in turn will cause the database server to refuse to start afterwards. Patch by me. Review by Álvaro Herrera, Andres Freund, Noah Misch, and Thomas Munro.
*	Fix fsync-at-startup code to not treat errors as fatal.	Tom Lane	2015-05-28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 2ce439f3379aed857517c8ce207485655000fc8e introduced a rather serious regression, namely that if its scan of the data directory came across any un-fsync-able files, it would fail and thereby prevent database startup. Worse yet, symlinks to such files also caused the problem, which meant that crash restart was guaranteed to fail on certain common installations such as older Debian. After discussion, we agreed that (1) failure to start is worse than any consequence of not fsync'ing is likely to be, therefore treat all errors in this code as nonfatal; (2) we should not chase symlinks other than those that are expected to exist, namely pg_xlog/ and tablespace links under pg_tblspc/. The latter restriction avoids possibly fsync'ing a much larger part of the filesystem than intended, if the user has left random symlinks hanging about in the data directory. This commit takes care of that and also does some code beautification, mainly moving the relevant code into fd.c, which seems a much better place for it than xlog.c, and making sure that the conditional compilation for the pre_sync_fname pass has something to do with whether pg_flush_data works. I also relocated the call site in xlog.c down a few lines; it seems a bit silly to be doing this before ValidateXLOGDirectoryStructure(). The similar logic in initdb.c ought to be made to match this, but that change is noncritical and will be dealt with separately. Back-patch to all active branches, like the prior commit. Abhijit Menon-Sen and Tom Lane
*	Fix valgrind's "unaddressable bytes" whining about BRIN code.	Tom Lane	2015-05-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	brin_form_tuple calculated an exact tuple size, then palloc'd and filled just that much. Later, brin_doinsert or brin_doupdate would MAXALIGN the tuple size and tell PageAddItem that that was the size of the tuple to insert. If the original tuple size wasn't a multiple of MAXALIGN, the net result would be that PageAddItem would memcpy a few more bytes than the palloc request had been for. AFAICS, this is totally harmless in the real world: the error is a read overrun not a write overrun, and palloc would certainly have rounded the request up to a MAXALIGN multiple internally, so there's no chance of the memcpy fetching off the end of memory. Valgrind, however, is picky to the byte level not the MAXALIGN level. Fix it by pushing the MAXALIGN step back to brin_form_tuple. (The other possible source of tuples in this code, brin_form_placeholder_tuple, was already producing a MAXALIGN'd result.) In passing, be a bit more paranoid about internal allocations in brin_form_tuple.
*	Update README.tuplock	Alvaro Herrera	2015-05-25
\| \| \| \| \| \| \| \| \| \| \|	Multixact truncation is now handled differently, and this file hadn't gotten the memo. Per note from Amit Langote. I didn't use his patch, though. Also update the description of infomask bits, which weren't completely up to date either. This commit also propagates b01a4f6838 back to 9.3 and 9.4, which apparently I failed to do back then.
*	Manual cleanup of pgindent results.	Tom Lane	2015-05-24
\| \| \| \| \| \|	Fix some places where pgindent did silly stuff, often because project style wasn't followed to begin with. (I've not touched the atomics headers, though.)
*	pgindent run for 9.5	Bruce Momjian	2015-05-23
\|
*	Fix incorrect snprintf() limit.	Tom Lane	2015-05-23
\| \| \| \| \| \| \| \|	Typo in commit 7cbee7c0a. No practical effect since the buffer should never actually be overrun, but various compilers and static analyzers will whine about it. Petr Jelinek
*	Still more fixes for lossy-GiST-distance-functions patch.	Tom Lane	2015-05-23
\| \| \| \| \| \|	Fix confusion in documentation, substantial memory leakage if float8 or float4 are pass-by-reference, and assorted comments that were obsoleted by commit 98edd617f3b62a02cb2df9b418fcc4ece45c7ec0.
*	At promotion, don't leave behind a partial segment on the old timeline.	Heikki Linnakangas	2015-05-22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With commit de768844, a copy of the partial segment was archived with the .partial suffix, but the original file was still left in pg_xlog, so it didn't actually solve the problems with archiving the partial segment that it was supposed to solve. With this patch, the partial segment is renamed rather than copied, so we only archive it with the .partial suffix. Also be more robust in detecting if the last segment is already being archived. Previously I used XLogArchiveIsBusy() for that, but that's not quite right. With archive_mode='always', there might be a .ready file for it, and we don't want to rename it to .partial in that case. The old segment is needed until we're fully committed to the new timeline, i.e. until we've written the end-of-recovery WAL record and updated the min recovery point and timeline in the control file. So move the renaming later in the startup sequence, after all that's been done.
*	Make recovery_target_action = pause work.	Fujii Masao	2015-05-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously even if recovery_target_action was set to pause and the recovery target was reached, the recovery could never be paused. Because the setting of pause was always overridden with that of shutdown unexpectedly. This override is valid and intentional if hot_standby is not enabled because there is no way to resume the paused recovery in this case and the setting of pause is completely useless. But not if hot_standby is enabled. This patch changes the code so that the setting of pause is overridden with that of shutdown only when hot_standby is not enabled. Bug reported by Andres Freund
*	Fix more typos in comments.	Heikki Linnakangas	2015-05-20
\| \| \| \|	Patch by CharSyam, plus a few more I spotted with grep.
*	Collection of typo fixes.	Heikki Linnakangas	2015-05-20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use "a" and "an" correctly, mostly in comments. Two error messages were also fixed (they were just elogs, so no translation work required). Two function comments in pg_proc.h were also fixed. Etsuro Fujita reported one of these, but I found a lot more with grep. Also fix a few other typos spotted while grepping for the a/an typos. For example, "consists out of ..." -> "consists of ...". Plus a "though"/ "through" mixup reported by Euler Taveira. Many of these typos were in old code, which would be nice to backpatch to make future backpatching easier. But much of the code was new, and I didn't feel like crafting separate patches for each branch. So no backpatching.
*	Fix spelling in comment	Simon Riggs	2015-05-19
\|
*	Fix typos in comments	Magnus Hagander	2015-05-17
\| \| \| \|	Dmitriy Olshevskiy
*	Fix whitespace	Peter Eisentraut	2015-05-16
\|
*	Add BRIN infrastructure for "inclusion" opclasses	Alvaro Herrera	2015-05-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This lets BRIN be used with R-Tree-like indexing strategies. Also provided are operator classes for range types, box and inet/cidr. The infrastructure provided here should be sufficient to create operator classes for similar datatypes; for instance, opclasses for PostGIS geometries should be doable, though we didn't try to implement one. (A box/point opclass was also submitted, but we ripped it out before commit because the handling of floating point comparisons in existing code is inconsistent and would generate corrupt indexes.) Author: Emre Hasegeli. Cosmetic changes by me Review: Andreas Karlsson
*	Move strategy numbers to include/access/stratnum.h	Alvaro Herrera	2015-05-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For upcoming BRIN opclasses, it's convenient to have strategy numbers defined in a single place. Since there's nothing appropriate, create it. The StrategyNumber typedef now lives there, as well as existing strategy numbers for B-trees (from skey.h) and R-tree-and-friends (from gist.h). skey.h is forced to include stratnum.h because of the StrategyNumber typedef, but gist.h is not; extensions that currently rely on gist.h for rtree strategy numbers might need to add a new A few .c files can stop including skey.h and/or gist.h, which is a nice side benefit. Per discussion: https://www.postgresql.org/message-id/20150514232132.GZ2523@alvh.no-ip.org Authored by Emre Hasegeli and Álvaro. (It's not clear to me why bootscanner.l has any #include lines at all.)
*	TABLESAMPLE, SQL Standard and extensible	Simon Riggs	2015-05-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add a TABLESAMPLE clause to SELECT statements that allows user to specify random BERNOULLI sampling or block level SYSTEM sampling. Implementation allows for extensible sampling functions to be written, using a standard API. Basic version follows SQLStandard exactly. Usable concrete use cases for the sampling API follow in later commits. Petr Jelinek Reviewed by Michael Paquier and Simon Riggs
*	Add archive_mode='always' option.	Heikki Linnakangas	2015-05-15
\| \| \| \| \| \| \|	In 'always' mode, the standby independently archives all files it receives from the primary. Original patch by Fujii Masao, docs and review by me.
*	Fix datatype confusion with the new lossy GiST distance functions.	Heikki Linnakangas	2015-05-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We can only support a lossy distance function when the distance function's datatype is comparable with the original ordering operator's datatype. The distance function always returns a float8, so we are limited to float8, and float4 (by a hard-coded cast of the float8 to float4). In light of this limitation, it seems like a good idea to have a separate 'recheck' flag for the ORDER BY expressions, so that if you have a non-lossy distance function, it still works with lossy quals. There are cases like that with the build-in or contrib opclasses, but it's plausible. There was a hidden assumption that the ORDER BY values returned by GiST match the original ordering operator's return type, but there are plenty of examples where that's not true, e.g. in btree_gist and pg_trgm. As long as the distance function is not lossy, we can tolerate that and just not return the distance to the executor (or rather, always return NULL). The executor doesn't need the distances if there are no lossy results. There was another little bug: the recheck variable was not initialized before calling the distance function. That revealed the bigger issue, as the executor tried to reorder tuples that didn't need reordering, and that failed because of the datatype mismatch.
*	Allow GiST distance function to return merely a lower-bound.	Heikki Linnakangas	2015-05-15
\| \| \| \| \| \| \| \| \| \| \|	The distance function can now set *recheck = false, like index quals. The executor will then re-check the ORDER BY expressions, and use a queue to reorder the results on the fly. This makes it possible to do kNN-searches on polygons and circles, which don't store the exact value in the index, but just a bounding box. Alexander Korotkov and me