postgresql - postgresql mirror

	Commit message (Collapse)	Author	Age
*	If we don't have a backup-end-location, don't claim we've reached it.	Heikki Linnakangas	2012-11-28
\| \| \| \| \| \| \| \|	This was apparently a typo, which caused recovery to think that it immediately reached the end of backup, and allowed the database to start up too early. Reported by Jeff Janes. Backpatch to 9.2, where this code was introduced.
*	Add OpenTransientFile, with automatic cleanup at end-of-xact.	Heikki Linnakangas	2012-11-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Files opened with BasicOpenFile or PathNameOpenFile are not automatically cleaned up on error. That puts unnecessary burden on callers that only want to keep the file open for a short time. There is AllocateFile, but that returns a buffered FILE * stream, which in many cases is not the nicest API to work with. So add function called OpenTransientFile, which returns a unbuffered fd that's cleaned up like the FILE* returned by AllocateFile(). This plugs a few rare fd leaks in error cases: 1. copy_file() - fixed by by using OpenTransientFile instead of BasicOpenFile 2. XLogFileInit() - fixed by adding close() calls to the error cases. Can't use OpenTransientFile here because the fd is supposed to persist over transaction boundaries. 3. lo_import/lo_export - fixed by using OpenTransientFile instead of PathNameOpenFile. In addition to plugging those leaks, this replaces many BasicOpenFile() calls with OpenTransientFile() that were not leaking, because the code meticulously closed the file on error. That wasn't strictly necessary, but IMHO it's good for robustness. The same leaks exist in older versions, but given the rarity of the issues, I'm not backpatching this. Not yet, anyway - it might be good to backpatch later, after this mechanism has had some more testing in master branch.
*	Avoid bogus "out-of-sequence timeline ID" errors in standby-mode.	Heikki Linnakangas	2012-11-22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When startup process opens a WAL segment after replaying part of it, it validates the first page on the WAL segment, even though the page it's really interested in later in the file. As part of the validation, it checks that the TLI on the page header is >= the TLI it saw on the last page it read. If the segment contains a timeline switch, and we have already replayed it, and then re-open the WAL segment (because of streaming replication got disconnected and reconnected, for example), the TLI check will fail when the first page is validated. Fix that by relaxing the TLI check when re-opening a WAL segment. Backpatch to 9.0. Earlier versions had the same code, but before standby mode was introduced in 9.0, recovery never tried to re-read a segment after partially replaying it. Reported by Amit Kapila, while testing a new feature.
*	Fix archive_cleanup_command.	Heikki Linnakangas	2012-11-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When I moved ExecuteRecoveryCommand() from xlog.c to xlogarchive.c, I didn't realize that it's called from the checkpoint process, not the startup process. I tried to use InRedo variable to decide whether or not to attempt cleaning up the archive (must not do so before we have read the initial checkpoint record), but that variable is only valid within the startup process. Instead, let ExecuteRecoveryCommand() always clean up the archive, and add an explicit argument to RestoreArchivedFile() to say whether that's allowed or not. The caller knows better. Reported by Erik Rijkers, diagnosis by Fujii Masao. Only 9.3devel is affected.
*	Skip searching for subxact locks at commit.	Simon Riggs	2012-11-13
\| \| \| \| \| \| \| \|	At commit all standby locks are released for the top-level transaction, so searching for locks for each subtransaction is both pointless and costly (N^2) in the presence of many AccessExclusiveLocks.
*	Fix multiple problems in WAL replay.	Tom Lane	2012-11-12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Most of the replay functions for WAL record types that modify more than one page failed to ensure that those pages were locked correctly to ensure that concurrent queries could not see inconsistent page states. This is a hangover from coding decisions made long before Hot Standby was added, when it was hardly necessary to acquire buffer locks during WAL replay at all, let alone hold them for carefully-chosen periods. The key problem was that RestoreBkpBlocks was written to hold lock on each page restored from a full-page image for only as long as it took to update that page. This was guaranteed to break any WAL replay function in which there was any update-ordering constraint between pages, because even if the nominal order of the pages is the right one, any mixture of full-page and non-full-page updates in the same record would result in out-of-order updates. Moreover, it wouldn't work for situations where there's a requirement to maintain lock on one page while updating another. Failure to honor an update ordering constraint in this way is thought to be the cause of bug #7648 from Daniel Farina: what seems to have happened there is that a btree page being split was rewritten from a full-page image before the new right sibling page was written, and because lock on the original page was not maintained it was possible for hot standby queries to try to traverse the page's right-link to the not-yet-existing sibling page. To fix, get rid of RestoreBkpBlocks as such, and instead create a new function RestoreBackupBlock that restores just one full-page image at a time. This function can be invoked by WAL replay functions at the points where they would otherwise perform non-full-page updates; in this way, the physical order of page updates remains the same no matter which pages are replaced by full-page images. We can then further adjust the logic in individual replay functions if it is necessary to hold buffer locks for overlapping periods. A side benefit is that we can simplify the handling of concurrency conflict resolution by moving that code into the record-type-specfic functions; there's no more need to contort the code layout to keep conflict resolution in front of the RestoreBkpBlocks call. In connection with that, standardize on zero-based numbering rather than one-based numbering for referencing the full-page images. In HEAD, I removed the macros XLR_BKP_BLOCK_1 through XLR_BKP_BLOCK_4. They are still there in the header files in previous branches, but are no longer used by the code. In addition, fix some other bugs identified in the course of making these changes: spgRedoAddNode could fail to update the parent downlink at all, if the parent tuple is in the same page as either the old or new split tuple and we're not doing a full-page image: it would get fooled by the LSN having been advanced already. This would result in permanent index corruption, not just transient failure of concurrent queries. Also, ginHeapTupleFastInsert's "merge lists" case failed to mark the old tail page as a candidate for a full-page image; in the worst case this could result in torn-page corruption. heap_xlog_freeze() was inconsistent about using a cleanup lock or plain exclusive lock: it did the former in the normal path but the latter for a full-page image. A plain exclusive lock seems sufficient, so change to that. Also, remove gistRedoPageDeleteRecord(), which has been dead code since VACUUM FULL was rewritten. Back-patch to 9.0, where hot standby was introduced. Note however that 9.0 had a significantly different WAL-logging scheme for GIST index updates, and it doesn't appear possible to make that scheme safe for concurrent hot standby queries, because it can leave inconsistent states in the index even between WAL records. Given the lack of complaints from the field, we won't work too hard on fixing that branch.
*	Use correct text domain for translating errcontext() messages.	Heikki Linnakangas	2012-11-12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	errcontext() is typically used in an error context callback function, not within an ereport() invocation like e.g errmsg and errdetail are. That means that the message domain that the TEXTDOMAIN magic in ereport() determines is not the right one for the errcontext() calls. The message domain needs to be determined by the C file containing the errcontext() call, not the file containing the ereport() call. Fix by turning errcontext() into a macro that passes the TEXTDOMAIN to use for the errcontext message. "errcontext" was used in a few places as a variable or struct field name, I had to rename those out of the way, now that errcontext is a macro. We've had this problem all along, but this isn't doesn't seem worth backporting. It's a fairly minor issue, and turning errcontext from a function to a macro requires at least a recompile of any external code that calls errcontext().
*	Remove leftover LWLockRelease() call	Alvaro Herrera	2012-11-09
\| \| \| \| \| \| \|	This code was refactored in d5497b95 but an extra LWLockRelease call was left behind. Per report from Erik Rijkers
*	Fix erroneous choice of timeline variable, too	Alvaro Herrera	2012-10-31
\|
*	Fix erroneous choices of segNo variables	Alvaro Herrera	2012-10-31
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit dfda6eba (which changed segment numbers to use a single 64 bit variable instead of log/seg) introduced a couple of bogus choices of exactly which log segment number variable to use in each case. This is currently pretty harmless; in one place, the bogus number was only being used in an error message for a pretty unlikely condition (failure to fsync a WAL segment file). In the other, it was using a global variable instead of the local variable; but all callsites were passing the value of the global variable anyway. No need to backpatch because that commit is not on earlier branches.
*	Throw error if expiring tuple is again updated or deleted.	Kevin Grittner	2012-10-26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This prevents surprising behavior when a FOR EACH ROW trigger BEFORE UPDATE or BEFORE DELETE directly or indirectly updates or deletes the the old row. Prior to this patch the requested action on the row could be silently ignored while all triggered actions based on the occurence of the requested action could be committed. One example of how this could happen is if the BEFORE DELETE trigger for a "parent" row deleted "children" which had trigger functions to update summary or status data on the parent. This also prevents similar surprising problems if the query has a volatile function which updates a target row while it is already being updated. There are related issues present in FOR UPDATE cursors and READ COMMITTED queries which are not handled by this patch. These issues need further evalution to determine what change, if any, is needed. Where the new error messages are generated, in most cases the best fix will be to move code from the BEFORE trigger to an AFTER trigger. Where this is not feasible, the trigger can avoid the error by re-issuing the triggering statement and returning NULL. Documentation changes will be submitted in a separate patch. Kevin Grittner and Tom Lane with input from Florian Pflug and Robert Haas, based on problems encountered during conversion of Wisconsin Circuit Court trigger logic to plpgsql triggers.
*	Close un-owned SMgrRelations at transaction end.	Tom Lane	2012-10-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If an SMgrRelation is not "owned" by a relcache entry, don't allow it to live past transaction end. This design allows the same SMgrRelation to be used for blind writes of multiple blocks during a transaction, but ensures that we don't hold onto such an SMgrRelation indefinitely. Because an SMgrRelation typically corresponds to open file descriptors at the fd.c level, leaving it open when there's no corresponding relcache entry can mean that we prevent the kernel from reclaiming deleted disk space. (While CacheInvalidateSmgr messages usually fix that, there are cases where they're not issued, such as DROP DATABASE. We might want to add some more sinval messaging for that, but I'd be inclined to keep this type of logic anyway, since allowing VFDs to accumulate indefinitely for blind-written relations doesn't seem like a good idea.) This code replaces a previous attempt towards the same goal that proved to be unreliable. Back-patch to 9.1 where the previous patch was added.
*	Fix silly bug in previous refactoring.	Heikki Linnakangas	2012-10-09
\| \| \| \| \|	I extracted the refactoring patch from a larger patch that contained other changes too, but missed one unintentional change and didn't test enough...
*	Put the logic to wait for WAL in standby mode to a separate function.	Heikki Linnakangas	2012-10-09
\| \| \| \| \|	This is just refactoring with no user-visible effect, to make the code more readable.
*	Fix typo in comment, and reword it slightly while we're at it.	Heikki Linnakangas	2012-10-04
\|
*	Fix two bugs introduced in the xlog.c split.	Heikki Linnakangas	2012-10-03
\| \| \| \| \| \| \|	The comment explaining the naming of timeline history files was wrong, and the history file was not being arhived. Pointed out by Fujii Masao.
*	Add #includes needed on some platforms in the new files.	Heikki Linnakangas	2012-10-02
\| \| \| \|	Hopefully this makes the *BSD buildfarm animals happy.
*	Split off functions related to timeline history files and XLOG archiving.	Heikki Linnakangas	2012-10-02
\| \| \| \| \| \|	This is just refactoring, to make the functions accessible outside xlog.c. A followup patch will make use of that, to allow fetching timeline history files over streaming replication.
*	Fix btmarkpos/btrestrpos to handle array keys.	Tom Lane	2012-09-27
\| \| \| \| \| \| \| \|	This fixes another error in commit 9e8da0f75731aaa7605cf4656c21ea09e84d2eb1. I neglected to make the mark/restore functionality save and restore the current set of array key values, which led to strange behavior if an IndexScan with ScalarArrayOpExpr quals was used as the inner side of a mergejoin. Per bug #7570 from Melese Tesfaye.
*	Put back AcceptInvalidationMessages calls in heap_openrv(_extended).	Tom Lane	2012-09-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	These calls were removed in commit 4240e429d0c2d889d0cda23c618f94e12c13ade7 as part of a general refactoring and improvement of DDL locking. However, there's a problem not solved by the rewrite, which is that GRANT/REVOKE update pg_class.relacl without taking any particular lock on the target table as such. If another backend fails to do AcceptInvalidationMessages, it won't notice a recently-committed change in ACLs. Bug #7557 from Piotr Czachur demonstrates that there's at least one code path in 9.2.0 in which a command fails to do any AcceptInvalidationMessages calls at all, if the current transaction already holds all the locks it will need. Since we're hard up against the release deadline for 9.2.1, fix this by putting back the AcceptInvalidationMessages calls in heap_openrv and heap_openrv_extended, thereby restoring the historical behavior in this area. We ought to look for a more elegant and perhaps more bulletproof solution, but there's no time for that right now.
*	Properly set relpersistence for fake relcache entries.	Robert Haas	2012-09-14
\| \| \| \| \| \| \|	This can result in buffers failing to be properly flushed at checkpoint time, leading to data loss. Report, diagnosis, and patch by Jeff Davis.
*	Fix WAL file replacement during cascading replication on Windows.	Heikki Linnakangas	2012-09-05
\| \| \| \| \| \| \| \| \| \| \|	When the startup process restores a WAL file from the archive, it deletes any old file with the same name and renames the new file in its place. On Windows, however, when a file is deleted, it still lingers as long as a process holds a file handle open on it. With cascading replication, a walsender process can hold the old file open, so the rename() in the startup process would fail. To fix that, rename the old file to a temporary name, to make the original file name available for reuse, before deleting the old file.
*	Fix inappropriate error messages for Hot Standby misconfiguration errors.	Tom Lane	2012-09-05
\| \| \| \| \| \| \| \|	Give the correct name of the GUC parameter being complained of. Also, emit a more suitable SQLSTATE (INVALID_PARAMETER_VALUE, not the default INTERNAL_ERROR). Gurjeet Singh, errcode adjustment by me
*	Trim spgist_private.h inclusion	Alvaro Herrera	2012-09-05
\| \| \| \|	It doesn't really need rel.h; relcache.h is enough.
*	Fix compiler warnings about unused variables, caused by my previous commit.	Heikki Linnakangas	2012-09-04
\| \| \| \|	Reported by Peter Eisentraut.
*	Fix bugs in cascading replication with recovery_target_timeline='latest'	Heikki Linnakangas	2012-09-04
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The cascading replication code assumed that the current RecoveryTargetTLI never changes, but that's not true with recovery_target_timeline='latest'. The obvious upshot of that is that RecoveryTargetTLI in shared memory needs to be protected by a lock. A less obvious consequence is that when a cascading standby is connected, and the standby switches to a new target timeline after scanning the archive, it will continue to stream WAL to the cascading standby, but from a wrong file, ie. the file of the previous timeline. For example, if the standby is currently streaming from the middle of file 000000010000000000000005, and the timeline changes, the standby will continue to stream from that file. However, the WAL on the new timeline is in file 000000020000000000000005, so the standby sends garbage from 000000010000000000000005 to the cascading standby, instead of the correct WAL from file 000000020000000000000005. This also fixes a related bug where a partial WAL segment is restored from the archive and streamed to a cascading standby. The code assumed that when a WAL segment is copied from the archive, it can immediately be fully streamed to a cascading standby. However, if the segment is only partially filled, ie. has the right size, but only N first bytes contain valid WAL, that's not safe. That can happen if a partial WAL segment is manually copied to the archive, or if a partial WAL segment is archived because a server is started up on a new timeline within that segment. The cascading standby will get confused if the WAL it received is not valid, and will get stuck until it's restarted. This patch fixes that problem by not allowing WAL restored from the archive to be streamed to a cascading standby until it's been replayed, and thus validated.
*	Replace memcpy() calls in xlog.c critical sections with struct assignments.	Tom Lane	2012-09-03
\| \| \| \| \| \| \| \| \| \| \| \|	This gets rid of a dangerous-looking use of the not-volatile XLogCtl pointer in a couple of spinlock-protected sections, where the normal coding rule is that you should only access shared memory through a pointer-to-volatile. I think the risk is only hypothetical not actual, since for there to be a bug the compiler would have to move the spinlock acquire or release across the memcpy() call, which one sincerely hopes it will not. Still, it looks cleaner this way. Per comment from Daniel Farina and subsequent discussion.
*	Improve coding of gistchoose and gistRelocateBuildBuffersOnSplit.	Tom Lane	2012-08-30
\| \| \| \| \| \| \| \|	This is mostly cosmetic, but it does eliminate a speculative portability issue. The previous coding ignored the fact that sum_grow could easily overflow (in fact, it could be summing multiple IEEE float infinities). On a platform where that didn't guarantee to produce a positive result, the code would misbehave. In any case, it was less than readable.
*	Split tuple struct defs from htup.h to htup_details.h	Alvaro Herrera	2012-08-30
\| \| \| \| \| \| \| \| \| \| \| \|	This reduces unnecessary exposure of other headers through htup.h, which is very widely included by many files. I have chosen to move the function prototypes to the new file as well, because that means htup.h no longer needs to include tupdesc.h. In itself this doesn't have much effect in indirect inclusion of tupdesc.h throughout the tree, because it's also required by execnodes.h; but it's something to explore in the future, and it seemed best to do the htup.h change now while I'm busy with it.
*	Fix logic bug in gistchoose and gistRelocateBuildBuffersOnSplit.	Robert Haas	2012-08-30
\| \| \| \| \| \| \| \| \| \| \| \|	Every time the best-tuple-found-so-far changes, we need to reset all the penalty values in which_grow[] to the penalties for the new best tuple. The old code failed to do this, resulting in inferior index quality. The original patch from Alexander Korotkov was just two lines; I took the liberty of fleshing that out by adding a bunch of comments that I hope will make this logic easier for others to understand than it was for me.
*	Optimize SP-GiST insertions.	Heikki Linnakangas	2012-08-29
\| \| \| \| \| \| \|	This includes two micro-optimizations to the tight inner loop in descending the SP-GiST tree: 1. avoid an extra function call to index_getprocinfo when calling user-defined choose function, and 2. avoid a useless palloc+pfree when node labels are not used.
*	Split heapam_xlog.h from heapam.h	Alvaro Herrera	2012-08-28
\| \| \| \| \| \| \| \| \| \| \| \|	The heapam XLog functions are used by other modules, not all of which are interested in the rest of the heapam API. With this, we let them get just the XLog stuff in which they are interested and not pollute them with unrelated includes. Also, since heapam.h no longer requires xlog.h, many files that do include heapam.h no longer get xlog.h automatically, including a few headers. This is useful because heapam.h is getting pulled in by execnodes.h, which is in turn included by a lot of files.
*	Split resowner.h	Alvaro Herrera	2012-08-28
\| \| \| \| \|	This lets files that are mere users of ResourceOwner not automatically include the headers for stuff that is managed by the resowner mechanism.
*	Avoid somewhat-theoretical overflow risks in RecordIsValid().	Tom Lane	2012-08-21
\| \| \| \| \| \| \| \| \| \| \| \| \|	This improves on commit 51fed14d73ed3acd2282b531fb1396877e44e86a by eliminating the assumption that we can form <some pointer value> + <some offset> without overflow. The entire point of those tests is that we don't trust the offset value, so coding them in a way that could wrap around if the buffer happens to be near the top of memory doesn't seem sound. Instead, track the remaining space as a size_t variable and compare offsets against that. Also, improve comment about why we need the extra early check on xl_tot_len.
*	Don't get confused if a WAL partial record header has xl_tot_len == 0.	Heikki Linnakangas	2012-08-20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If a WAL record header was split across pages, but xl_tot_len was 0, we would get confused and conclude that we had already read the whole record, and proceed to CRC check it. That can lead to a crash in RecordIsValid(), which isn't careful to not read beyond end-of-record, as defined by xl_tot_len. Add an explicit sanity check for xl_tot_len <= SizeOfXlogRecord. Also, make RecordIsValid() more robust by checking in each step that it doesn't try to access memory beyond end of record, even if a length field in the record's or a backup block's header is bogus. Per report and analysis by Tom Lane.
*	Delete inaccurate C comment about FSM and adding pages, per Robert Haas.	Bruce Momjian	2012-08-16
\|
*	Fix GiST buffering build bug, which caused "failed to re-find parent" errors.	Heikki Linnakangas	2012-08-16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We use a hash table to track the parents of inner pages, but when inserting to a leaf page, the caller of gistbufferinginserttuples() must pass a correct block number of the leaf's parent page. Before gistProcessItup() descends to a child page, it checks if the downlink needs to be adjusted to accommodate the new tuple, and updates the downlink if necessary. However, updating the downlink might require splitting the page, which might move the downlink to a page to the right. gistProcessItup() doesn't realize that, so when it descends to the leaf page, it might pass an out-of-date parent block number as a result. Fix that by returning the block a tuple was inserted to from gistbufferinginserttuples(). This fixes the bug reported by Zdeněk Jílovec.
*	Update C comment to NOTICE to reflect previous commit changing the error	Bruce Momjian	2012-08-15
\| \| \| \|	level, per report from Tom.
*	Fix minor bug in XLogFileRead() that accidentally worked.	Simon Riggs	2012-08-08
\| \| \| \| \| \| \| \| \|	Cascading replication copied the incoming file into pg_xlog but didn't set path correctly, so the first attempt to open file failed causing it to loop around and look for file in pg_xlog. So the earlier coding worked, but accidentally rather than by design. Spotted by Fujii Masao, fix by Fujii Masao and Simon Riggs
*	Fix TwoPhaseGetDummyBackendId().	Tom Lane	2012-08-08
\| \| \| \| \| \| \| \| \| \| \| \| \|	This was broken in commit ed0b409d22346b1b027a4c2099ca66984d94b6dd, which revised the GlobalTransactionData struct to not include the associated PGPROC as its first member, but overlooked one place where a cast was used in reliance on that equivalence. The most effective way of fixing this seems to be to create a new function that looks up the GlobalTransactionData struct given the XID, and make both TwoPhaseGetDummyBackendId and TwoPhaseGetDummyProc rely on that. Per report from Robert Ross.
*	fsync backup_label after pg_start_backup()	Simon Riggs	2012-08-07
\| \| \| \|	Dave Kerr
*	Improve underdocumented btree_xlog_delete_get_latestRemovedXid() code.	Tom Lane	2012-08-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As noted by Noah Misch, btree_xlog_delete_get_latestRemovedXid is critically dependent on the assumption that it's examining a consistent state of the database. This was undocumented though, so the seemingly-unrelated check for no active HS sessions might be thought to be merely an optional optimization. Improve comments, and add an explicit check of reachedConsistency just to be sure. This function returns InvalidTransactionId (thereby killing all HS transactions) in several cases that are not nearly unlikely enough for my taste. This commit doesn't attempt to fix those deficiencies, just document them. Back-patch to 9.2, not from any real functional need but just to keep the branches more closely synced to simplify possible future back-patching.
*	In SPGiST replay, do conflict resolution before modifying the page.	Tom Lane	2012-08-03
\| \| \| \| \| \| \| \| \|	In yesterday's commit 962e0cc71e839c58fb9125fa85511b8bbb8bdbee, I added the ResolveRecoveryConflictWithSnapshot call in the wrong place. I correctly put it before spgRedoVacuumRedirect itself would modify the index page --- but not before RestoreBkpBlocks, so replay of a record with a full-page image would modify the page before kicking off any conflicting HS transactions. Oops.
*	Fix race conditions associated with SPGiST redirection tuples.	Tom Lane	2012-08-02
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The correct test for whether a redirection tuple is removable is whether tuple's xid < RecentGlobalXmin, not OldestXmin; the previous coding failed to protect index searches being done in concurrent transactions that have no XID. This mirrors the recent fix in btree's page recycling logic made in commit d3abbbebe52eb1e59e621c880ad57df9d40d13f2. Also, WAL-log the newest XID of any removed redirection tuple on an index page, and apply ResolveRecoveryConflictWithSnapshot during InHotStandby WAL replay. This protects against concurrent Hot Standby transactions possibly needing to see the redirection tuple(s). Per my query of 2012-03-12 and subsequent discussion.
*	Fix management of pendingOpsTable in auxiliary processes.	Tom Lane	2012-07-18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	mdinit() was misusing IsBootstrapProcessingMode() to decide whether to create an fsync pending-operations table in the current process. This led to creating a table not only in the startup and checkpointer processes as intended, but also in the bgwriter process, not to mention other auxiliary processes such as walwriter and walreceiver. Creation of the table in the bgwriter is fatal, because it absorbs fsync requests that should have gone to the checkpointer; instead they just sit in bgwriter local memory and are never acted on. So writes performed by the bgwriter were not being fsync'd which could result in data loss after an OS crash. I think there is no live bug with respect to walwriter and walreceiver because those never perform any writes of shared buffers; but the potential is there for future breakage in those processes too. To fix, make AuxiliaryProcessMain() export the current process's AuxProcType as a global variable, and then make mdinit() test directly for the types of aux process that should have a pendingOpsTable. Having done that, we might as well also get rid of the random bool flags such as am_walreceiver that some of the aux processes had grown. (Note that we could not have fixed the bug by examining those variables in mdinit(), because it's called from BaseInit() which is run by AuxiliaryProcessMain() before entering any of the process-type-specific code.) Back-patch to 9.2, where the problem was introduced by the split-up of bgwriter and checkpointer processes. The bogus pendingOpsTable exists in walwriter and walreceiver processes in earlier branches, but absent any evidence that it causes actual problems there, I'll leave the older branches alone.
*	Remove unreachable code	Peter Eisentraut	2012-07-16
\| \| \| \| \| \| \|	The Solaris Studio compiler warns about these instances, unlike more mainstream compilers such as gcc. But manual inspection showed that the code is clearly not reachable, and we hope no worthy compiler will complain about removing this code.
*	Cosmetic cleanup of ginInsertValue().	Tom Lane	2012-07-13
\| \| \| \| \| \| \| \|	Make it clearer that the passed stack mustn't be empty, and that we are not supposed to fall off the end of the stack in the main loop. Tighten the loop that extracts the root block number, too. Markus Wanner and Tom Lane
*	Fix a stupid bug I introduced into XLogFlush().	Robert Haas	2012-07-02
\| \| \| \| \|	Commit f11e8be3e812cdbbc139c1b4e49141378b118dee broke this; it was right in Peter's original patch, but I messed it up before committing.
*	Fix position of WalSndWakeupRequest call.	Robert Haas	2012-07-02
\| \| \| \| \| \| \|	This avoids discriminating against wal_sync_method = open_sync or open_datasync. Fujii Masao, reviewed by Andres Freund
*	Assorted message style improvements	Peter Eisentraut	2012-07-02
\|