postgresql - postgresql mirror

	Commit message (Collapse)	Author	Age
...
*	Fix decoding of consecutive MULTI_INSERTs emitted by one heap_multi_insert().	Andres Freund	2014-07-12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 1b86c81d2d fixed the decoding of toasted columns for the rows contained in one xl_heap_multi_insert record. But that's not actually enough, because heap_multi_insert() will actually first toast all passed in rows and then emit several *_multi_insert records; one for each page it fills with tuples. Add a XLOG_HEAP_LAST_MULTI_INSERT flag which is set in xl_heap_multi_insert->flag denoting that this multi_insert record is the last emitted by one heap_multi_insert() call. Then use that flag in decode.c to only set clear_toast_afterwards in the right situation. Expand the number of rows inserted via COPY in the corresponding regression test to make sure that more than one heap page is filled with tuples by one heap_multi_insert() call. Backpatch to 9.4 like the previous commit.
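A minimal sketch of the decoding rule described in the commit above, not the code from decode.c: TOAST reassembly state is cleared only after the final tuple of the record that carries the last-record flag. The `Demo*` names and the flag's bit value are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit value standing in for XLOG_HEAP_LAST_MULTI_INSERT. */
#define DEMO_LAST_MULTI_INSERT 0x02

/* Simplified stand-in for one multi-insert WAL record (one heap page). */
typedef struct DemoMultiInsert
{
    uint8_t flags;
    int     ntuples;
} DemoMultiInsert;

/* True only for the final tuple of the record flagged as the last one. */
static bool
demo_clear_toast_afterwards(const DemoMultiInsert *rec, int tuple_index)
{
    return (rec->flags & DEMO_LAST_MULTI_INSERT) != 0 &&
           tuple_index + 1 == rec->ntuples;
}

/* Count how often TOAST state would be cleared while decoding all records. */
static int
demo_count_toast_clears(const DemoMultiInsert *recs, size_t nrecs)
{
    int clears = 0;

    for (size_t r = 0; r < nrecs; r++)
        for (int i = 0; i < recs[r].ntuples; i++)
            if (demo_clear_toast_afterwards(&recs[r], i))
                clears++;
    return clears;
}
```

With two records from one call (three tuples unflagged, then two tuples flagged), the state would be cleared exactly once, after the very last tuple.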
*	Rename logical decoding's pg_llog directory to pg_logical.	Andres Freund	2014-07-02
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The old name wasn't very descriptive of the directory's actual contents, which are historical snapshots in the snapshots/ subdirectory and mapping data for rewritten tuples in mappings/. There's been a fair amount of discussion about what would be a good name. I'm settling for pg_logical because it's likely that further data around logical decoding and replication will need saving in the future. Also add the missing entry for the directory into storage.sgml's list of PGDATA contents. Bumps catversion as the data directories won't be compatible.
*	Avoid copying index tuples when building an index.	Robert Haas	2014-07-01
\| \| \| \| \| \| \| \| \| \| \| \|	The previous code, perhaps out of concern for avoiding memory leaks, formed the tuple in one memory context and then copied it to another memory context. However, this doesn't appear to be necessary, since index_form_tuple and the functions it calls take precautions against leaking memory. In my testing, building the tuple directly inside the sort context shaves several percent off the index build time. Rearrange things so we do that. Patch by me. Review by Amit Kapila, Tom Lane, Andres Freund.
*	Fix and enhance the assertion of no palloc's in a critical section.	Heikki Linnakangas	2014-06-30
\| \| \| \| \| \| \| \| \| \| \| \|	The assertion failed if WAL_DEBUG or LWLOCK_STATS was enabled; fix that by using separate memory contexts for the allocations made within those code blocks. This patch introduces a mechanism for marking any memory context as allowed in a critical section. Previously ErrorContext was exempt as a special case. Instead of a blanket exception of the checkpointer process, only exempt the memory context used for the pending ops hash table.
*	Have multixact be truncated by checkpoint, not vacuum	Alvaro Herrera	2014-06-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Instead of truncating pg_multixact at vacuum time, do it only at checkpoint time. The reason for doing it this way is twofold: first, we want it to delete only segments that we're certain will not be required if there's a crash immediately after the removal; and second, we want to do it relatively often so that older files are not left behind if there's an untimely crash. Per my proposal in http://www.postgresql.org/message-id/20140626044519.GJ7340@eldon.alvh.no-ip.org we now execute the truncation in the checkpointer process rather than as part of vacuum. Vacuum is only in charge of maintaining in shared memory the value to which it's possible to truncate the files; that value is also stored as part of checkpoints, and so upon recovery we can reuse the same value to re-execute the truncation and reset the oldest-value-still-safe-to-use to one known to remain after truncation. Per bug reported by Jeff Janes in the course of his tests involving bug #8673. While at it, update some comments that hadn't been updated since multixacts were changed. Backpatch to 9.3, where persistency of pg_multixact files was introduced by commit 0ac5ad5134f2.
*	Fix broken Assert() introduced by 8e9a16ab8f7f0e58	Alvaro Herrera	2014-06-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Don't assert MultiXactIdIsRunning if the multi came from a tuple that had been share-locked and later copied over to the new cluster by pg_upgrade. Doing that causes an error to be raised unnecessarily: MultiXactIdIsRunning is not open to the possibility that its argument came from a pg_upgraded tuple, and all its other callers are already checking; but such multis cannot, obviously, have transactions still running, so the assert is pointless. Noticed while investigating the bogus pg_multixact/offsets/0000 file left over by pg_upgrade, as reported by Andres Freund in http://www.postgresql.org/message-id/20140530121631.GE25431@alap3.anarazel.de Backpatch to 9.3, as the commit that introduced the buglet.
*	Check for interrupts during tuple-insertion loops.	Robert Haas	2014-06-23
\| \| \| \| \| \| \| \|	Normally, this won't matter too much; but if I/O is really slow, for example because the system is overloaded, we might write many pages before checking for interrupts. A single toast insertion might write up to 1GB of data, and a multi-insert could write hundreds of tuples (and their corresponding TOAST data).
*	Fix bug in WAL_DEBUG.	Heikki Linnakangas	2014-06-23
\| \| \| \| \|	The record header was not copied correctly to the buffer that was passed to the rm_desc function. Broken by my rm_desc signature refactoring patch.
*	Don't allow to disable backend assertions via the debug_assertions GUC.	Andres Freund	2014-06-20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The existence of the assert_enabled variable (backing the debug_assertions GUC) reduced the amount of knowledge some static code checkers (like coverity and various compilers) could infer from the existence of the assertion. That could have been solved by optionally removing the assertion_enabled variable from the Assert() et al macros at compile time when some special macro is defined, but the resulting complication doesn't seem to be worth the gain from having debug_assertions. Recompiling is fast enough. The debug_assertions GUC is still available, but readonly, as it's useful when diagnosing problems. The commandline/client startup option -A, which previously also allowed enabling/disabling assertions, has been removed as it doesn't serve a purpose anymore. While at it, reduce code duplication in bufmgr.c and localbuf.c assertions checking for spurious buffer pins. That code had to be reindented anyway to cope with the assert_enabled removal.
*	Change the signature of rm_desc so that it's passed a XLogRecord.	Heikki Linnakangas	2014-06-14
\| \| \| \|	Just feels more natural, and is more consistent with rm_redo.
*	Consistency improvements for slot and decoding code.	Andres Freund	2014-06-12
\| \| \| \| \| \| \| \|	Change the order of checks in similar functions to be the same; remove a parameter that's not needed anymore; rename a memory context and expand a couple of comments. Per review comments from Amit Kapila
*	Fix infinite loop when splitting inner tuples in SPGiST text indexes.	Tom Lane	2014-06-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously, the code used a node label of zero both for strings that contain no bytes beyond the inner tuple's prefix, and for cases where an "allTheSame" inner tuple has to be split to allow a string with a different next byte to be inserted into it. Failing to distinguish these cases meant that if a string ending with the current prefix needed to be inserted into an allTheSame tuple, we got into an infinite loop, because after splitting the tuple we'd descend into the child allTheSame tuple and then find we need to split again. To fix, instead use -1 and -2 as the node labels for these two cases. This requires widening the node label type from "char" to int2, but fortunately SPGiST stores all pass-by-value node label types in their Datum representation, which means that this change is transparently upward compatible so far as the on-disk representation goes. We continue to recognize zero as a dummy node label for reading purposes, but will not attempt to push new index entries down into such a label, so that the loop won't occur even when dealing with an existing index. Per report from Teodor Sigaev. Back-patch to 9.2 where the faulty code was introduced.
*	Wrap multixact/members correctly during extension, take 2	Alvaro Herrera	2014-06-09
\| \| \| \| \| \| \| \|	In a50d97625497b7 I already changed this, but got it wrong for the case where the number of members is larger than the number of entries that fit in the last page of the last segment. As reported by Serge Negodyuck in a followup to bug #8673.
*	Add defenses against running with a wrong selection of LOBLKSIZE.	Tom Lane	2014-06-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It's critical that the backend's idea of LOBLKSIZE match the way data has actually been divided up in pg_largeobject. While we don't provide any direct way to adjust that value, doing so is a one-line source code change and various people have expressed interest recently in changing it. So, just as with TOAST_MAX_CHUNK_SIZE, it seems prudent to record the value in pg_control and cross-check that the backend's compiled-in setting matches the on-disk data. Also tweak the code in inv_api.c so that fetches from pg_largeobject explicitly verify that the length of the data field is not more than LOBLKSIZE. Formerly we just had Asserts() for that, which is no protection at all in production builds. In some of the call sites an overlength data value would translate directly to a security-relevant stack clobber, so it seems worth one extra runtime comparison to be sure. In the back branches, we can't change the contents of pg_control; but we can still make the extra checks in inv_api.c, which will offer some amount of protection against running with the wrong value of LOBLKSIZE.
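The difference between an Assert and a runtime length check, as described in the commit above, can be sketched as follows. This is an illustration under assumptions, not the code from inv_api.c: `DEMO_LOBLKSIZE` and `demo_copy_chunk` are hypothetical names, and the chunk size is an arbitrary value.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical compiled-in chunk size for this sketch. */
#define DEMO_LOBLKSIZE 2048

/*
 * Copy one stored chunk into a caller buffer of DEMO_LOBLKSIZE bytes.
 * The length is checked at run time, so an overlength chunk is rejected
 * even in builds where assertions are compiled out.
 * Returns 0 on success, -1 if the chunk is longer than the buffer.
 */
static int
demo_copy_chunk(char *dst, const char *src, size_t len)
{
    if (len > DEMO_LOBLKSIZE)
        return -1;              /* would otherwise overrun dst */
    memcpy(dst, src, len);
    return 0;
}
```

A caller holding a stack buffer of `DEMO_LOBLKSIZE` bytes can then treat a -1 return as corrupt or mismatched on-disk data instead of overwriting the stack.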
*	Consistently spell a replication slot's name as slot_name.	Andres Freund	2014-06-05
\| \| \| \| \| \| \| \| \| \| \|	Previously there's been a mix between 'slotname' and 'slot_name'. It's not nice to be unnecessarily inconsistent in a new feature. As a post beta1 initdb now is required in the wake of eeca4cd35e, fix the inconsistencies. Most of the changes won't affect usage of replication slots because the majority of the changes are to function parameter names. The prominent exception to that is that the recovery.conf parameter 'primary_slotname' is now named 'primary_slot_name'.
*	Adjust SP-GiST WAL record formats to reduce alignment padding.	Heikki Linnakangas	2014-06-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The way the code was written, the padding was copied from uninitialized memory areas. Because the structs are local variables in the code where the WAL records are constructed, making them larger and zeroing the padding bytes would not make the code very pretty, so rather than fixing this directly by zeroing out the padding bytes, it seems clearer to not try to align the tuples in the WAL records. The redo functions are taught to copy the tuple header to a local variable to avoid unaligned access. Stable branches have the same problem, but we can't change the WAL format there, so fix in master only. Reading a few random extra bytes from the stack is harmless in practice, so it's not worth crafting a different back-patchable fix. Per reports from Kevin Grittner and Andres Freund, using clang static analyzer and Valgrind, respectively.
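The "copy the tuple header to a local variable" technique mentioned in the commit above can be sketched like this. The struct layout and names are hypothetical stand-ins rather than the SP-GiST definitions; the point is only that `memcpy` into an aligned local is safe wherever the bytes sit in the record.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical tuple header; not the real SP-GiST layout. */
typedef struct DemoTupleHeader
{
    uint32_t size;
    uint16_t nNodes;
    uint16_t flags;
} DemoTupleHeader;

/*
 * Read the size field of a tuple stored at an arbitrary byte offset in
 * a WAL record. Casting rec to DemoTupleHeader* and dereferencing it
 * could be an unaligned access; copying into a local variable is not.
 */
static uint32_t
demo_read_tuple_size(const char *rec)
{
    DemoTupleHeader hdr;

    memcpy(&hdr, rec, sizeof(hdr));
    return hdr.size;
}
```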
*	Fix error when trying to delete page with half-dead left sibling.	Heikki Linnakangas	2014-05-25
\| \| \| \| \| \| \| \| \| \| \| \|	The new page deletion code didn't cope with the case where the target page's right sibling was marked half-dead. It failed a sanity check which checked that the downlinks in the parent page match the lower level, because a half-dead page has no downlink. To cope, check for that condition, and just give up on the deletion if it happens. The vacuum will finish the deletion of the half-dead page when it gets there, and on the next vacuum after that the empty page can be deleted. Reported by Jeff Janes.
*	Fix backup-block numbering in redo of b-tree split.	Heikki Linnakangas	2014-05-19
\| \| \| \| \| \| \| \| \| \| \| \|	I got the backup block numbers off-by-one in the commit that changed the way incomplete-splits are handled. I blame the comments, which said "backup block 1" and "backup block 2", even though the backup blocks are numbered starting from 0, in the macros and functions used in replay. Fix the comments and the code. Per Jeff Janes' bug report about corruption caused by torn page writes. The incorrect code is new in git master, but backpatch the comment change down to 9.0, where the numbering in the redo-side macros was changed.
*	Fix a bunch of functions that were declared static then defined not-static.	Tom Lane	2014-05-17
\| \| \| \|	Per testing with a compiler that whines about this.
*	Update README, we don't do post-recovery cleanup actions anymore.	Heikki Linnakangas	2014-05-17
\| \| \| \| \| \| \|	transam/README explained how B-tree incomplete splits were tracked and fixed after recovery, as an example of handling complex actions that need multiple WAL records, but that's not how it works anymore. Explain the new paradigm.
*	Initialize tsId and dbId fields in WAL record of COMMIT PREPARED.	Heikki Linnakangas	2014-05-16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit dd428c79 added dbId and tsId to the xl_xact_commit struct but missed that prepared transaction commits reuse that struct. Fix that. Because those fields were left uninitialized, replaying a commit prepared WAL record in a hot standby node would fail to remove the relcache init file. That can lead to "could not open file" errors on the standby. The relcache init file only needs to be removed when a system table/index is rewritten in the transaction using two-phase commit, so that should be rare in practice. In HEAD, the incorrect dbId/tsId values are also used for filtering in logical replication code, causing the transaction to always be filtered out. Analysis and fix by Andres Freund. Backpatch to 9.0 where hot standby was introduced.
*	Fix race condition in preparing a transaction for two-phase commit.	Heikki Linnakangas	2014-05-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	To lock a prepared transaction's shared memory entry, we used to mark it with the XID of the backend. When the XID was no longer active according to the proc array, the entry was implicitly considered as not locked anymore. However, when preparing a transaction, the backend's proc array entry was cleared before transferring the locks (and some other state) to the prepared transaction's dummy PGPROC entry, so there was a window where another backend could finish the transaction before it was in fact fully prepared. To fix, rewrite the locking mechanism of global transaction entries. Instead of an XID, just have a simple locked-or-not flag in each entry (we store the locking backend's backend id rather than a simple boolean, but that's just for debugging purposes). The backend is responsible for explicitly unlocking the entry, and to make sure that that happens, install a callback to unlock it on abort or process exit. Backpatch to all supported versions.
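A single-threaded sketch of the explicit lock flag described in the commit above. It is not the twophase.c code: the `Demo*` names are hypothetical, and a real implementation would also hold a lock on the shared array while reading or changing the flag.

```c
#include <stdbool.h>

#define DEMO_INVALID_BACKEND (-1)

/* Hypothetical global transaction entry holding only the lock flag. */
typedef struct DemoGXact
{
    int locking_backend;    /* DEMO_INVALID_BACKEND when unlocked */
} DemoGXact;

/* Try to lock an entry; fails if any backend already holds it. */
static bool
demo_lock_gxact(DemoGXact *gxact, int backend_id)
{
    if (gxact->locking_backend != DEMO_INVALID_BACKEND)
        return false;
    gxact->locking_backend = backend_id;
    return true;
}

/* Explicit unlock; only the holder can release the entry. */
static void
demo_unlock_gxact(DemoGXact *gxact, int backend_id)
{
    if (gxact->locking_backend == backend_id)
        gxact->locking_backend = DEMO_INVALID_BACKEND;
}

/* Cleanup hook run on abort or process exit: release whatever is held. */
static void
demo_at_abort_or_exit(DemoGXact *entries, int nentries, int backend_id)
{
    for (int i = 0; i < nentries; i++)
        demo_unlock_gxact(&entries[i], backend_id);
}
```

Because the entry stays locked until it is explicitly released, clearing a backend's proc array entry no longer opens a window in which another backend could consider the entry unlocked.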
*	Code review for recent changes in relcache.c.	Tom Lane	2014-05-14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	rd_replidindex should be managed the same as rd_oidindex, and rd_keyattr and rd_idattr should be managed like rd_indexattr. Omissions in this area meant that the bitmapsets computed for rd_keyattr and rd_idattr would be leaked during any relcache flush, resulting in a slow but permanent leak in CacheMemoryContext. There was also a tiny probability of relcache entry corruption if we ran out of memory at just the wrong point in RelationGetIndexAttrBitmap. Otherwise, the fields were not zeroed where expected, which would not bother the code any AFAICS but could greatly confuse anyone examining the relcache entry while debugging. Also, create an API function RelationGetReplicaIndex rather than letting non-relcache code be intimate with the mechanisms underlying caching of that value (we won't even mention the memory leak there). Also, fix a relcache flush hazard identified by Andres Freund: RelationGetIndexAttrBitmap must not assume that rd_replidindex stays valid across index_open. The aspects of this involving rd_keyattr date back to 9.3, so back-patch those changes.
*	Rename min_recovery_apply_delay to recovery_min_apply_delay.	Tom Lane	2014-05-10
\| \| \| \| \| \| \|	Per discussion, this seems like a more consistent choice of name. Fabrízio de Royes Mello, after a suggestion by Peter Eisentraut; some additional documentation wordsmithing by me
*	Fix bug in lossy-page handling in GIN	Heikki Linnakangas	2014-05-10
\| \| \| \| \| \| \| \| \| \| \|	When returning rows from a bitmap, as done with partial match queries, we would get stuck in an infinite loop if the bitmap contained a lossy page reference. This bug is new in master, it was introduced by the patch to allow skipping items refuted by other entries in GIN scans. Report and fix by Alexander Korotkov
*	Remove overeager assertion in logical_heap_begin_rewrite.	Robert Haas	2014-05-09
\| \| \| \| \| \| \|	It's legal to configure wal_level=logical and max_replication_slots=0 simultaneously. Andres Freund
*	Protect against torn pages when deleting GIN list pages.	Heikki Linnakangas	2014-05-08
\| \| \| \| \| \| \| \| \|	To-be-deleted list pages contain no useful information, as they are being deleted, but we must still protect the writes from being torn by a crash after a partial write. To do that, re-initialize the pages on WAL replay. Jeff Janes caught this with a test program to test partial writes. Backpatch to all supported versions.
*	pgindent run for 9.4	Bruce Momjian	2014-05-06
\| \| \| \| \|	This includes removing tabs after periods in C comments, which was applied to back branches, so this change should not affect backpatching.
*	Correct comment in Hot Standby nbtree handling	Simon Riggs	2014-05-06
\| \| \| \|	Logic is correct, matching handling of LP_DEAD elsewhere.
*	Assert that pre/post-fix updated tuples are on the same page during replay.	Heikki Linnakangas	2014-05-05
\| \| \| \| \| \| \| \| \| \| \|	If they were not, 'oldtup.t_data' would be dereferenced while set to NULL in case of a full page image for block 0. Do so primarily to silence coverity; but also to make sure this prerequisite isn't changed without adapting the replay routine, as that would appear to work in many cases. Andres Freund
*	Fix failure to detoast fields in composite elements of structured types.	Tom Lane	2014-05-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If we have an array of records stored on disk, the individual record fields cannot contain out-of-line TOAST pointers: the tuptoaster.c mechanisms are only prepared to deal with TOAST pointers appearing in top-level fields of a stored row. The same applies for ranges over composite types, nested composites, etc. However, the existing code only took care of expanding sub-field TOAST pointers for the case of nested composites, not for other structured types containing composites. For example, given a command such as UPDATE tab SET arraycol = ARRAY[ROW(x,42)::mycompositetype] ... where x is a direct reference to a field of an on-disk tuple, if that field is long enough to be toasted out-of-line then the TOAST pointer would be inserted as-is into the array column. If the source record for x is later deleted, the array field value would become a dangling pointer, leading to errors along the lines of "missing chunk number 0 for toast value ..." when the value is referenced. A reproducible test case for this was provided by Jan Pecek, but it seems likely that some of the "missing chunk number" reports we've heard in the past were caused by similar issues. Code-wise, the problem is that PG_DETOAST_DATUM() is not adequate to produce a self-contained Datum value if the Datum is of composite type. Seen in this light, the problem is not just confined to arrays and ranges, but could also affect some other places where detoasting is done in that way, for example form_index_tuple(). 
I tried teaching the array code to apply toast_flatten_tuple_attribute() along with PG_DETOAST_DATUM() when the array element type is composite, but this was messy and imposed extra cache lookup costs whether or not any TOAST pointers were present, indeed sometimes when the array element type isn't even composite (since sometimes it takes a typcache lookup to find that out). The idea of extending that approach to all the places that currently use PG_DETOAST_DATUM() wasn't attractive at all. This patch instead solves the problem by decreeing that composite Datum values must not contain any out-of-line TOAST pointers in the first place; that is, we expand out-of-line fields at the point of constructing a composite Datum, not at the point where we're about to insert it into a larger tuple. This rule is applied only to true composite Datums, not to tuples that are being passed around the system as tuples, so it's not as invasive as it might sound at first. With this approach, the amount of code that has to be touched for a full solution is greatly reduced, and added cache lookup costs are avoided except when there actually is a TOAST pointer that needs to be inlined. The main drawback of this approach is that we might sometimes dereference a TOAST pointer that will never actually be used by the query, imposing a rather large cost that wasn't there before. On the other side of the coin, if the field value is used multiple times then we'll come out ahead by avoiding repeat detoastings. Experimentation suggests that common SQL coding patterns are unaffected either way, though. Applications that are very negatively affected could be advised to modify their code to not fetch columns they won't be using. In future, we might consider reverting this solution in favor of detoasting only at the point where data is about to be stored to disk, using some method that can drill down into multiple levels of nested structured types. 
That will require defining new APIs for structured types, though, so it doesn't seem feasible as a back-patchable fix. Note that this patch changes HeapTupleGetDatum() from a macro to a function call; this means that any third-party code using that macro will not get protection against creating TOAST-pointer-containing Datums until it's recompiled. The same applies to any uses of PG_RETURN_HEAPTUPLEHEADER(). It seems likely that this is not a big problem in practice: most of the tuple-returning functions in core and contrib produce outputs that could not possibly be toasted anyway, and the same probably holds for third-party extensions. This bug has existed since TOAST was invented, so back-patch to all supported branches.
*	Rationalize common/relpath.[hc].	Tom Lane	2014-04-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit a73018392636ce832b09b5c31f6ad1f18a4643ea created rather a mess by putting dependencies on backend-only include files into include/common. We really shouldn't do that. To clean it up:
  * Move TABLESPACE_VERSION_DIRECTORY back to its longtime home in catalog/catalog.h. We won't consider this symbol part of the FE/BE API.
  * Push enum ForkNumber from relfilenode.h into relpath.h. We'll consider relpath.h as the source of truth for fork numbers, since relpath.c was already partially serving that function, and anyway relfilenode.h was kind of a random place for that enum.
  * So, relfilenode.h now includes relpath.h rather than vice-versa. This direction of dependency is fine. (That allows most, but not quite all, of the existing explicit #includes of relpath.h to go away again.)
  * Push forkname_to_number from catalog.c to relpath.c, just to centralize fork number stuff a bit better.
  * Push GetDatabasePath from catalog.c to relpath.c; it was rather odd that the previous commit didn't keep this together with relpath().
  * To avoid needing relfilenode.h in common/, redefine the underlying function (now called GetRelationPath) as taking separate OID arguments, and make the APIs using RelFileNode or RelFileNodeBackend into macro wrappers. (The macros have a potential multiple-eval risk, but none of the existing call sites have an issue with that; one of them had such a risk already anyway.)
  * Fix failure to follow the directions when "init" fork type was added; specifically, the errhint in forkname_to_number wasn't updated, and neither was the SGML documentation for pg_relation_size().
  * Fix tablespace-path-too-long check in CreateTableSpace() to account for fork-name component of maximum-length pathnames. This requires putting FORKNAMECHARS into a header file, but it was rather useless (and actually unreferenced) where it was.
The last couple of items are potentially back-patchable bug fixes, if anyone is sufficiently excited about them; but personally I'm not. Per a gripe from Christoph Berg about how include/common wasn't self-contained.
*	Fix two bugs in WAL-logging of GIN pending-list pages.	Heikki Linnakangas	2014-04-28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In writeListPage, never take a full-page image of the page, because we have all the information required to re-initialize in the WAL record anyway. Before this fix, a full-page image was always generated, unless full_page_writes=off, because when the page is initialized its LSN is always 0. In stable-branches, keep the code to restore the backup blocks if they exist, in case that the WAL is generated with an older minor version, but in master Assert that there are no full-page images. In the redo routine, add missing "off++". Otherwise the tuples are added to the page in reverse order. That happens to be harmless because we always scan and remove all the tuples together, but it was clearly wrong. Also, it was masked by the first bug unless full_page_writes=off, because the page was always restored from a full-page image. Backpatch to all supported versions.
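Why the missing "off++" reversed the tuple order can be seen in a toy model of the redo loop. This is a sketch under assumptions: the page is reduced to an integer array, and `demo_page_add_item` is a hypothetical stand-in that inserts at a 1-based offset and shifts later items, which is the behaviour the commit message implies.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_ITEMS 16

/* Toy page: just an ordered array of tuple ids. */
typedef struct DemoPage
{
    int nitems;
    int items[DEMO_MAX_ITEMS];
} DemoPage;

/* Insert value at 1-based offset off, shifting later items right. */
static int
demo_page_add_item(DemoPage *page, int value, int off)
{
    if (page->nitems >= DEMO_MAX_ITEMS || off < 1 || off > page->nitems + 1)
        return -1;
    memmove(&page->items[off], &page->items[off - 1],
            (size_t) (page->nitems - (off - 1)) * sizeof(int));
    page->items[off - 1] = value;
    page->nitems++;
    return 0;
}

/* Re-initialize the page and replay the tuples; advance models "off++". */
static void
demo_redo_list_page(DemoPage *page, const int *tuples, int ntuples, int advance)
{
    int off = 1;

    page->nitems = 0;
    for (int i = 0; i < ntuples; i++)
    {
        demo_page_add_item(page, tuples[i], off);
        if (advance)
            off++;          /* the increment the redo routine was missing */
    }
}
```

Replaying tuples 10, 20, 30 with `advance` set leaves them in that order; without it, every tuple lands at offset 1 and the page ends up holding 30, 20, 10.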
*	Improve generation algorithm for database system identifier.	Tom Lane	2014-04-26
\| \| \| \| \| \| \| \| \| \| \| \| \|	As noted some time ago, the original coding had a typo ("\|" for "^") that made the result less unique than intended. Even the intended behavior is obsolete since it was based on wanting to produce a usable value even if we didn't have int64 arithmetic --- a limitation we stopped supporting years ago. Instead, let's redefine the system identifier as tv_sec in the upper 32 bits (same as before), tv_usec in the next 20 bits, and the low 12 bits of getpid() in the remaining bits. This is still hardly guaranteed-universally-unique, but it's noticeably better than before. Per my proposal at <29019.1374535940@sss.pgh.pa.us>
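The bit layout described in the commit above can be sketched as follows. The function names are hypothetical and this is an illustration of the layout, not the initialization code itself; it assumes a POSIX system for `gettimeofday()` and `getpid()`.

```c
#include <stdint.h>
#include <sys/time.h>
#include <unistd.h>

/* Pack seconds, microseconds and PID into one 64-bit identifier. */
static uint64_t
demo_pack_system_identifier(uint64_t tv_sec, uint64_t tv_usec, uint64_t pid)
{
    uint64_t id;

    id = tv_sec << 32;                  /* tv_sec in the upper 32 bits */
    id |= (tv_usec & 0xFFFFF) << 12;    /* tv_usec in the next 20 bits */
    id |= pid & 0xFFF;                  /* low 12 bits of the PID */
    return id;
}

/* Build an identifier from the current time and this process's PID. */
static uint64_t
demo_generate_system_identifier(void)
{
    struct timeval tv;

    gettimeofday(&tv, NULL);
    return demo_pack_system_identifier((uint64_t) tv.tv_sec,
                                       (uint64_t) tv.tv_usec,
                                       (uint64_t) getpid());
}
```

Twenty bits are enough for tv_usec because its maximum of 999,999 is below 2^20 = 1,048,576, so the three fields do not overlap.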
*	Fix race when updating a tuple concurrently locked by another process	Alvaro Herrera	2014-04-24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If a tuple is locked, and this lock is later upgraded either to an update or to a stronger lock, and in the meantime some other process tries to lock, update or delete the same tuple, it (the tuple) could end up being updated twice, or having conflicting locks held. The reason for this is that the second updater checks for a change in Xmax value, or in the HEAP_XMAX_IS_MULTI infomask bit, after noticing the first lock; and if there's a change, it restarts and re-evaluates its ability to update the tuple. But it neglected to check for changes in lock strength or in lock-vs-update status when those two properties stayed the same. This would lead it to take the wrong decision and continue with its own update, when in reality it shouldn't do so but instead restart from the top. This could lead to either an assertion failure much later (when a multixact containing multiple updates is detected), or duplicate copies of tuples. To fix, make sure to compare the other relevant infomask bits alongside the Xmax value and HEAP_XMAX_IS_MULTI bit, and restart from the top if necessary. Also, in the belt-and-suspenders spirit, add a check to MultiXactCreateFromMembers that a multixact being created does not have two or more members that are claimed to be updates. This should protect against other bugs that might cause similar bogus situations. Backpatch to 9.3, where the possibility of multixacts containing updates was introduced. (In prior versions it was possible to have the tuple lock upgraded from shared to exclusive, and an update would not restart from the top; yet we're protected against a bug there because there's always a sleep to wait for the locking transaction to complete before continuing to do anything. Really, the fact that tuple locks always conflicted with concurrent updates is what protected against bugs here.) 
Per report from Andrew Dunstan and Josh Berkus in thread at http://www.postgresql.org/message-id/534C8B33.9050807@pgexperts.com Bug analysis by Andres Freund.
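The restart check described in the commit above can be sketched as a comparison of the tuple's state before and after waiting for the first locker. The bit values and `Demo*` names below are assumptions for illustration, not taken from the PostgreSQL headers, and the real check lives inside the heap update and lock code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values; not taken from the PostgreSQL headers. */
#define DEMO_XMAX_KEYSHR_LOCK 0x0010
#define DEMO_XMAX_EXCL_LOCK   0x0040
#define DEMO_XMAX_LOCK_ONLY   0x0080
#define DEMO_XMAX_IS_MULTI    0x1000
#define DEMO_LOCK_MASK        (DEMO_XMAX_KEYSHR_LOCK | DEMO_XMAX_EXCL_LOCK)

/* The parts of a tuple header that matter for this check. */
typedef struct DemoTupleState
{
    uint32_t xmax;
    uint16_t infomask;
} DemoTupleState;

/*
 * Must the second updater restart from the top? Comparing only xmax and
 * the is-multi bit misses a lock that was upgraded in place, so the lock
 * strength and lock-vs-update bits are compared as well.
 */
static bool
demo_must_restart(const DemoTupleState *before, const DemoTupleState *after)
{
    const uint16_t interesting =
        DEMO_XMAX_IS_MULTI | DEMO_XMAX_LOCK_ONLY | DEMO_LOCK_MASK;

    if (before->xmax != after->xmax)
        return true;
    return (before->infomask & interesting) != (after->infomask & interesting);
}
```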
*	Reset pg_stat_activity.xact_start during PREPARE TRANSACTION.	Tom Lane	2014-04-24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Once we've completed a PREPARE, our session is not running a transaction, so its entry in pg_stat_activity should show xact_start as null, rather than leaving the value as the start time of the now-prepared transaction. I think possibly this oversight was triggered by faulty extrapolation from the adjacent comment that says PrepareTransaction should not call AtEOXact_PgStat, so tweak the wording of that comment. Noted by Andres Freund while considering bug #10123 from Maxim Boguk, although this error doesn't seem to explain that report. Back-patch to all active branches.
*	Update obsolete comments.	Heikki Linnakangas	2014-04-23
\| \| \| \|	We no longer have a TLI field in the page header.
*	Fix typos in comment.	Heikki Linnakangas	2014-04-23
*	Cleanup of new b-tree page deletion code.	Heikki Linnakangas	2014-04-23
\| \| \| \| \| \| \| \| \| \| \| \|	When marking a branch as half-dead, a pointer to the top of the branch is stored in the leaf block's high key. During normal operation, the high key was left in place, and the block number was just stored in the ctid field of the high key tuple, but in WAL replay, the high key was recreated as a truncated tuple with zero columns. For the sake of easier debugging, also truncate the tuple in normal operation, so that the page is identical after WAL replay. Also, rename the 'downlink' field in the WAL record to 'topparent', as that seems like a more descriptive name. And make sure it's set to invalid when unlinking the leaf page.
*	Fix broken logic in logical_heap_rewrite_flush_mappings().	Tom Lane	2014-04-22
\| \| \| \| \|	It's blatantly obvious that commit 4d0d607a454ee832574afd52a3c515099cc85eb3 wasn't tested. The leak's real enough, though.
*	revert 4d0d607a454ee832574afd52a3c515099cc85eb3	Bruce Momjian	2014-04-22
\| \| \| \|	Revert due to contrib/test_decoding regression failure
*	release memory used while flushing logical mappings	Bruce Momjian	2014-04-22
\| \| \| \|	Patch by Ants Aasma
*	Fix bug in the new B-tree incomplete-split code.	Heikki Linnakangas	2014-04-22
\| \| \| \| \| \|	Forgot to update LSN of left sibling's page, when creating a new root. I fixed this for regular insertions and page splits earlier, but missed new root creation.
*	Fix Gin README.	Heikki Linnakangas	2014-04-22
\| \| \| \| \| \| \|	The README incorrectly claimed that GIN posting tree pages contain an array of uncompressed items in addition to compressed posting lists. Earlier versions of the GIN posting list compression patch worked that way, but not the one that was committed.
*	Fix bug in new B-tree page deletion code.	Heikki Linnakangas	2014-04-22
\| \| \| \| \|	When modifying a page, must hold an exclusive lock. A shared lock is obviously not good enough.
*	Retain original physical order of tuples in redo of b-tree splits.	Heikki Linnakangas	2014-04-22
\| \| \| \| \|	It makes no difference to the system, but minimizing the differences between a master and standby makes debugging simpler.
*	Fix rm_desc routine of b-tree page delete records.	Heikki Linnakangas	2014-04-22
\| \| \| \|	A couple of typos from my refactoring of the page deletion patch.
*	Fix typo.	Robert Haas	2014-04-20
\| \| \| \|	Etsuro Fujita
*	Fix typo	Magnus Hagander	2014-04-18
\| \| \| \|	Amit Langote
*	report stat() error in trigger file check	Bruce Momjian	2014-04-17
\| \| \| \| \| \| \|	Permissions might prevent the existence of the trigger file from being checked. Per report from Andres Freund