path: root/src/backend/access
* Support multiple iterators in the Red-Black Tree implementation. (Heikki Linnakangas, 2016-09-02)

    While we don't need multiple iterators at the moment, the interface is
    nicer and less dangerous this way.

    Aleksander Alekseev, with some changes by me.
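    Presumably the iterator state moves out of the RBTree itself into
    caller-provided storage; a sketch of the resulting usage (type and
    function names are assumptions patterned on the rbtree module):

        /* Sketch only: caller-owned state allows several concurrent scans. */
        RBTreeIterator iter1;
        RBTreeIterator iter2;
        RBNode     *node;

        rb_begin_iterate(tree, LeftRightWalk, &iter1);  /* ascending */
        rb_begin_iterate(tree, RightLeftWalk, &iter2);  /* descending */

        while ((node = rb_iterate(&iter1)) != NULL)
        {
            /* ... each iterator advances independently of the other ... */
        }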
* Prevent starting a standalone backend with standby_mode on. (Tom Lane, 2016-08-31)

    This can't really work because standby_mode expects there to be more WAL
    arriving, which there will not ever be because there's no WAL receiver
    process to fetch it.  Moreover, if standby_mode is on then hot standby
    might also be turned on, causing even more strangeness because that
    expects read-only sessions to be executing in parallel.  Bernd Helmle
    reported a case where btree_xlog_delete_get_latestRemovedXid got
    confused, but rather than band-aiding individual problems it seems best
    to prevent getting anywhere near this state in the first place.
    Back-patch to all supported branches.

    In passing, also fix some omissions of errcodes in other ereport's in
    readRecoveryCommandFile().

    Michael Paquier (errcode hacking by me)

    Discussion: <00F0B2CEF6D0CEF8A90119D4@eje.credativ.lan>
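    The guard plausibly amounts to a single check in
    readRecoveryCommandFile() (a sketch; the errcode and message wording
    here are assumptions, not the committed text):

        /* Sketch: refuse standby_mode in a single-user backend. */
        if (StandbyModeRequested && !IsUnderPostmaster)
            ereport(FATAL,
                    (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                     errmsg("standby mode is not supported by single-user servers")));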
* Split hash.h → hash_xlog.h (Alvaro Herrera, 2016-08-29)

    Since the hash AM is going to be revamped to have WAL, this is a good
    opportunity to clean up the include file a little bit to avoid including
    a lot of extra stuff in the future.

    Author: Amit Kapila
* Fix typos in comments. (Fujii Masao, 2016-08-29)
* Fix pg_xlogdump so that it handles cross-page XLP_FIRST_IS_CONTRECORD records. (Fujii Masao, 2016-08-29)

    Previously pg_xlogdump failed to dump the contents of a WAL file if the
    file starts with a continuation WAL record spanning more than one page.
    Since pg_xlogdump assumed that a continuation record always fits on one
    page, it could not find a valid WAL record to start reading from in that
    case.  This patch changes pg_xlogdump so that it can handle a
    continuation WAL record which crosses a page boundary and find the valid
    record to start reading from.

    Back-patch to 9.3 where pg_xlogdump was introduced.

    Author: Pavan Deolasee
    Reviewed-By: Michael Paquier and Craig Ringer
    Discussion: CABOikdPsPByMiG6J01DKq6om2+BNkxHTPkOyqHM2a4oYwGKsqQ@mail.gmail.com
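    The mechanics are roughly as follows (a sketch; the page-header fields
    are the real XLogPageHeaderData members, but the surrounding control
    flow is simplified):

        /* Sketch: skip the tail of a record continued from a prior page. */
        XLogPageHeader header = (XLogPageHeader) page;

        if (header->xlp_info & XLP_FIRST_IS_CONTRECORD)
        {
            /* xlp_rem_len may exceed this page; keep skipping whole pages */
            Size    skip = MAXALIGN(header->xlp_rem_len);

            /* ... advance past 'skip' bytes, crossing page boundaries as
             *     needed, before looking for the first whole record ... */
        }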
* Add macros to make AllocSetContextCreate() calls simpler and safer. (Tom Lane, 2016-08-27)

    I found that half a dozen (nearly 5%) of our AllocSetContextCreate calls
    had typos in the context-sizing parameters.  While none of these led to
    especially significant problems, they did create minor inefficiencies,
    and it's now clear that expecting people to copy-and-paste those calls
    accurately is not a great idea.  Let's reduce the risk of future errors
    by introducing single macros that encapsulate the common use-cases.
    Three such macros are enough to cover all but two special-purpose
    contexts; those two calls can be left as-is, I think.

    While this patch doesn't in itself improve matters for third-party
    extensions, it doesn't break anything for them either, and they can
    gradually adopt the simplified notation over time.

    In passing, change TopMemoryContext to use the default allocation
    parameters.  Formerly it could only be extended 8K at a time.  That was
    probably reasonable when this code was written; but nowadays we create
    many more contexts than we did then, so that it's not unusual to have a
    couple hundred K in TopMemoryContext, even without considering various
    dubious code that sticks other things there.  There seems no good reason
    not to let it use growing blocks like most other contexts.

    Back-patch to 9.6, mostly because that's still close enough to HEAD that
    it's easy to do so, and keeping the branches in sync can be expected to
    avoid some future back-patching pain.  The bugs fixed by these changes
    don't seem to be significant enough to justify fixing them further back.

    Discussion: <21072.1472321324@sss.pgh.pa.us>
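    A sketch of the kind of macro this describes, assuming it expands to the
    three sizing parameters in place (the name matches the in-tree
    ALLOCSET_DEFAULT_SIZES, but treat the exact definition as illustrative):

        /* One macro supplies all three sizes, so none can be mistyped. */
        #define ALLOCSET_DEFAULT_SIZES \
            ALLOCSET_DEFAULT_MINSIZE, ALLOCSET_DEFAULT_INITSIZE, \
            ALLOCSET_DEFAULT_MAXSIZE

        cxt = AllocSetContextCreate(TopMemoryContext,
                                    "MyLongLivedStuff",  /* hypothetical name */
                                    ALLOCSET_DEFAULT_SIZES);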
* Fix potential memory leakage from HandleParallelMessages(). (Tom Lane, 2016-08-26)

    HandleParallelMessages leaked memory into the caller's context.  Since
    it's called from ProcessInterrupts, there is basically zero certainty as
    to what CurrentMemoryContext is, which means we could be leaking into
    long-lived contexts.  Over the processing of many worker messages that
    would grow to be a problem.  Things could be even worse than just a
    leak, if we happened to service the interrupt while ErrorContext is
    current: elog.c thinks it can reset that on its own whim, possibly
    yanking storage out from under HandleParallelMessages.

    Give HandleParallelMessages its own dedicated context instead, which we
    can reset during each call to ensure there's no accumulation of wasted
    memory.

    Discussion: <16610.1472222135@sss.pgh.pa.us>
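    A sketch of the dedicated-context pattern, assuming the context is kept
    in a static variable and reset on entry (the context name and exact
    placement are assumptions):

        static MemoryContext hpm_context = NULL;

        void
        HandleParallelMessages(void)
        {
            MemoryContext oldcontext;

            if (hpm_context == NULL)    /* first time through? */
                hpm_context = AllocSetContextCreate(TopMemoryContext,
                                                    "HandleParallelMessages",
                                                    ALLOCSET_DEFAULT_SIZES);
            else
                MemoryContextReset(hpm_context);  /* discard last call's junk */

            oldcontext = MemoryContextSwitchTo(hpm_context);
            /* ... drain and dispatch worker message queues ... */
            MemoryContextSwitchTo(oldcontext);
        }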
* Fix logic for adding "parallel worker" context line to worker errors. (Tom Lane, 2016-08-26)

    The previous coding here was capable of adding a "parallel worker"
    context line to errors that were not, in fact, returned from a parallel
    worker.  Instead of using an errcontext callback to add that annotation,
    just paste it onto the message by hand; this looks uglier but is more
    reliable.

    Discussion: <19757.1472151987@sss.pgh.pa.us>
* Fix small query-lifespan memory leak in bulk updates. (Tom Lane, 2016-08-24)

    When there is an identifiable REPLICA IDENTITY index on the target
    table, heap_update leaks the id_attrs bitmapset.  That's not many bytes,
    but it adds up over enough rows, since the code typically runs in a
    query-lifespan context.  Bug introduced in commit e55704d8b, which did a
    rather poor job of cloning the existing use-pattern for
    RelationGetIndexAttrBitmap().

    Per bug #14293 from Zhou Digoal.  Back-patch to 9.4 where the bug was
    introduced.

    Report: <20160824114320.15676.45171@wrigleys.postgresql.org>
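    The shape of the fix is presumably a one-line bms_free() once the
    bitmapset has served its purpose (a sketch; the variable naming follows
    the commit text):

        Bitmapset  *id_attrs;

        id_attrs = RelationGetIndexAttrBitmap(relation,
                                              INDEX_ATTR_BITMAP_IDENTITY_KEY);
        /* ... consult id_attrs while processing the update ... */
        bms_free(id_attrs);     /* previously omitted, leaking per row */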
* Suppress compiler warnings in non-cassert builds. (Tom Lane, 2016-08-23)

    With Asserts off, these variables are set but never used, resulting in
    warnings from pickier compilers.  Fix that with our standard solution.

    Per report from Jeff Janes.
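    "Our standard solution" is the PG_USED_FOR_ASSERTS_ONLY attribute; a
    sketch of its use (the variable name is invented for illustration):

        /* Marked so non-assert builds don't warn about set-but-unused. */
        int         nleft PG_USED_FOR_ASSERTS_ONLY;

        nleft = do_some_work();
        Assert(nleft == 0);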
* Improve SP-GiST opclass API to better support unlabeled nodes. (Tom Lane, 2016-08-23)

    Previously, the spgSplitTuple action could only create a new upper tuple
    containing a single labeled node.  This made it useless for opclasses
    that prefer to work with fixed sets of nodes (labeled or otherwise),
    which meant that restrictive prefixes could not be used with such node
    definitions.

    Change the output field set for the choose() method to allow it to
    specify any valid node set for the new upper tuple, and to specify which
    of these nodes to place the modified lower tuple in.  In addition to its
    primary use for fixed node sets, this feature could allow existing
    opclasses that use variable node sets to skip a separate spgAddNode
    action when splitting a tuple, by setting up the node needed for the
    incoming value as part of the spgSplitTuple action.  However, care would
    have to be taken to add the extra node only when it would not make the
    tuple bigger than before.  (spgAddNode can enlarge the tuple,
    spgSplitTuple can't.)

    This is a prerequisite for an upcoming SP-GiST inet opclass, but is
    being committed separately to increase the visibility of the API change.

    In passing, improve the documentation about the traverse-values feature
    that was added by commit ccd6eb49a.

    Emre Hasegeli, with cosmetic adjustments and documentation rework by me

    Discussion: <CAE2gYzxtth9qatW_OAqdOjykS0bxq7AYHLuyAQLPgT7H9ZU0Cw@mail.gmail.com>
* Fix deletion of speculatively inserted TOAST on conflict (Andres Freund, 2016-08-17)

    INSERT .. ON CONFLICT runs a pre-check of the possible conflicting
    constraints before performing the actual speculative insertion.  In case
    the inserted tuple included TOASTed columns, the ON CONFLICT condition
    would be handled correctly if the conflict was caught by the pre-check;
    but if two transactions entered the speculative insertion phase at the
    same time, one would have to re-try, and the code for aborting a
    speculative insertion did not handle deleting the speculatively inserted
    TOAST datums correctly.

    TOAST deletion would fail with "ERROR: attempted to delete invisible
    tuple" as we attempted to remove the TOAST tuples using
    simple_heap_delete, which reasoned that the given tuples should not be
    visible to the command that wrote them.

    This commit updates heap_abort_speculative(), which aborts the
    conflicting tuple, to delete the associated TOAST datums itself via
    toast_delete.  As before, the inserted TOAST rows are not marked as
    being speculative.

    This commit also adds an isolationtester spec test, exercising the
    relevant code path.  Unfortunately 9.5 cannot handle two waiting
    sessions, and thus cannot execute this test.

    Reported-By: Viren Negi, Oskari Saarenmaa
    Author: Oskari Saarenmaa, edited a bit by me
    Bug: #14150
    Discussion: <20160519123338.12513.20271@wrigleys.postgresql.org>
    Backpatch: 9.5, where ON CONFLICT was introduced
* Fix typos (Peter Eisentraut, 2016-08-16)

    From: Alexander Law <exclusion@gmail.com>
* Final pgindent + perltidy run for 9.6. (Tom Lane, 2016-08-15)
* Add SQL-accessible functions for inspecting index AM properties. (Tom Lane, 2016-08-13)

    Per discussion, we should provide such functions to replace the lost
    ability to discover AM properties by inspecting pg_am (cf commit
    65c5fcd35).  The added functionality is also meant to displace any code
    that was looking directly at pg_index.indoption, since we'd rather not
    believe that the bit meanings in that field are part of any client API
    contract.

    As future-proofing, define the SQL API to not assume that properties
    that are currently AM-wide or index-wide will remain so unless they
    logically must be; instead, expose them only when inquiring about a
    specific index or even specific index column.  Also provide the ability
    for an index AM to override the behavior.

    In passing, document pg_am.amtype, overlooked in commit 473b93287.

    Andrew Gierth, with kibitzing by me and others

    Discussion: <87mvl5on7n.fsf@news-spur.riddles.org.uk>
* In B-tree page deletion, clean up properly after page deletion failure. (Tom Lane, 2016-08-06)

    In _bt_unlink_halfdead_page(), we might fail to find an immediate left
    sibling of the target page, perhaps because of corruption of the page
    sibling links.  The code intends to cope with this by just abandoning
    the deletion attempt; but what actually happens is that it fails
    outright due to releasing the same buffer lock twice.  (And error
    recovery masks a second problem, which is possible leakage of a pin on
    another page.)  Seems to have been introduced by careless refactoring in
    commit efada2b8e.  Since there are multiple cases to consider, let's
    make releasing the buffer lock in the failure case the responsibility of
    _bt_unlink_halfdead_page() not its caller.

    Also, avoid fetching the leaf page's left-link again after we've dropped
    lock on the page.  This is probably harmless, but it's not exactly good
    coding practice.

    Per report from Kyotaro Horiguchi.  Back-patch to 9.4 where the faulty
    code was introduced.

    Discussion: <20160803.173116.111915228.horiguchi.kyotaro@lab.ntt.co.jp>
* Change InitToastSnapshot to a macro. (Robert Haas, 2016-08-05)

    tqual.h is included in some front-end compiles, and a static inline
    breaks on buildfarm member castoroides.  Since the macro is never
    referenced, it should dodge that problem, although this doesn't seem
    like the cleanest way of hiding things from front-end compiles.

    Report and review by Tom Lane; patch by me.
* Fix hard to hit race condition in heapam's tuple locking code. (Andres Freund, 2016-08-04)

    As mentioned in its commit message, eca0f1db left open a race condition
    where a page could be marked all-visible after the code checked
    PageIsAllVisible() to pin the VM, but before the page is locked.  Plug
    that hole.

    Reviewed-By: Robert Haas, Andres Freund
    Author: Amit Kapila
    Discussion: CAEepm=3fWAbWryVW9swHyLTY4sXVf0xbLvXqOwUoDiNCx9mBjQ@mail.gmail.com
    Backpatch: -
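    The hole is plugged by re-checking after the lock is taken; a sketch of
    the pattern (the buffer and visibility-map calls are the real APIs, but
    the surrounding control flow is simplified and assumed):

        /* Pin the VM page first if the heap page looks all-visible. */
        if (PageIsAllVisible(BufferGetPage(buffer)))
            visibilitymap_pin(relation, block, &vmbuffer);

        LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);

        /* The page may have become all-visible between the check and the
         * lock; re-check now, dropping and retaking the content lock since
         * pinning the VM page may perform I/O. */
        if (vmbuffer == InvalidBuffer &&
            PageIsAllVisible(BufferGetPage(buffer)))
        {
            LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
            visibilitymap_pin(relation, block, &vmbuffer);
            LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
        }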
* Prevent "snapshot too old" from trying to return pruned TOAST tuples.Robert Haas2016-08-03
| | | | | | | | | | | | | Previously, we tested for MVCC snapshots to see whether they were too old, but not TOAST snapshots, which can lead to complaints about missing TOAST chunks if those chunks are subject to early pruning. Ideally, the threshold lsn and timestamp for a TOAST snapshot would be that of the corresponding MVCC snapshot, but since we have no way of deciding which MVCC snapshot was used to fetch the TOAST pointer, use the oldest active or registered snapshot instead. Reported by Andres Freund, who also sketched out what the fix should look like. Patch by me, reviewed by Amit Kapila.
* Block interrupts during HandleParallelMessages(). (Tom Lane, 2016-08-02)

    As noted by Alvaro, there are CHECK_FOR_INTERRUPTS() calls in the
    shm_mq.c functions called by HandleParallelMessages().  I believe
    they're all unreachable since we always pass nowait = true, but it
    doesn't seem like a great idea to assume that no such call will ever be
    reachable from HandleParallelMessages().  If that did happen, there
    would be a risk of a recursive call to HandleParallelMessages(), which
    it does not appear to be designed for --- for example, there's nothing
    that would prevent out-of-order processing of received messages.  And
    certainly such cases cannot easily be tested.  So let's prevent it by
    holding off interrupts for the duration of the function.

    Back-patch to 9.5 which contains identical code.

    Discussion: <14869.1470083848@sss.pgh.pa.us>
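    "Holding off interrupts" is done with the standard
    HOLD_INTERRUPTS/RESUME_INTERRUPTS pair; a sketch with the function body
    elided:

        void
        HandleParallelMessages(void)
        {
            /* Any CHECK_FOR_INTERRUPTS() reached below is now a no-op, so
             * this function cannot re-enter itself via ProcessInterrupts. */
            HOLD_INTERRUPTS();

            /* ... drain and dispatch messages from all worker queues ... */

            RESUME_INTERRUPTS();
        }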
* Minor cleanup for access/transam/parallel.c. (Tom Lane, 2016-08-01)

    ParallelMessagePending *must* be marked volatile, because it's set by a
    signal handler.  On the other hand, it's pointless for
    HandleParallelMessageInterrupt to save/restore errno; that must be, and
    is, done at the outer level of the SIGUSR1 signal handler.

    Calling CHECK_FOR_INTERRUPTS() inside HandleParallelMessages, which
    itself is called from CHECK_FOR_INTERRUPTS(), seems both useless and
    hazardous.  The comment claiming that this is needed to handle the
    error queue going away is certainly misguided, in any case.

    Improve a couple of error message texts, and use
    ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE to report loss of parallel
    worker connection, since that's what's used in e.g. tqueue.c.  (Maybe
    it would be worth inventing a dedicated ERRCODE for this type of
    failure?  But I do not think ERRCODE_INTERNAL_ERROR is appropriate.)

    Minor stylistic cleanups.
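    A sketch of the signal-handler side this describes, with the flag
    volatile-qualified (shapes assumed; errno handling belongs to the outer
    SIGUSR1 handler, not here):

        /* Set from a signal handler, so it must be volatile. */
        volatile bool ParallelMessagePending = false;

        void
        HandleParallelMessageInterrupt(void)
        {
            InterruptPending = true;
            ParallelMessagePending = true;
            SetLatch(MyLatch);  /* wake the mainline to service the flag */
        }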
* Message style improvements (Peter Eisentraut, 2016-07-28)
* Remove unused structure member. (Robert Haas, 2016-07-21)

    Michael Paquier
* Clear all-frozen visibilitymap status when locking tuples. (Andres Freund, 2016-07-18)

    Since a892234 & fd31cd265 the visibilitymap's freeze bit is used to
    avoid vacuuming the whole relation in anti-wraparound vacuums.  Doing so
    correctly relies on not adding xids to the heap without also unsetting
    the visibilitymap flag.  Tuple locking related code has not done so.

    To allow selectively resetting all-frozen - so as not to pessimize
    heap_lock_tuple - visibilitymap_clear() now permits resetting only the
    all-frozen bit.  To avoid having to use visibilitymap_get_status
    (e.g. via VM_ALL_FROZEN) inside a critical section,
    visibilitymap_clear() returns whether any bits have been reset.

    There's a remaining issue (denoted by XXX): after the PageIsAllVisible()
    check in heap_lock_tuple() and heap_lock_updated_tuple_rec(), the page
    status could theoretically change.  Practically that currently seems
    impossible, because updaters will hold a page level pin already.  Due to
    the next beta coming up, it seems better to get the required WAL magic
    bump done before resolving this issue.

    The flags fields added to xl_heap_lock and xl_heap_lock_updated require
    bumping the WAL magic.  Since there's already been a catversion bump
    since the last beta, that's not an issue.

    Reviewed-By: Robert Haas, Amit Kapila and Andres Freund
    Author: Masahiko Sawada, heavily revised by Andres Freund
    Discussion: CAEepm=3fWAbWryVW9swHyLTY4sXVf0xbLvXqOwUoDiNCx9mBjQ@mail.gmail.com
    Backpatch: -
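    A sketch of the resulting call, assuming visibilitymap_clear() now takes
    a flags argument and reports whether anything was actually cleared (the
    flag name follows the visibilitymap header; the call site is
    illustrative):

        bool    cleared;

        /* Drop only the all-frozen bit, keeping all-visible intact. */
        cleared = visibilitymap_clear(relation, block, vmbuffer,
                                      VISIBILITYMAP_ALL_FROZEN);
        if (cleared)
        {
            /* ... record that fact in the WAL record's flags ... */
        }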
* Add regression test case exercising the sorting path for hash index build. (Tom Lane, 2016-07-16)

    We've broken this code path at least twice in the past, so it's prudent
    to have a test case that covers it.  To allow exercising the code path
    without creating a very large (and slow to run) test case, redefine the
    sort threshold to be bounded by maintenance_work_mem as well as the
    number of available buffers.  While at it, fix an ancient oversight
    that when building a temp index, the number of available buffers is not
    NBuffers but NLocBuffer.  Also, if assertions are enabled, apply a
    direct test that the sort actually does return the tuples in the
    expected order.

    Peter Geoghegan

    Patch: <CAM3SWZTBAo4hjbBd780+MrOKiKp_TMo1N3A0Rw9_im8gbD7fQA@mail.gmail.com>
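    The bounded threshold plausibly looks like this (a sketch; variable
    names and the exact formula are assumptions based on the description
    above):

        /* Sort if the predicted bucket count exceeds what fits in buffers,
         * bounded also by maintenance_work_mem so tests can hit the path. */
        uint32  sort_threshold = (maintenance_work_mem * 1024L) / BLCKSZ;

        if (index->rd_rel->relpersistence != RELPERSISTENCE_TEMP)
            sort_threshold = Min(sort_threshold, NBuffers);
        else
            sort_threshold = Min(sort_threshold, NLocBuffer);  /* temp index */

        if (num_buckets >= sort_threshold)
        {
            /* ... spool and sort index entries by hash key first ... */
        }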
* Fix torn-page, unlogged xid and further risks from heap_update(). (Andres Freund, 2016-07-15)

    When heap_update needs to look for a page for the new tuple version,
    because the current one doesn't have sufficient free space, or when
    columns have to be processed by the tuple toaster, it has to release
    the lock on the old page during that.  Otherwise there'd be lock
    ordering and lock nesting issues.

    To prevent concurrent sessions from trying to update / delete / lock
    the tuple while the page's content lock is released, the tuple's xmax
    is set to the current session's xid.  That unfortunately was done
    without any WAL logging, thereby violating the rule that no XIDs may
    appear on disk without a corresponding WAL record.  If the database
    were to crash / fail over when the page level lock is released, and
    some activity led to the page being written out to disk, the xid could
    end up being reused; potentially leading to the row becoming invisible.

    There might be additional risks from not having t_ctid point at the
    tuple itself, without having set the appropriate lock infomask fields.

    To fix, compute the appropriate xmax/infomask combination for locking
    the tuple, and perform WAL logging using the existing XLOG_HEAP_LOCK
    record.  That allows the fix to be backpatched.

    This issue has existed for a long time.  There appear to have been
    partial attempts at preventing the danger, but these never were fully
    implemented, and were removed a long time ago, in 11919160
    (cf. HEAP_XMAX_UNLOGGED).

    In master / 9.6, there's an additional issue, namely that the
    visibilitymap's freeze bit isn't reset at that point yet.  Since that's
    a new issue, introduced only in a892234f830, that'll be fixed in a
    separate commit.

    Author: Masahiko Sawada and Andres Freund
    Reported-By: Different aspects by Thomas Munro, Noah Misch, and others
    Discussion: CAEepm=3fWAbWryVW9swHyLTY4sXVf0xbLvXqOwUoDiNCx9mBjQ@mail.gmail.com
    Backpatch: 9.1/all supported versions
* Make HEAP_LOCK/HEAP2_LOCK_UPDATED replay reset HEAP_XMAX_INVALID. (Andres Freund, 2016-07-15)

    0ac5ad5 started to compress infomask bits in WAL records.  Unfortunately
    the replay routines for XLOG_HEAP_LOCK/XLOG_HEAP2_LOCK_UPDATED forgot to
    reset the HEAP_XMAX_INVALID (and some other) hint bits.

    Luckily that's not problematic in the majority of cases, because after a
    crash/on a standby row locks aren't meaningful.  Unfortunately that does
    not hold true in the presence of prepared transactions.  This means that
    after a crash, or after promotion, row level locks held by a prepared,
    but not yet committed, transaction might not be enforced.

    Discussion: 20160715192319.ubfuzim4zv3rqnxv@alap3.anarazel.de
    Backpatch: 9.3, the oldest branch on which 0ac5ad5 is present.
* Avoid serializability errors when locking a tuple with a committed update (Alvaro Herrera, 2016-07-15)

    When key-share locking a tuple that has been not-key-updated, and the
    update is a committed transaction, in some cases we raised
    serializability errors:

        ERROR: could not serialize access due to concurrent update

    Because the key-share doesn't conflict with the update, the error is
    unnecessary and inconsistent with the case that the update hasn't
    committed yet.  This causes problems for some usage patterns, even if
    it can be claimed that it's sufficient to retry the aborted
    transaction: given a steady stream of updating transactions and a long
    locking transaction, the long transaction can be starved indefinitely
    despite multiple retries.

    To fix, we recognize that HeapTupleSatisfiesUpdate can return
    HeapTupleUpdated when an updating transaction has committed, and that
    we need to deal with that case exactly as if it were a non-committed
    update: verify whether the two operations conflict, and if not, carry
    on normally.  If they do conflict, however, there is a difference: in
    the HeapTupleBeingUpdated case we can just sleep until the concurrent
    transaction is gone, while in the HeapTupleUpdated case this is not
    possible and we must raise an error instead.

    Per trouble report from Olivier Dony.

    In addition to a couple of test cases that verify the changed behavior,
    I added a test case to verify the behavior that remains unchanged,
    namely that errors are raised when an update that modifies the key is
    involved.  That must still generate serializability errors.  One
    pre-existing test case changes behavior; per discussion, the new
    behavior is actually the desired one.

    Discussion: https://www.postgresql.org/message-id/560AA479.4080807@odoo.com
    Discussion: https://www.postgresql.org/message-id/20151014164844.3019.25750@wrigleys.postgresql.org

    Backpatch to 9.3, where the problem appeared.
* Fix GiST index build for NaN values in geometric types. (Tom Lane, 2016-07-14)

    GiST index build could go into an infinite loop when presented with
    boxes (or points, circles or polygons) containing NaN component values.
    This happened essentially because the code assumed that x == x is true
    for any "double" value x; but it's not true for NaNs.  The looping
    behavior was not the only problem though: we also attempted to sort the
    items using simple double comparisons.  Since NaNs violate the
    trichotomy law, qsort could (in principle at least) get arbitrarily
    confused and mess up the sorting of ordinary values as well as NaNs.
    And we based splitting choices on box size calculations that could
    produce NaNs, again resulting in undesirable behavior.

    To fix, replace all comparisons of doubles in this logic with
    float8_cmp_internal, which is NaN-aware and is careful to sort NaNs
    consistently, higher than any non-NaN.  Also rearrange the box size
    calculation to not produce NaNs; instead it should produce an infinity
    for a box with NaN on one side and not-NaN on the other.

    I don't by any means claim that this solves all problems with NaNs in
    geometric values, but it should at least make GiST index insertion work
    reliably with such data.  It's likely that the index search side of
    things still needs some work, and probably regular geometric operations
    too.  But with this patch we're laying down a convention for how such
    cases ought to behave.

    Per bug #14238 from Guang-Dih Lei.  Back-patch to 9.2; the code used
    before commit 7f3bd86843e5aad8 is quite different and doesn't lock up
    on my simple test case, nor on the submitter's dataset.

    Report: <20160708151747.1426.60150@wrigleys.postgresql.org>
    Discussion: <28685.1468246504@sss.pgh.pa.us>
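    In essence the comparator behaves like this (a standalone sketch of the
    semantics stated above; the real function is float8_cmp_internal in the
    backend's float code):

        #include <math.h>

        /* NaNs compare equal to each other and above every non-NaN value. */
        static int
        float8_cmp_sketch(double a, double b)
        {
            if (isnan(a))
                return isnan(b) ? 0 : 1;
            if (isnan(b))
                return -1;
            if (a > b)
                return 1;
            if (a < b)
                return -1;
            return 0;
        }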
* Fix start WAL filename for concurrent backups from standby (Magnus Hagander, 2016-07-11)

    On a standby, ThisTimeLineID is always 0, so we would generate a
    filename in timeline 0 even for other timelines.  Instead, use starttli,
    which we have retrieved from the control file.

    Report by: Francesco Canovai in bug #14230
    Author: Marco Nenciarini
    Reviewed by: Michael Paquier and Amit Kapila
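    A sketch of the corrected call (XLogFileName is the real macro; the
    segment-number variable here is an assumption for illustration):

        char    xlogfilename[MAXFNAMELEN];

        /* Use the timeline from the control file, not ThisTimeLineID,
         * which is always 0 on a standby. */
        XLogFileName(xlogfilename, starttli, startsegno);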
* Fix several mistakes around parallel workers and client_encoding. (Robert Haas, 2016-06-30)

    Previously, workers sent data to the leader using the client encoding.
    That mostly worked, but the leader then converted the data back to the
    server encoding.  Since not all encoding conversions are reversible,
    that could provoke failures.  Fix by using the database encoding for
    all communication between worker and leader.

    Also, while temporary changes to GUC settings, as from the SET clause
    of a function, are in general OK for parallel query, changing
    client_encoding this way inside of a parallel worker is not OK.
    Previously, that would have confused the leader; with these changes, it
    would not confuse the leader, but it wouldn't do anything either.  So
    refuse such changes in parallel workers.

    Also, the previous code naively assumed that when it received a
    NotifyResponse from the worker, it could pass that directly back to the
    user.  But now that worker-to-leader communication always uses the
    database encoding, that's clearly no longer correct - though, actually,
    the old way was always broken for V2 clients.  So disassemble and
    reconstitute the message instead.

    Issues reported by Peter Eisentraut.  Patch by me, reviewed by Peter
    Eisentraut.
* Remove unused arguments in two GiST subroutines (Alvaro Herrera, 2016-06-28)

    These arguments became unused in commit 2c03216d8311.  Noticed while
    skimming code for unrelated development.

    This is cosmetic, so no backpatch.
* Fix handling of multixacts predating pg_upgrade (Alvaro Herrera, 2016-06-24)

    After pg_upgrade, it is possible that some tuples' Xmax have multixacts
    corresponding to the old installation; such multixacts cannot have
    running members anymore.  In many code sites we already know not to
    read them and clobber them silently, but at least when VACUUM tries to
    freeze a multixact or determine whether one needs freezing, there's an
    attempt to resolve it to its member transactions by calling
    GetMultiXactIdMembers, and if the multixact value is "in the future"
    with regards to the current valid multixact range, an error like this
    is raised:

        ERROR: MultiXactId 123 has not been created yet -- apparent wraparound

    and vacuuming fails.  Per discussion with Andrew Gierth, it is
    completely bogus to try to resolve multixacts coming from before a
    pg_upgrade, regardless of where they stand with regards to the current
    valid multixact range.

    It's possible to get from under this problem by doing SELECT FOR UPDATE
    of the problem tuples, but if tables are large, this is slow and
    tedious, so a more thorough solution is desirable.

    To fix, we realize that multixacts in xmax created in 9.2 and previous
    have a specific bit pattern that is never used in 9.3 and later (we
    already knew this, per comments and infomask tests sprinkled in various
    places, but we weren't leveraging this knowledge appropriately).
    Whenever the infomask of the tuple matches that bit pattern, we just
    ignore the multixact completely as if Xmax wasn't set; or, in the case
    of tuple freezing, we act as if an unwanted value is set and clobber it
    without decoding.  This guarantees that no errors will be raised, and
    that the values will be progressively removed until all tables are
    clean.  Most callers of GetMultiXactIdMembers are patched to recognize
    directly that the value is a removable "empty" multixact and avoid
    calling GetMultiXactIdMembers altogether.

    To avoid changing the signature of GetMultiXactIdMembers() in back
    branches, we keep the "allow_old" boolean flag but rename it to
    "from_pgupgrade"; if the flag is true, we always return an empty set
    instead of looking up the multixact.  (I suppose we could remove the
    argument in the master branch, but I chose not to do so in this
    commit.)

    This was broken all along, but the error message quoted above first
    appeared because of commit 8e9a16ab8f7f and was partially fixed in
    a25c2b7c4db3.  This fix, backpatched all the way back to 9.3, goes
    approximately in the same direction as a25c2b7c4db3 but should cover
    all cases.

    Bug analysis by Andrew Gierth and Álvaro Herrera.

    A number of public reports match this bug:
    https://www.postgresql.org/message-id/20140330040029.GY4582@tamriel.snowman.net
    https://www.postgresql.org/message-id/538F3D70.6080902@publicrelay.com
    https://www.postgresql.org/message-id/556439CF.7070109@pscs.co.uk
    https://www.postgresql.org/message-id/SG2PR06MB0760098A111C88E31BD4D96FB3540@SG2PR06MB0760.apcprd06.prod.outlook.com
    https://www.postgresql.org/message-id/20160615203829.5798.4594@wrigleys.postgresql.org
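    The "specific bit pattern" test plausibly becomes a macro along these
    lines (a sketch using the real infomask constants; the macro name and
    exact definition are assumptions based on the description above): a
    pre-9.3 multixact xmax claims to be lock-only, yet carries none of the
    9.3-era lock-strength bits.

        /* True if this xmax is a multixact left over from before 9.3. */
        #define HEAP_LOCKED_UPGRADED(infomask) \
            (((infomask) & HEAP_XMAX_IS_MULTI) != 0 && \
             ((infomask) & HEAP_XMAX_LOCK_ONLY) != 0 && \
             (((infomask) & (HEAP_XMAX_EXCL_LOCK | HEAP_XMAX_KEYSHR_LOCK)) == 0))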
* Fix building of large (bigger than shared_buffers) hash indexes. (Tom Lane, 2016-06-24)

    When the index is predicted to need more than NBuffers buckets, CREATE
    INDEX attempts to sort the index entries by hash key before insertion,
    so as to reduce thrashing.  This code path got broken by commit
    9f03ca915196dfc8, which overlooked that _hash_form_tuple() is not just
    an alias for index_form_tuple().  The index got built anyway, but with
    garbage data, so that searches for pre-existing tuples always failed.
    Fix by refactoring to separate construction of the indexable data from
    calling index_form_tuple().

    Per bug #14210 from Daniel Newman.  Back-patch to 9.5 where the bug was
    introduced.

    Report: <20160623162507.17237.39471@wrigleys.postgresql.org>
* Finish up XLOG_HINT renaming (Alvaro Herrera, 2016-06-17)

    Commit b8fd1a09f3 renamed XLOG_HINT to XLOG_FPI, but neglected two
    places.

    Backpatch to 9.3, like that commit.
* pg_visibility: Add pg_truncate_visibility_map function. (Robert Haas, 2016-06-17)

    This requires some core changes as well so that we can properly WAL-log
    the truncation.  Specifically, it changes the format of the
    XLOG_SMGR_TRUNCATE WAL record, so bump XLOG_PAGE_MAGIC.

    Patch by me, reviewed but not fully endorsed by Andres Freund.
* Fix typo. (Robert Haas, 2016-06-17)

    Thomas Munro
* Remove PID from 'parallel worker' context message. (Robert Haas, 2016-06-17)

    Discussion: <bfd204ab-ab1a-792a-b345-0274a09a4b5f@2ndquadrant.com>
* Fix fuzzy thinking in ReinitializeParallelDSM(). (Tom Lane, 2016-06-16)

    The fact that no workers were successfully launched in the previous
    iteration does not excuse us from setting up properly to try again.
    This appears to explain crashes I saw in parallel regression testing
    due to error_mqh being NULL when it shouldn't be.

    Minor other cosmetic fixes too.
* Fix lazy_scan_heap so that it won't mark pages all-frozen too soon. (Robert Haas, 2016-06-15)

    Commit a892234f830e832110f63fc0a2afce2fb21d1584 added a new bit per
    page to the visibility map fork indicating whether the page is
    all-frozen, but incorrectly assumed that if lazy_scan_heap chose to
    freeze a tuple then that tuple would not need to later be frozen again.
    This turns out to be false, because xmin and xmax (and conceivably
    xvac, if dealing with tuples from very old releases) could be frozen at
    separate times.

    Thanks to Andres Freund for help in uncovering and tracking down this
    issue.
* Improve the situation for parallel query versus temp relations. (Tom Lane, 2016-06-09)

    Transmit the leader's temp-namespace state to workers.  This is
    important because without it, the workers do not really have the same
    search path as the leader.  For example, there is no good reason (and
    no extant code either) to prevent a worker from executing a temp
    function that the leader created previously; but as things stood it
    would fail to find the temp function, and then either fail or execute
    the wrong function entirely.  We still prohibit a worker from creating
    a temp namespace on its own.  In effect, a worker can only see the
    session's temp namespace if the leader had created it before starting
    the worker, which seems like the right semantics.

    Also, transmit the leader's BackendId to workers, and arrange for
    workers to use that when determining the physical file path of a temp
    relation belonging to their session.  While the original intent was to
    prevent such accesses entirely, there were a number of holes in that,
    notably in places like dbsize.c which assume they can safely access
    temp rels of other sessions anyway.  We might as well get this right,
    as a small down payment on someday allowing workers to access the
    leader's temp tables.  (With this change, directly using "MyBackendId"
    as a relation or buffer backend ID is deprecated; you should use
    BackendIdForTempRelations() instead.  I left a couple of such uses
    alone though, as they're not going to be reachable in parallel workers
    until we do something about localbuf.c.)

    Move the thou-shalt-not-access-thy-leader's-temp-tables prohibition
    down into localbuf.c, which is where it actually matters, instead of
    having it in relation_open().  This amounts to recognizing that access
    to temp tables' catalog entries is perfectly safe in a worker, it's
    only the data in local buffers that is problematic.

    Having done all that, we can get rid of the test in
    has_parallel_hazard() that says that use of a temp table's rowtype is
    unsafe in parallel workers.  That test was unduly expensive, and if we
    really did need such a prohibition, that was not even close to being a
    bulletproof guard for it.  (For example, any user-defined function
    executed in a parallel worker might have attempted such access.)
* pgindent run for 9.6 (Robert Haas, 2016-06-09)
* Eliminate "parallel degree" terminology.Robert Haas2016-06-09
| | | | | | | | | | | | This terminology provoked widespread complaints. So, instead, rename the GUC max_parallel_degree to max_parallel_workers_per_gather (leaving room for a possible future GUC max_parallel_workers that acts as a system-wide limit), and rename the parallel_degree reloption to parallel_workers. Rename structure members to match. These changes create a dump/restore hazard for users of PostgreSQL 9.6beta1 who have set the reloption (or applied the GUC using ALTER USER or ALTER DATABASE).
* Message style and wording fixes (Peter Eisentraut, 2016-06-07)
* Stop the executor if no more tuples can be sent from worker to leader. (Robert Haas, 2016-06-06)

    If a Gather node has read as many tuples as it needs (for example, due
    to Limit) it may detach the queue connecting it to the worker before
    reading all of the worker's tuples.  Rather than let the worker
    continue to generate and send all of the results, have it stop after
    sending the next tuple.

    More could be done here to stop the worker even quicker, but this is
    about as well as we can hope to do for 9.6.

    This is in response to a problem report from Andreas Seltenreich.
    Commit 44339b892a04e94bbb472235882dc6f7023bdc65 should actually be
    sufficient to fix that example even without this change, but it seems
    better to do this, too, since we might otherwise waste quite a large
    amount of effort in one or more workers.

    Discussion: CAA4eK1KOKGqmz9bGu+Z42qhRwMbm4R5rfnqsLCNqFs9j14jzEA@mail.gmail.com

    Amit Kapila
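    In the worker, detecting the detached queue presumably reduces to
    checking shm_mq_send's result (a sketch; SHM_MQ_DETACHED is the real
    result code, while the surrounding receiver logic is simplified):

        /* tqueue: the shm_mq_handle connecting this worker to the leader */
        shm_mq_result result;

        result = shm_mq_send(tqueue, len, data, false);
        if (result == SHM_MQ_DETACHED)
            return false;   /* leader detached: tell the executor to stop */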
* Fix typo. (Robert Haas, 2016-06-06)

    Jim Nasby
* Fix various common misspellings. (Greg Stark, 2016-06-03)

    Mostly these are just comments, but there are a few in documentation
    and a handful in code and tests.

    Hopefully this doesn't cause too much unnecessary pain for
    backpatching.  I relented from some of the most common, like "thru",
    for that reason.  The rest don't seem numerous enough to cause
    problems.

    Thanks to Kevin Lyda's tool https://pypi.python.org/pypi/misspellings
* Cosmetic improvements to freeze map code. (Robert Haas, 2016-06-03)

    Per post-commit review comments from Andres Freund, improve variable
    names, comments, and in one place, slightly improve the code structure.

    Masahiko Sawada
* Fix btree mark/restore bug. (Kevin Grittner, 2016-06-02)

    Commit 2ed5b87f96d473962ec5230fd820abfeaccb2069 introduced a bug in
    mark/restore, in an attempt to optimize repeated restores to the same
    page.  This caused an assertion failure during a merge join which fed
    directly from an index scan, although the impact would not be limited
    to that case.  Revert the bad chunk of code from that commit.

    While investigating this bug it was discovered that a particular
    "paranoia" set of the mark position field would not prevent bad
    behavior; it would just make it harder to diagnose.  Change that into
    an assertion, which will draw attention to any future problem in that
    area more directly.

    Backpatch to 9.5, where the bug was introduced.

    Bug #14169 reported by Shinta Koyanagi.  Preliminary analysis by Tom
    Lane identified which commit caused the bug.
* Fix PageAddItem BRIN bug (Alvaro Herrera, 2016-05-30)

    BRIN was relying on the ability to remove a tuple from an index page,
    then putting another tuple in the same line pointer.  But PageAddItem
    refuses to add a tuple beyond the first free item past the last used
    item, and in particular, it rejects an attempt to add an item to an
    empty page anywhere other than the first line pointer.  PageAddItem
    issues a WARNING and indicates to the caller that it failed, which in
    turn causes the BRIN calling code to issue a PANIC, so the whole
    sequence looks like this:

        WARNING: specified item offset is too large
        PANIC: failed to add BRIN tuple

    To fix, create a new function PageAddItemExtended which is like
    PageAddItem except that the two boolean arguments become a flags
    bitmap; the "overwrite" and "is_heap" boolean flags in PageAddItem
    become PAI_OVERWRITE and PAI_IS_HEAP flags in the new function, and a
    new flag PAI_ALLOW_FAR_OFFSET enables the behavior required by BRIN.
    PageAddItem() retains its original signature, for compatibility with
    third-party modules (other callers in core code are not modified,
    either).

    Also, in the belt-and-suspenders spirit, I added a new sanity check in
    brinGetTupleForHeapBlock to raise an error if a TID found in the revmap
    is not marked as live by the page header.  This causes it to react with
    "ERROR: corrupted BRIN index" to the bug at hand, rather than a hard
    crash.

    Backpatch to 9.5.

    Bug reported by Andreas Seltenreich as detected by his handy sqlsmith
    fuzzer.
    Discussion: https://www.postgresql.org/message-id/87mvni77jh.fsf@elite.ansel.ydns.eu
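    A sketch of the flags-bitmap API this describes (flag names as given
    above; the bit assignments and the call site are illustrative):

        /* Flags for PageAddItemExtended, replacing two boolean arguments. */
        #define PAI_OVERWRITE         (1 << 0)  /* reuse existing line pointer */
        #define PAI_IS_HEAP           (1 << 1)  /* enforce heap item limits */
        #define PAI_ALLOW_FAR_OFFSET  (1 << 2)  /* permit offset past first
                                                 * free slot, as BRIN needs */

        offnum = PageAddItemExtended(page, (Item) tup, itemsz, offnum,
                                     PAI_OVERWRITE | PAI_ALLOW_FAR_OFFSET);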