postgresql - postgresql mirror

	Commit message (Collapse)	Author	Age
...
*	Avoid copying index tuples when building an index.	Robert Haas	2014-07-01
\| \| \| \| \| \| \| \| \| \| \| \|	The previous code, perhaps out of concern for avoid memory leaks, formed the tuple in one memory context and then copied it to another memory context. However, this doesn't appear to be necessary, since index_form_tuple and the functions it calls take precautions against leaking memory. In my testing, building the tuple directly inside the sort context shaves several percent off the index build time. Rearrange things so we do that. Patch by me. Review by Amit Kapila, Tom Lane, Andres Freund.
*	Check interrupts during logical decoding more frequently.	Andres Freund	2014-06-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When reading large amounts of preexisting WAL during logical decoding using the SQL interface we possibly could fail to check interrupts in due time. Similarly the same could happen on systems with a very high WAL volume while creating a new logical replication slot, independent of the used interface. Previously these checks where only performed in xlogreader's read_page callbacks, while waiting for new WAL to be produced. That's not sufficient though, if there's never a need to wait. Walsender's send loop already contains a interrupt check. Backpatch to 9.4 where the logical decoding feature was introduced.
*	Fix and enhance the assertion of no palloc's in a critical section.	Heikki Linnakangas	2014-06-30
\| \| \| \| \| \| \| \| \| \| \| \|	The assertion failed if WAL_DEBUG or LWLOCK_STATS was enabled; fix that by using separate memory contexts for the allocations made within those code blocks. This patch introduces a mechanism for marking any memory context as allowed in a critical section. Previously ErrorContext was exempt as a special case. Instead of a blanket exception of the checkpointer process, only exempt the memory context used for the pending ops hash table.
*	Remove use_json_as_text options from json_to_record/json_populate_record.	Tom Lane	2014-06-29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The "false" case was really quite useless since all it did was to throw an error; a definition not helped in the least by making it the default. Instead let's just have the "true" case, which emits nested objects and arrays in JSON syntax. We might later want to provide the ability to emit sub-objects in Postgres record or array syntax, but we'd be best off to drive that off a check of the target field datatype, not a separate argument. For the functions newly added in 9.4, we can just remove the flag arguments outright. We can't do that for json_populate_record[set], which already existed in 9.3, but we can ignore the argument and always behave as if it were "true". It helps that the flag arguments were optional and not documented in any useful fashion anyway.
*	Add cluster_name GUC which is included in process titles if set.	Andres Freund	2014-06-29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When running several postgres clusters on one OS instance it's often inconveniently hard to identify which "postgres" process belongs to which postgres instance. Add the cluster_name GUC, whose value will be included as part of the process titles if set. With that processes can more easily identified using tools like 'ps'. To avoid problems with encoding mismatches between postgresql.conf, consoles, and individual databases replace non-ASCII chars in the name with question marks. The length is limited to NAMEDATALEN to make it less likely to truncate important information at the end of the status. Thomas Munro, with some adjustments by me and review by a host of people.
*	Remove Alpha and Tru64 support.	Andres Freund	2014-06-28
\| \| \| \| \| \| \| \| \| \| \| \| \|	Support for running postgres on Alpha hasn't been tested for a long while. Due to Alpha's uniquely lax cache coherency model it's a hard to develop for platform (especially blindly!) and thought to be unlikely to currently work correctly. As Alpha is the only supported architecture for Tru64 drop support for it as well. Tru64's support has ended 2012 and it has been in maintenance-only mode for much longer. Also remove stray references to __ksr__ and ultrix defines.
*	Allow pushdown of WHERE quals into subqueries with window functions.	Tom Lane	2014-06-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We can allow this even without any specific knowledge of the semantics of the window function, so long as pushed-down quals will either accept every row in a given window partition, or reject every such row. Because window functions act only within a partition, such a case can't result in changing the window functions' outputs for any surviving row. Eliminating entire partitions in this way obviously can reduce the cost of the window-function computations substantially. The fly in the ointment is that it's hard to be entirely sure whether this is true for an arbitrary qual condition. This patch allows pushdown if (a) the qual references only partitioning columns, and (b) the qual contains no volatile functions. We are at risk of incorrect results if the qual can produce different answers for values that the partitioning equality operator sees as equal. While it's not hard to invent cases for which that can happen, it seems to seldom be a problem in practice, since no one has complained about a similar assumption that we've had for many years with respect to DISTINCT. The potential performance gains seem to be worth the risk. David Rowley, reviewed by Vik Fearing; some credit is due also to Thomas Mayer who did considerable preliminary investigation.
*	Have multixact be truncated by checkpoint, not vacuum	Alvaro Herrera	2014-06-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Instead of truncating pg_multixact at vacuum time, do it only at checkpoint time. The reason for doing it this way is twofold: first, we want it to delete only segments that we're certain will not be required if there's a crash immediately after the removal; and second, we want to do it relatively often so that older files are not left behind if there's an untimely crash. Per my proposal in http://www.postgresql.org/message-id/20140626044519.GJ7340@eldon.alvh.no-ip.org we now execute the truncation in the checkpointer process rather than as part of vacuum. Vacuum is in only charge of maintaining in shared memory the value to which it's possible to truncate the files; that value is stored as part of checkpoints also, and so upon recovery we can reuse the same value to re-execute truncate and reset the oldest-value-still-safe-to-use to one known to remain after truncation. Per bug reported by Jeff Janes in the course of his tests involving bug #8673. While at it, update some comments that hadn't been updated since multixacts were changed. Backpatch to 9.3, where persistency of pg_multixact files was introduced by commit 0ac5ad5134f2.
*	Don't allow relminmxid to go backwards during VACUUM FULL	Alvaro Herrera	2014-06-27
\| \| \| \| \| \| \| \| \| \| \| \| \|	We were allowing a table's pg_class.relminmxid value to move backwards when heaps were swapped by VACUUM FULL or CLUSTER. There is a similar protection against relfrozenxid going backwards, which we neglected to clone when the multixact stuff was rejiggered by commit 0ac5ad5134f276. Backpatch to 9.3, where relminmxid was introduced. As reported by Heikki in http://www.postgresql.org/message-id/52401AEA.9000608@vmware.com
*	Fix broken Assert() introduced by 8e9a16ab8f7f0e58	Alvaro Herrera	2014-06-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Don't assert MultiXactIdIsRunning if the multi came from a tuple that had been share-locked and later copied over to the new cluster by pg_upgrade. Doing that causes an error to be raised unnecessarily: MultiXactIdIsRunning is not open to the possibility that its argument came from a pg_upgraded tuple, and all its other callers are already checking; but such multis cannot, obviously, have transactions still running, so the assert is pointless. Noticed while investigating the bogus pg_multixact/offsets/0000 file left over by pg_upgrade, as reported by Andres Freund in http://www.postgresql.org/message-id/20140530121631.GE25431@alap3.anarazel.de Backpatch to 9.3, as the commit that introduced the buglet.
*	Disallow pushing volatile qual expressions down into DISTINCT subqueries.	Tom Lane	2014-06-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A WHERE clause applied to the output of a subquery with DISTINCT should theoretically be applied only once per distinct row; but if we push it into the subquery then it will be evaluated at each row before duplicate elimination occurs. If the qual is volatile this can give rise to observably wrong results, so don't do that. While at it, refactor a little bit to allow subquery_is_pushdown_safe to report more than one kind of restrictive condition without indefinitely expanding its argument list. Although this is a bug fix, it seems unwise to back-patch it into released branches, since it might de-optimize plans for queries that aren't giving any trouble in practice. So apply to 9.4 but not further back.
*	Rationalize error messages within jsonfuncs.c.	Tom Lane	2014-06-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I noticed that the functions in jsonfuncs.c sometimes printed error messages that claimed I'd called some other function. Investigation showed that this was from repurposing code into "worker" functions without taking much care as to whether it would mention the right SQL-level function if it threw an error. Moreover, there was a weird mismash of messages that contained a fixed function name, messages that used %s for a function name, and messages that constructed a function name out of spare parts, like "json%s_populate_record" (which, quite aside from being ugly as sin, wasn't even sufficient to cover all the cases). This would put an undue burden on our long-suffering translators. Standardize on inserting the SQL function name with %s so as to reduce the number of translatable strings, and pass function names around as needed to make sure we can report the right one. Fix up some gratuitous variations in wording, too.
*	Cosmetic improvements in jsonfuncs.c.	Tom Lane	2014-06-25
\| \| \| \| \| \|	Re-pgindent, remove a lot of random vertical whitespace, remove useless (if not counterproductive) inline markings, get rid of unnecessary zero-padding of strings for hashtable searches. No functional changes.
*	Fix handling of nested JSON objects in json_populate_recordset and friends.	Tom Lane	2014-06-24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	populate_recordset_object_start() improperly created a new hash table (overwriting the link to the existing one) if called at nest levels greater than one. This resulted in previous fields not appearing in the final output, as reported by Matti Hameister in bug #10728. In 9.4 the problem also affects json_to_recordset. This perhaps missed detection earlier because the default behavior is to throw an error for nested objects: you have to pass use_json_as_text = true to see the problem. In addition, fix query-lifespan leakage of the hashtable created by json_populate_record(). This is pretty much the same problem recently fixed in dblink: creating an intended-to-be-temporary context underneath the executor's per-tuple context isn't enough to make it go away at the end of the tuple cycle, because MemoryContextReset is not MemoryContextResetAndDeleteChildren. Michael Paquier and Tom Lane
*	Don't allow foreign tables with OIDs.	Heikki Linnakangas	2014-06-24
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The syntax doesn't let you specify "WITH OIDS" for foreign tables, but it was still possible with default_with_oids=true. But the rest of the system, including pg_dump, isn't prepared to handle foreign tables with OIDs properly. Backpatch down to 9.1, where foreign tables were introduced. It's possible that there are databases out there that already have foreign tables with OIDs. There isn't much we can do about that, but at least we can prevent them from being created in the future. Patch by Etsuro Fujita, reviewed by Hadi Moshayedi.
*	Check for interrupts during tuple-insertion loops.	Robert Haas	2014-06-23
\| \| \| \| \| \| \| \|	Normally, this won't matter too much; but if I/O is really slow, for example because the system is overloaded, we might write many pages before checking for interrupts. A single toast insertion might write up to 1GB of data, and a multi-insert could write hundreds of tuples (and their corresponding TOAST data).
*	Fix bug in WAL_DEBUG.	Heikki Linnakangas	2014-06-23
\| \| \| \| \|	The record header was not copied correctly to the buffer that was passed to the rm_desc function. Broken by my rm_desc signature refactoring patch.
*	Add Asserts to verify that catalog cache keys are unique and not null.	Tom Lane	2014-06-20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The catcache code is effectively assuming this already, so let's insist that the catalog and index are actually declared that way. Having done that, the comments in indexing.h about non-unique indexes not being used for catcaches are completely redundant not just mostly so; and we didn't have such a comment for every such index anyway. So let's get rid of them. Per discussion of whether we should identify primary keys for catalogs. We might or might not take that further step, but this change in itself will allow quicker detection of misdeclared catcaches, so it seems worth doing in any case.
*	Do all-visible handling in lazy_vacuum_page() outside its critical section.	Andres Freund	2014-06-20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since fdf9e21196a lazy_vacuum_page() rechecks the all-visible status of pages in the second pass over the heap. It does so inside a critical section, but both visibilitymap_test() and heap_page_is_all_visible() perform operations that should not happen inside one. The former potentially performs IO and both potentially do memory allocations. To fix, simply move all the all-visible handling outside the critical section. Doing so means that the PD_ALL_VISIBLE on the page won't be included in the full page image of the HEAP2_CLEAN record anymore. But that's fine, the flag will be set by the HEAP2_VISIBLE logged later. Backpatch to 9.3 where the problem was introduced. The bug only came to light due to the assertion added in 4a170ee9 and isn't likely to cause problems in production scenarios. The worst outcome is a avoidable PANIC restart. This also gets rid of the difference in the order of operations between master and standby mentioned in 2a8e1ac5. Per reports from David Leverton and Keith Fiske in bug #10533.
*	Don't allow to disable backend assertions via the debug_assertions GUC.	Andres Freund	2014-06-20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The existance of the assert_enabled variable (backing the debug_assertions GUC) reduced the amount of knowledge some static code checkers (like coverity and various compilers) could infer from the existance of the assertion. That could have been solved by optionally removing the assertion_enabled variable from the Assert() et al macros at compile time when some special macro is defined, but the resulting complication doesn't seem to be worth the gain from having debug_assertions. Recompiling is fast enough. The debug_assertions GUC is still available, but readonly, as it's useful when diagnosing problems. The commandline/client startup option -A, which previously also allowed to enable/disable assertions, has been removed as it doesn't serve a purpose anymore. While at it, reduce code duplication in bufmgr.c and localbuf.c assertions checking for spurious buffer pins. That code had to be reindented anyway to cope with the assert_enabled removal.
*	Avoid leaking memory while evaluating arguments for a table function.	Tom Lane	2014-06-19
\| \| \| \| \| \| \| \| \| \| \| \| \|	ExecMakeTableFunctionResult evaluated the arguments for a function-in-FROM in the query-lifespan memory context. This is insignificant in simple cases where the function relation is scanned only once; but if the function is in a sub-SELECT or is on the inside of a nested loop, any memory consumed during argument evaluation can add up quickly. (The potential for trouble here had been foreseen long ago, per existing comments; but we'd not previously seen a complaint from the field about it.) To fix, create an additional temporary context just for this purpose. Per an example from MauMau. Back-patch to all active branches.
*	Don't allow data_directory to be set in postgresql.auto.conf by ALTER SYSTEM.	Fujii Masao	2014-06-19
\| \| \| \| \| \| \| \| \| \|	data_directory could be set both in postgresql.conf and postgresql.auto.conf so far. This could cause some problematic situations like circular definition. To avoid such situations, this commit forbids a user to set data_directory in postgresql.auto.conf. Backpatch this to 9.4 where ALTER SYSTEM command was introduced. Amit Kapila, reviewed by Abhijit Menon-Sen, with minor adjustments by me.
*	Improve our mechanism for controlling the Linux out-of-memory killer.	Tom Lane	2014-06-18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Arrange for postmaster child processes to respond to two environment variables, PG_OOM_ADJUST_FILE and PG_OOM_ADJUST_VALUE, to determine whether they reset their OOM score adjustments and if so to what. This is superior to the previous design involving #ifdef's in several ways. The behavior is now available in a default build, and both ends of the adjustment --- the original adjustment of the postmaster's level and the subsequent readjustment by child processes --- can now be controlled in one place, namely the postmaster launch script. So it's no longer necessary for the launch script to act on faith that the server was compiled with the appropriate options. In addition, if someone wants to use an OOM score other than zero for the child processes, that doesn't take a recompile anymore; and we no longer have to cater separately to the two different historical kernel APIs for this adjustment. Gurjeet Singh, somewhat revised by me
*	Remove unnecessary check for jbvBinary in convertJsonbValue.	Andrew Dunstan	2014-06-18
\| \| \| \| \| \| \|	The check was confusing and is a condition that should never in fact happen. Per gripe from Dmitry Dolgov.
*	Fix weird spacing in error message.	Tom Lane	2014-06-18
\| \| \| \|	Seems to have been introduced in 1a3458b6d8d202715a83c88474a1b63726d0929e.
*	Implement UPDATE tab SET (col1,col2,...) = (SELECT ...), ...	Tom Lane	2014-06-18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This SQL-standard feature allows a sub-SELECT yielding multiple columns (but only one row) to be used to compute the new values of several columns to be updated. While the same results can be had with an independent sub-SELECT per column, such a workaround can require a great deal of duplicated computation. The standard actually says that the source for a multi-column assignment could be any row-valued expression. The implementation used here is tightly tied to our existing sub-SELECT support and can't handle other cases; the Bison grammar would have some issues with them too. However, I don't feel too bad about this since other cases can be converted into sub-SELECTs. For instance, "SET (a,b,c) = row_valued_function(x)" could be written "SET (a,b,c) = (SELECT * FROM row_valued_function(x))".
*	Avoid recursion when processing simple lists of AND'ed or OR'ed clauses.	Tom Lane	2014-06-16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since most of the system thinks AND and OR are N-argument expressions anyway, let's have the grammar generate a representation of that form when dealing with input like "x AND y AND z AND ...", rather than generating a deeply-nested binary tree that just has to be flattened later by the planner. This avoids stack overflow in parse analysis when dealing with queries having more than a few thousand such clauses; and in any case it removes some rather unsightly inconsistencies, since some parts of parse analysis were generating N-argument ANDs/ORs already. It's still possible to get a stack overflow with weirdly parenthesized input, such as "x AND (y AND (z AND ( ... )))", but such cases are not mainstream usage. The maximum depth of parenthesization is already limited by Bison's stack in such cases, anyway, so that the limit is probably fairly platform-independent. Patch originally by Gurjeet Singh, heavily revised by me
*	Change the signature of rm_desc so that it's passed a XLogRecord.	Heikki Linnakangas	2014-06-14
\| \| \| \|	Just feels more natural, and is more consistent with rm_redo.
*	Improve predtest.c's ability to reason about operator expressions.	Tom Lane	2014-06-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We have for a long time been able to prove implications and refutations between clauses structured like "expr op const" with the same subexpression and btree-related operators; for example that "x < 4" implies "x <= 5". The implication machinery is needed to detect usability of partial indexes, and the refutation machinery is needed to implement constraint exclusion. This patch extends that machinery to make proofs for operator expressions involving the same two immutable-but-not-necessarily-just-Const input expressions, ie does "expr1 op1 expr2" prove or refute "expr1 op2 expr2" or "expr2 op2 expr1"? An important example is that we can now prove "x = y" given "y = x", which formerly the code could not deduce unless x or y was a constant. We can make use of the system's knowledge of operator commutator and negator pairs, and can also make use of btree opclass relationships, for example "x < y" implies "x <= y" and refutes "x > y" (notice that neither of these could be proven just from commutator or negator links). Inspired by a gripe from Brian Dunavant. This seems more like a new feature than a bug fix, though, so no back-patch.
*	Improve tuplestore's error messages for I/O failures.	Tom Lane	2014-06-12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We should report the errno when we get a failure from functions like BufFileWrite. "ERROR: write failed" is unreasonably taciturn for a case that's well within the realm of possibility; I've seen it a couple times in the buildfarm recently, in situations that were probably out-of-disk-space, but it'd be good to see the errno to confirm it. I think this code was originally written without assuming that the buffile.c functions would return useful errno; but most other callers are assuming that, and a quick look at the buffile code gives no reason to suppose otherwise. Also, a couple of the old messages were phrased on the assumption that a short read might indicate a logic bug in tuplestore itself; but that code's pretty well tested by now, so a filesystem-level problem seems much more likely.
*	Preserve exposed type of subquery outputs when substituting NULLs.	Tom Lane	2014-06-12
\| \| \| \| \|	I thought I could get away with hardcoded int4 here, but the buildfarm says differently.
*	Rename lo_create(oid, bytea) to lo_from_bytea().	Tom Lane	2014-06-12
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The previous naming broke the query that libpq's lo_initialize() uses to collect the OIDs of the server-side functions it requires, because that query effectively assumes that there is only one function named lo_create in the pg_catalog schema (and likewise only one lo_open, etc). While we should certainly make libpq more robust about this, the naive query will remain in use in the field for the foreseeable future, so it seems the only workable choice is to use a different name for the new function. lo_from_bytea() won a small straw poll. Back-patch into 9.4 where the new function was introduced.
*	Fix typos	Alvaro Herrera	2014-06-12
\|
*	Remove unnecessary output expressions from unflattened subqueries.	Tom Lane	2014-06-12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If a sub-select-in-FROM gets flattened into the upper query, then we naturally get rid of any output columns that are defined in the sub-select text but not actually used in the upper query. However, this doesn't happen when it's not possible to flatten the subquery, for example because it contains GROUP BY, LIMIT, etc. Allowing the subquery to compute useless output columns is often fairly harmless, but sometimes it has significant performance cost: the unused output might be an expensive expression, or it might be a Var from a relation that we could remove entirely (via the join-removal logic) if only we realized that we didn't really need that Var. Situations like this are common when expanding views, so it seems worth taking the trouble to detect and remove unused outputs. Because the upper query's Var numbering for subquery references depends on positions in the subquery targetlist, we don't want to renumber the items we leave behind. Instead, we can implement "removal" by replacing the unwanted expressions with simple NULL constants. This wastes a few cycles at runtime, but not enough to justify more work in the planner.
*	Consistency improvements for slot and decoding code.	Andres Freund	2014-06-12
\| \| \| \| \| \| \| \|	Change the order of checks in similar functions to be the same; remove a parameter that's not needed anymore; rename a memory context and expand a couple of comments. Per review comments from Amit Kapila
*	Fix typos in comments.	Noah Misch	2014-06-11
\|
*	Fix typos in comments.	Fujii Masao	2014-06-11
\|
*	Fix ancient encoding error in hungarian.stop.	Tom Lane	2014-06-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When we grabbed this file off the Snowball project's website, we mistakenly supposed that it was in LATIN1 encoding, but evidently it was actually in LATIN2. This resulted in ő (o-double-acute, U+0151, which is code 0xF5 in LATIN2) being misconverted into õ (o-tilde, U+00F5), as complained of in bug #10589 from Zoltán Sörös. We'd have messed up u-double-acute too, but there aren't any of those in the file. Other characters used in the file have the same codes in LATIN1 and LATIN2, which no doubt helped hide the problem for so long. The error is not only ours: the Snowball project also was confused about which encoding is required for Hungarian. But dealing with that will require source-code changes that I'm not at all sure we'll wish to back-patch. Fixing the stopword file seems reasonably safe to back-patch however.
*	Fix infinite loop when splitting inner tuples in SPGiST text indexes.	Tom Lane	2014-06-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously, the code used a node label of zero both for strings that contain no bytes beyond the inner tuple's prefix, and for cases where an "allTheSame" inner tuple has to be split to allow a string with a different next byte to be inserted into it. Failing to distinguish these cases meant that if a string ending with the current prefix needed to be inserted into an allTheSame tuple, we got into an infinite loop, because after splitting the tuple we'd descend into the child allTheSame tuple and then find we need to split again. To fix, instead use -1 and -2 as the node labels for these two cases. This requires widening the node label type from "char" to int2, but fortunately SPGiST stores all pass-by-value node label types in their Datum representation, which means that this change is transparently upward compatible so far as the on-disk representation goes. We continue to recognize zero as a dummy node label for reading purposes, but will not attempt to push new index entries down into such a label, so that the loop won't occur even when dealing with an existing index. Per report from Teodor Sigaev. Back-patch to 9.2 where the faulty code was introduced.
*	Wrap multixact/members correctly during extension, take 2	Alvaro Herrera	2014-06-09
\| \| \| \| \| \| \| \|	In a50d97625497b7 I already changed this, but got it wrong for the case where the number of members is larger than the number of entries that fit in the last page of the last segment. As reported by Serge Negodyuck in a followup to bug #8673.
*	Fix off-by-one in decoding causing one-record events to be skipped.	Andres Freund	2014-06-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A ReorderBufferTransaction's end_lsn, the sentPtr advocated by walsender keepalive messages, and the end location remembered by the decoding get_changes SQL functions all use the location of the last read record + 1. I.e. the LSN points to the beginning of the next record. That cannot realistically be changed without changing the replication protocol because that's how keepalive messages have worked since 9.0. The bug is that the logic inside the snapshot builder, which decides whether a transaction's contents should be decoded, assumed the start location would point towards the last byte of the last record. The reason this didn't actually cause visible problems is that currently that decision is only made for commit records. Since interesting transactions always have at least one additional record - containing actual data - we'd never skip a transaction. But if there ever were transactions, or other events, with just one record containing important information, we'd skip them after stopping and restarting logical decoding.
*	Add defenses against running with a wrong selection of LOBLKSIZE.	Tom Lane	2014-06-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It's critical that the backend's idea of LOBLKSIZE match the way data has actually been divided up in pg_largeobject. While we don't provide any direct way to adjust that value, doing so is a one-line source code change and various people have expressed interest recently in changing it. So, just as with TOAST_MAX_CHUNK_SIZE, it seems prudent to record the value in pg_control and cross-check that the backend's compiled-in setting matches the on-disk data. Also tweak the code in inv_api.c so that fetches from pg_largeobject explicitly verify that the length of the data field is not more than LOBLKSIZE. Formerly we just had Asserts() for that, which is no protection at all in production builds. In some of the call sites an overlength data value would translate directly to a security-relevant stack clobber, so it seems worth one extra runtime comparison to be sure. In the back branches, we can't change the contents of pg_control; but we can still make the extra checks in inv_api.c, which will offer some amount of protection against running with the wrong value of LOBLKSIZE.
*	Consistently spell a replication slot's name as slot_name.	Andres Freund	2014-06-05
\| \| \| \| \| \| \| \| \| \| \|	Previously there's been a mix between 'slotname' and 'slot_name'. It's not nice to be unneccessarily inconsistent in a new feature. As a post beta1 initdb now is required in the wake of eeca4cd35e, fix the inconsistencies. Most the changes won't affect usage of replication slots because the majority of changes is around function parameter names. The prominent exception to that is that the recovery.conf parameter 'primary_slotname' is now named 'primary_slot_name'.
*	Adjust SP-GiST WAL record formats to reduce alignment padding.	Heikki Linnakangas	2014-06-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The way the code was written, the padding was copied from uninitialized memory areas.. Because the structs are local variables in the code where the WAL records are constructed, making them larger and zeroing the padding bytes would not make the code very pretty, so rather than fixing this directly by zeroing out the padding bytes, it seems more clear to not try to align the tuples in the WAL records. The redo functions are taught to copy the tuple header to a local variable to avoid unaligned access. Stable-branches have the same problem, but we can't change the WAL format there, so fix in master only. Reading a few random extra bytes at the stack is harmless in practice, so it's not worth crafting a different back-patchable fix. Per reports from Kevin Grittner and Andres Freund, using clang static analyzer and Valgrind, respectively.
*	Add btree and hash opclasses for pg_lsn.	Tom Lane	2014-06-04
\| \| \| \| \| \| \| \| \| \| \| \|	This is needed to allow ORDER BY, DISTINCT, etc to work as expected for pg_lsn values. We had previously decided to put this off for 9.5, but in view of commit eeca4cd35e284c72b2ea1b4494e64e7738896e81 there's no reason to avoid a catversion bump for 9.4beta2, and this does make a pretty significant usability difference for pg_lsn. Michael Paquier, with fixes from Andres Freund and Tom Lane
*	Fix longstanding bug in HeapTupleSatisfiesVacuum().	Andres Freund	2014-06-04
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	HeapTupleSatisfiesVacuum() didn't properly discern between DELETE_IN_PROGRESS and INSERT_IN_PROGRESS for rows that have been inserted in the current transaction and deleted in a aborted subtransaction of the current backend. At the very least that caused problems for CLUSTER and CREATE INDEX in transactions that had aborting subtransactions producing rows, leading to warnings like: WARNING: concurrent delete in progress within table "..." possibly in an endless, uninterruptible, loop. Instead of treating InProgress xmins the same as IsCurrent ones, treat them as being distinct like the other visibility routines. As implemented this separatation can cause a behaviour change for rows that have been inserted and deleted in another, still running, transaction. HTSV will now return INSERT_IN_PROGRESS instead of DELETE_IN_PROGRESS for those. That's both, more in line with the other visibility routines and arguably more correct. The latter because a INSERT_IN_PROGRESS will make callers look at/wait for xmin, instead of xmax. The only current caller where that's possibly worse than the old behaviour is heap_prune_chain() which now won't mark the page as prunable if a row has concurrently been inserted and deleted. That's harmless enough. As a cautionary measure also insert a interrupt check before the gotos in IndexBuildHeapScan() that lead to the uninterruptible loop. There are other possible causes, like a row that several sessions try to update and all fail, for repeated loops and the cost of doing so in the retry case is low. As this bug goes back all the way to the introduction of subtransactions in 573a71a5da backpatch to all supported releases. Reported-By: Sandro Santilli
*	Save pg_stat_statements statistics file into $PGDATA/pg_stat directory at ↵	Fujii Masao	2014-06-04
\| \| \| \| \| \| \| \| \| \| \| \| \|	shutdown. 187492b6c2e8cafc5b39063ca3b67846e8155d24 changed pgstat.c so that the stats files were saved into $PGDATA/pg_stat directory when the server was shutdowned. But it accidentally forgot to change the location of pg_stat_statements permanent stats file. This commit fixes pg_stat_statements so that its stats file is also saved into $PGDATA/pg_stat at shutdown. Since this fix changes the file layout, we don't back-patch it to 9.3 where this oversight was introduced.
*	Use EncodeDateTime instead of to_char to render JSON timestamps.	Andrew Dunstan	2014-06-03
\| \| \| \| \| \| \| \| \| \|	Per gripe from Peter Eisentraut and Tom Lane. The output is slightly different, but still ISO 8601 compliant: to_char doesn't output the minutes when time zone offset is an integer number of hours, while EncodeDateTime outputs ":00". The code is slightly adapted from code in xml.c
*	Do not escape a unicode sequence when escaping JSON text.	Andrew Dunstan	2014-06-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously, any backslash in text being escaped for JSON was doubled so that the result was still valid JSON. However, this led to some perverse results in the case of Unicode sequences, These are now detected and the initial backslash is no longer escaped. All other backslashes are still escaped. No validity check is performed, all that is looked for is \uXXXX where X is a hexidecimal digit. This is a change from the 9.2 and 9.3 behaviour as noted in the Release notes. Per complaint from Teodor Sigaev.
*	Output timestamps in ISO 8601 format when rendering JSON.	Andrew Dunstan	2014-06-03
\| \| \| \| \| \| \| \| \| \| \|	Many JSON processors require timestamp strings in ISO 8601 format in order to convert the strings. When converting a timestamp, with or without timezone, to a JSON datum we therefore now use such a format rather than the type's default text output, in functions such as to_json(). This is a change in behaviour from 9.2 and 9.3, as noted in the release notes.