postgresql - postgresql mirror

	Commit message (Collapse)	Author	Age
*	Fix planning of SELECT FOR UPDATE on child table with partial index.	Tom Lane	2014-12-11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Ordinarily we can omit checking of a WHERE condition that matches a partial index's condition, when we are using an indexscan on that partial index. However, in SELECT FOR UPDATE we must include the "redundant" filter condition in the plan so that it gets checked properly in an EvalPlanQual recheck. The planner got this mostly right, but improperly omitted the filter condition if the index in question was on an inheritance child table. In READ COMMITTED mode, this could result in incorrectly returning just-updated rows that no longer satisfy the filter condition. The cause of the error is using get_parse_rowmark() when get_plan_rowmark() is what should be used during planning. In 9.3 and up, also fix the same mistake in contrib/postgres_fdw. It's currently harmless there (for lack of inheritance support) but wrong is wrong, and the incorrect code might get copied to someplace where it's more significant. Report and fix by Kyotaro Horiguchi. Back-patch to all supported branches.
*	Fix corner case where SELECT FOR UPDATE could return a row twice.	Tom Lane	2014-12-11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In READ COMMITTED mode, if a SELECT FOR UPDATE discovers it has to redo WHERE-clause checking on rows that have been updated since the SELECT's snapshot, it invokes EvalPlanQual processing to do that. If this first occurs within a non-first child table of an inheritance tree, the previous coding could accidentally re-return a matching row from an earlier, already-scanned child table. (And, to add insult to injury, I think this could make it miss returning a row that should have been returned, if the updated row that this happens on should still have passed the WHERE qual.) Per report from Kyotaro Horiguchi; the added isolation test is based on his test case. This has been broken for quite awhile, so back-patch to all supported branches.
*	Guard against bad "dscale" values in numeric_recv().	Tom Lane	2014-12-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We were not checking to see if the supplied dscale was valid for the given digit array when receiving binary-format numeric values. While dscale can validly be more than the number of nonzero fractional digits, it shouldn't be less; that case causes fractional digits to be hidden on display even though they're there and participate in arithmetic. Bug #12053 from Tommaso Sala indicates that there's at least one broken client library out there that sometimes supplies an incorrect dscale value, leading to strange behavior. This suggests that simply throwing an error might not be the best response; it would lead to failures in applications that might seem to be working fine today. What seems the least risky fix is to truncate away any digits that would be hidden by dscale. This preserves the existing behavior in terms of what will be printed for the transmitted value, while preventing subsequent arithmetic from producing results inconsistent with that. In passing, throw a specific error for the case of dscale being outside the range that will fit into a numeric's header. Before you got "value overflows numeric format", which is a bit misleading. Back-patch to all supported branches.
*	Sync unlogged relations to disk after they have been reset.	Andres Freund	2014-11-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Unlogged relations are only reset when performing a unclean restart. That means they have to be synced to disk during clean shutdowns. During normal processing that's achieved by registering a buffer's file to be fsynced at the next checkpoint when flushed. But ResetUnloggedRelations() doesn't go through the buffer manager, so nothing will force reset relations to disk before the next shutdown checkpoint. So just make ResetUnloggedRelations() fsync the newly created main forks to disk. Discussion: 20140912112246.GA4984@alap3.anarazel.de Backpatch to 9.1 where unlogged tables were introduced. Abhijit Menon-Sen and Andres Freund
*	Ensure unlogged tables are reset even if crash recovery errors out.	Andres Freund	2014-11-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Unlogged relations are reset at the end of crash recovery as they're only synced to disk during a proper shutdown. Unfortunately that and later steps can fail, e.g. due to running out of space. This reset was, up to now performed after marking the database as having finished crash recovery successfully. As out of space errors trigger a crash restart that could lead to the situation that not all unlogged relations are reset. Once that happend usage of unlogged relations could yield errors like "could not open file "...": No such file or directory". Luckily clusters that show the problem can be fixed by performing a immediate shutdown, and starting the database again. To fix, just call ResetUnloggedRelations(UNLOGGED_RELATION_INIT) earlier, before marking the database as having successfully recovered. Discussion: 20140912112246.GA4984@alap3.anarazel.de Backpatch to 9.1 where unlogged tables were introduced. Abhijit Menon-Sen and Andres Freund
*	Backport "Expose fsync_fname as a public API".	Andres Freund	2014-11-15
\| \| \| \| \|	Backport commit cc52d5b33ff5df29de57dcae9322214cfe9c8464 back to 9.1 to allow backpatching some unlogged table fixes that use fsync_fname.
*	Fix race condition between hot standby and restoring a full-page image.	Heikki Linnakangas	2014-11-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There was a window in RestoreBackupBlock where a page would be zeroed out, but not yet locked. If a backend pinned and locked the page in that window, it saw the zeroed page instead of the old page or new page contents, which could lead to missing rows in a result set, or errors. To fix, replace RBM_ZERO with RBM_ZERO_AND_LOCK, which atomically pins, zeroes, and locks the page, if it's not in the buffer cache already. In stable branches, the old RBM_ZERO constant is renamed to RBM_DO_NOT_USE, to avoid breaking any 3rd party extensions that might use RBM_ZERO. More importantly, this avoids renumbering the other enum values, which would cause even bigger confusion in extensions that use ReadBufferExtended, but haven't been recompiled. Backpatch to all supported versions; this has been racy since hot standby was introduced.
*	Fix dependency searching for case where column is visited before table.	Tom Lane	2014-11-11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When the recursive search in dependency.c visits a column and then later visits the whole table containing the column, it needs to propagate the drop-context flags for the table to the existing target-object entry for the column. Otherwise we might refuse the DROP (if not CASCADE) on the incorrect grounds that there was no automatic drop pathway to the column. Remarkably, this has not been reported before, though it's possible at least when an extension creates both a datatype and a table using that datatype. Rather than just marking the column as allowed to be dropped, it might seem good to skip the DROP COLUMN step altogether, since the later DROP of the table will surely get the job done. The problem with that is that the datatype would then be dropped before the table (since the whole situation occurred because we visited the datatype, and then recursed to the dependent column, before visiting the table). That seems pretty risky, and the case is rare enough that it doesn't seem worth expending a lot of effort or risk to make the drops happen in a safe order. So we just play dumb and delete the column separately according to the existing drop ordering rules. Per report from Petr Jelinek, though this is different from his proposed patch. Back-patch to 9.1, where extensions were introduced. There's currently no evidence that such cases can arise before 9.1, and in any case we would also need to back-patch cb5c2ba2d82688d29b5902d86b993a54355cad4d to 9.0 if we wanted to back-patch this.
*	Cope with more than 64K phrases in a thesaurus dictionary.	Tom Lane	2014-11-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	dict_thesaurus stored phrase IDs in uint16 fields, so it would get confused and even crash if there were more than 64K entries in the configuration file. It turns out to be basically free to widen the phrase IDs to uint32, so let's just do so. This was complained of some time ago by David Boutin (in bug #7793); he later submitted an informal patch but it was never acted on. We now have another complaint (bug #11901 from Luc Ouellette) so it's time to make something happen. This is basically Boutin's patch, but for future-proofing I also added a defense against too many words per phrase. Note that we don't need any explicit defense against overflow of the uint32 counters, since before that happens we'd hit array allocation sizes that repalloc rejects. Back-patch to all supported branches because of the crash risk.
*	Prevent the unnecessary creation of .ready file for the timeline history file.	Fujii Masao	2014-11-06
\| \| \| \| \| \| \| \| \| \| \|	Previously .ready file was created for the timeline history file at the end of an archive recovery even when WAL archiving was not enabled. This creation is unnecessary and causes .ready file to remain infinitely. This commit changes an archive recovery so that it creates .ready file for the timeline history file only when WAL archiving is enabled. Backpatch to all supported versions.
*	Drop no-longer-needed buffers during ALTER DATABASE SET TABLESPACE.	Tom Lane	2014-11-04
\| \| \| \| \| \| \| \| \| \| \| \| \|	The previous coding assumed that we could just let buffers for the database's old tablespace age out of the buffer arena naturally. The folly of that is exposed by bug #11867 from Marc Munro: the user could later move the database back to its original tablespace, after which any still-surviving buffers would match lookups again and appear to contain valid data. But they'd be missing any changes applied while the database was in the new tablespace. This has been broken since ALTER SET TABLESPACE was introduced, so back-patch to all supported branches.
*	Test IsInTransactionChain, not IsTransactionBlock, in vac_update_relstats.	Tom Lane	2014-10-30
\| \| \| \| \| \| \| \| \| \| \|	As noted by Noah Misch, my initial cut at fixing bug #11638 didn't cover all cases where ANALYZE might be invoked in an unsafe context. We need to test the result of IsInTransactionChain not IsTransactionBlock; which is notationally a pain because IsInTransactionChain requires an isTopLevel flag, which would have to be passed down through several levels of callers. I chose to pass in_outer_xact (ie, the result of IsInTransactionChain) rather than isTopLevel per se, as that seemed marginally more apropos for the intermediate functions to know about.
*	Avoid corrupting tables when ANALYZE inside a transaction is rolled back.	Tom Lane	2014-10-29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	VACUUM and ANALYZE update the target table's pg_class row in-place, that is nontransactionally. This is OK, more or less, for the statistical columns, which are mostly nontransactional anyhow. It's not so OK for the DDL hint flags (relhasindex etc), which might get changed in response to transactional changes that could still be rolled back. This isn't a problem for VACUUM, since it can't be run inside a transaction block nor in parallel with DDL on the table. However, we allow ANALYZE inside a transaction block, so if the transaction had earlier removed the last index, rule, or trigger from the table, and then we roll back the transaction after ANALYZE, the table would be left in a corrupted state with the hint flags not set though they should be. To fix, suppress the hint-flag updates if we are InTransactionBlock(). This is safe enough because it's always OK to postpone hint maintenance some more; the worst-case consequence is a few extra searches of pg_index et al. There was discussion of instead using a transactional update, but that would change the behavior in ways that are not all desirable: in most scenarios we're better off keeping ANALYZE's statistical values even if the ANALYZE itself rolls back. In any case we probably don't want to change this behavior in back branches. Per bug #11638 from Casey Shobe. This has been broken for a good long time, so back-patch to all supported branches. Tom Lane and Michael Paquier, initial diagnosis by Andres Freund
*	Fix two bugs in tsquery @> operator.	Heikki Linnakangas	2014-10-27
\| \| \| \| \| \| \| \| \| \| \| \| \|	1. The comparison for matching terms used only the CRC to decide if there's a match. Two different terms with the same CRC gave a match. 2. It assumed that if the second operand has more terms than the first, it's never a match. That assumption is bogus, because there can be duplicate terms in either operand. Rewrite the implementation in a way that doesn't have those bugs. Backpatch to all supported versions.
*	Improve ispell dictionary's defenses against bad affix files.	Tom Lane	2014-10-23
\| \| \| \| \| \| \| \| \| \| \| \| \|	Don't crash if an ispell dictionary definition contains flags but not any compound affixes. (This isn't a security issue since only superusers can install affix files, but still it's a bad thing.) Also, be more careful about detecting whether an affix-file FLAG command is old-format (ispell) or new-format (myspell/hunspell). And change the error message about mixed old-format and new-format commands into something intelligible. Per bug #11770 from Emre Hasegeli. Back-patch to all supported branches.
*	Flush unlogged table's buffers when copying or moving databases.	Andres Freund	2014-10-20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	CREATE DATABASE and ALTER DATABASE .. SET TABLESPACE copy the source database directory on the filesystem level. To ensure the on disk state is consistent they block out users of the affected database and force a checkpoint to flush out all data to disk. Unfortunately, up to now, that checkpoint didn't flush out dirty buffers from unlogged relations. That bug means there could be leftover dirty buffers in either the template database, or the database in its old location. Leading to problems when accessing relations in an inconsistent state; and to possible problems during shutdown in the SET TABLESPACE case because buffers belonging files that don't exist anymore are flushed. This was reported in bug #10675 by Maxim Boguk. Fix by Pavan Deolasee, modified somewhat by me. Reviewed by MauMau and Fujii Masao. Backpatch to 9.1 where unlogged tables were introduced.
*	Avoid core dump in _outPathInfo() for Path without a parent RelOptInfo.	Tom Lane	2014-10-17
\| \| \| \| \| \| \| \|	Nearly all Paths have parents, but a ResultPath representing an empty FROM clause does not. Avoid a core dump in such cases. I believe this is only a hazard for debugging usage, not for production, else we'd have heard about it before. Nonetheless, back-patch to 9.1 where the troublesome code was introduced. Noted while poking at bug #11703.
*	Support timezone abbreviations that sometimes change.	Tom Lane	2014-10-16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Up to now, PG has assumed that any given timezone abbreviation (such as "EDT") represents a constant GMT offset in the usage of any particular region; we had a way to configure what that offset was, but not for it to be changeable over time. But, as with most things horological, this view of the world is too simplistic: there are numerous regions that have at one time or another switched to a different GMT offset but kept using the same timezone abbreviation. Almost the entire Russian Federation did that a few years ago, and later this month they're going to do it again. And there are similar examples all over the world. To cope with this, invent the notion of a "dynamic timezone abbreviation", which is one that is referenced to a particular underlying timezone (as defined in the IANA timezone database) and means whatever it currently means in that zone. For zones that use or have used daylight-savings time, the standard and DST abbreviations continue to have the property that you can specify standard or DST time and get that time offset whether or not DST was theoretically in effect at the time. However, the abbreviations mean what they meant at the time in question (or most recently before that time) rather than being absolutely fixed. The standard abbreviation-list files have been changed to use this behavior for abbreviations that have actually varied in meaning since 1970. The old simple-numeric definitions are kept for abbreviations that have not changed, since they are a bit faster to resolve. While this is clearly a new feature, it seems necessary to back-patch it into all active branches, because otherwise use of Russian zone abbreviations is going to become even more problematic than it already was. This change supersedes the changes in commit 513d06ded et al to modify the fixed meanings of the Russian abbreviations; since we've not shipped that yet, this will avoid an undesirably incompatible (not to mention incorrect) change in behavior for timestamps between 2011 and 2014. This patch makes some cosmetic changes in ecpglib to keep its usage of datetime lookup tables as similar as possible to the backend code, but doesn't do anything about the increasingly obsolete set of timezone abbreviation definitions that are hard-wired into ecpglib. Whatever we do about that will likely not be appropriate material for back-patching. Also, a potential free() of a garbage pointer after an out-of-memory failure in ecpglib has been fixed. This patch also fixes pre-existing bugs in DetermineTimeZoneOffset() that caused it to produce unexpected results near a timezone transition, if both the "before" and "after" states are marked as standard time. We'd only ever thought about or tested transitions between standard and DST time, but that's not what's happening when a zone simply redefines their base GMT offset. In passing, update the SGML documentation to refer to the Olson/zoneinfo/ zic timezone database as the "IANA" database, since it's now being maintained under the auspices of IANA.
*	Cannot rely on %z printf length modifier.	Heikki Linnakangas	2014-10-05
\| \| \| \| \| \| \|	Before version 9.4, we didn't require sprintf to support the %z length modifier. Use %lu instead. Reported by Peter Eisentraut. Apply to 9.3 and earlier.
*	Update time zone data files to tzdata release 2014h.	Tom Lane	2014-10-04
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Most zones in the Russian Federation are subtracting one or two hours as of 2014-10-26. Update the meanings of the abbreviations IRKT, KRAT, MAGT, MSK, NOVT, OMST, SAKT, VLAT, YAKT, YEKT to match. The IANA timezone database has adopted abbreviations of the form AxST/AxDT for all Australian time zones, reflecting what they believe to be current majority practice Down Under. These names do not conflict with usage elsewhere (other than ACST for Acre Summer Time, which has been in disuse since 1994). Accordingly, adopt these names into our "Default" timezone abbreviation set. The "Australia" abbreviation set now contains only CST,EAST,EST,SAST,SAT,WST, all of which are thought to be mostly historical usage. Note that SAST has also been changed to be South Africa Standard Time in the "Default" abbreviation set. Add zone abbreviations SRET (Asia/Srednekolymsk) and XJT (Asia/Urumqi), and use WSST/WSDT for western Samoa. Also a DST law change in the Turks & Caicos Islands (America/Grand_Turk), and numerous corrections for historical time zone data.
*	Don't balance vacuum cost delay when per-table settings are in effect	Alvaro Herrera	2014-10-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When there are cost-delay-related storage options set for a table, trying to make that table participate in the autovacuum cost-limit balancing algorithm produces undesirable results: instead of using the configured values, the global values are always used, as illustrated by Mark Kirkwood in http://www.postgresql.org/message-id/52FACF15.8020507@catalyst.net.nz Since the mechanism is already complicated, just disable it for those cases rather than trying to make it cope. There are undesirable side-effects from this too, namely that the total I/O impact on the system will be higher whenever such tables are vacuumed. However, this is seen as less harmful than slowing down vacuum, because that would cause bloat to accumulate. Anyway, in the new system it is possible to tweak options to get the precise behavior one wants, whereas with the previous system one was simply hosed. This has been broken forever, so backpatch to all supported branches. This might affect systems where cost_limit and cost_delay have been set for individual tables.
*	Check for GiST index tuples that don't fit on a page.	Heikki Linnakangas	2014-10-03
\| \| \| \| \| \| \| \| \|	The page splitting code would go into infinite recursion if you try to insert an index tuple that doesn't fit even on an empty page. Per analysis and suggested fix by Andrew Gierth. Fixes bug #11555, reported by Bryan Seitz (analysis happened over IRC). Backpatch to all supported versions.
*	Fix some more problems with nested append relations.	Tom Lane	2014-10-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As of commit a87c72915 (which later got backpatched as far as 9.1), we're explicitly supporting the notion that append relations can be nested; this can occur when UNION ALL constructs are nested, or when a UNION ALL contains a table with inheritance children. Bug #11457 from Nelson Page, as well as an earlier report from Elvis Pranskevichus, showed that there were still nasty bugs associated with such cases: in particular the EquivalenceClass mechanism could try to generate "join" clauses connecting an appendrel child to some grandparent appendrel, which would result in assertion failures or bogus plans. Upon investigation I concluded that all current callers of find_childrel_appendrelinfo() need to be fixed to explicitly consider multiple levels of parent appendrels. The most complex fix was in processing of "broken" EquivalenceClasses, which are ECs for which we have been unable to generate all the derived equality clauses we would like to because of missing cross-type equality operators in the underlying btree operator family. That code path is more or less entirely untested by the regression tests to date, because no standard opfamilies have such holes in them. So I wrote a new regression test script to try to exercise it a bit, which turned out to be quite a worthwhile activity as it exposed existing bugs in all supported branches. The present patch is essentially the same as far back as 9.2, which is where parameterized paths were introduced. In 9.0 and 9.1, we only need to back-patch a small fragment of commit 5b7b5518d, which fixes failure to propagate out the original WHERE clauses when a broken EC contains constant members. (The regression test case results show that these older branches are noticeably stupider than 9.2+ in terms of the quality of the plans generated; but we don't really care about plan quality in such cases, only that the plan not be outright wrong. A more invasive fix in the older branches would not be a good idea anyway from a plan-stability standpoint.)
*	Fix VPATH builds of the replication parser from git for some !gcc compilers.	Andres Freund	2014-09-25
\| \| \| \| \| \| \| \| \| \| \| \| \|	Some compilers don't automatically search the current directory for included files. 9cc2c182fc2 fixed that for builds from tarballs by adding an include to the source directory. But that doesn't work when the scanner is generated in the VPATH directory. Use the same search path as the other parsers in the tree. One compiler that definitely was affected is solaris' sun cc. Backpatch to 9.1 which introduced using an actual parser for replication commands.
*	Fix power_var_int() for large integer exponents.	Tom Lane	2014-09-11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The code for raising a NUMERIC value to an integer power wasn't very careful about large powers. It got an outright wrong answer for an exponent of INT_MIN, due to failure to consider overflow of the Abs(exp) operation; which is fixable by using an unsigned rather than signed exponent value after that point. Also, even though the number of iterations of the power-computation loop is pretty limited, it's easy for the repeated squarings to result in ridiculously enormous intermediate values, which can take unreasonable amounts of time/memory to process, or even overflow the internal "weight" field and so produce a wrong answer. We can forestall misbehaviors of that sort by bailing out as soon as the weight value exceeds what will fit in int16, since then the final answer must overflow (if exp > 0) or underflow (if exp < 0) the packed numeric format. Per off-list report from Pavel Stehule. Back-patch to all supported branches.
*	Fix spinlock implementation for some !solaris sparc platforms.	Andres Freund	2014-09-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some Sparc CPUs can be run in various coherence models, ranging from RMO (relaxed) over PSO (partial) to TSO (total). Solaris has always run CPUs in TSO mode while in userland, but linux didn't use to and the various *BSDs still don't. Unfortunately the sparc TAS/S_UNLOCK were only correct under TSO. Fix that by adding the necessary memory barrier instructions. On sparcv8+, which should be all relevant CPUs, these are treated as NOPs if the current consistency model doesn't require the barriers. Discussion: 20140630222854.GW26930@awork2.anarazel.de Will be backpatched to all released branches once a few buildfarm cycles haven't shown up problems. As I've no access to sparc, this is blindly written.
*	Fix segmentation fault that an empty prepared statement could cause.	Fujii Masao	2014-09-05
\| \| \| \| \| \|	Back-patch to all supported branches. Per bug #11335 from Haruka Takatsuka
*	Fix failure to follow the directions when "init" fork was added.	Fujii Masao	2014-08-11
\| \| \| \| \| \| \| \| \| \|	Specifically this commit updates forkname_to_number() so that the HINT message includes "init" fork, and also adds the description of "init" fork into pg_relation_size() document. This is a part of the commit 2d00190495b22e0d0ba351b2cda9c95fb2e3d083 which has fixed the same oversight in master and 9.4. Back-patch to 9.1 where "init" fork was added.
*	Reject duplicate column names in foreign key referenced-columns lists.	Tom Lane	2014-08-09
\| \| \| \| \| \| \| \| \| \| \| \| \|	Such cases are disallowed by the SQL spec, and even if we wanted to allow them, the semantics seem ambiguous: how should the FK columns be matched up with the columns of a unique index? (The matching could be significant in the presence of opclasses with different notions of equality, so this issue isn't just academic.) However, our code did not previously reject such cases, but instead would either fail to match to any unique index, or generate a bizarre opclass-lookup error because of sloppy thinking in the index-matching code. David Rowley
*	Avoid wholesale autovacuuming when autovacuum is nominally off.	Tom Lane	2014-07-30
\| \| \| \| \| \| \| \| \| \| \| \| \|	When autovacuum is nominally off, we will still launch autovac workers to vacuum tables that are at risk of XID wraparound. But after we'd done that, an autovac worker would proceed to autovacuum every table in the targeted database, if they meet the usual thresholds for autovacuuming. This is at best pretty unexpected; at worst it delays response to the wraparound threat. Fix it so that if autovacuum is nominally off, we only do forced vacuums and not any other work. Per gripe from Andrey Zhidenkov. This has been like this all along, so back-patch to all supported branches.
*	Treat 2PC commit/abort the same as regular xacts in recovery.	Heikki Linnakangas	2014-07-29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There were several oversights in recovery code where COMMIT/ABORT PREPARED records were ignored: * pg_last_xact_replay_timestamp() (wasn't updated for 2PC commits) * recovery_min_apply_delay (2PC commits were applied immediately) * recovery_target_xid (recovery would not stop if the XID used 2PC) The first of those was reported by Sergiy Zuban in bug #11032, analyzed by Tom Lane and Andres Freund. The bug was always there, but was masked before commit d19bd29f07aef9e508ff047d128a4046cc8bc1e2, because COMMIT PREPARED always created an extra regular transaction that was WAL-logged. Backpatch to all supported versions (older versions didn't have all the features and therefore didn't have all of the above bugs).
*	Check block number against the correct fork in get_raw_page().	Tom Lane	2014-07-22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	get_raw_page tried to validate the supplied block number against RelationGetNumberOfBlocks(), which of course is only right when accessing the main fork. In most cases, the main fork is longer than the others, so that the check was too weak (allowing a lower-level error to be reported, but no real harm to be done). However, very small tables could have an FSM larger than their heap, in which case the mistake prevented access to some FSM pages. Per report from Torsten Foertsch. In passing, make the bad-block-number error into an ereport not elog (since it's certainly not an internal error); and fix sloppily maintained comment for RelationGetNumberOfBlocksInFork. This has been wrong since we invented relation forks, so back-patch to all supported branches.
*	Reject out-of-range numeric timezone specifications.	Tom Lane	2014-07-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In commit 631dc390f49909a5c8ebd6002cfb2bcee5415a9d, we started to handle simple numeric timezone offsets via the zic library instead of the old CTimeZone/HasCTZSet kluge. However, we overlooked the fact that the zic code will reject UTC offsets exceeding a week (which seems a bit arbitrary, but not because it's too tight ...). This led to possibly setting session_timezone to NULL, which results in crashes in most timezone-related operations as of 9.4, and crashes in a small number of places even before that. So check for NULL return from pg_tzset_offset() and report an appropriate error message. Per bug #11014 from Duncan Gillis. Back-patch to all supported branches, like the previous patch. (Unfortunately, as of today that no longer includes 8.4.)
*	Translation updates	Peter Eisentraut	2014-07-21
\|
*	Fix two low-probability memory leaks in regular expression parsing.	Tom Lane	2014-07-18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If pg_regcomp failed after having invoked markst/cleanst, it would leak any "struct subre" nodes it had created. (We've already detected all regex syntax errors at that point, so the only likely causes of later failure would be query cancel or out-of-memory.) To fix, make sure freesrnode knows the difference between the pre-cleanst and post-cleanst cleanup procedures. Add some documentation of this less-than-obvious point. Also, newlacon did the wrong thing with an out-of-memory failure from realloc(), so that the previously allocated array would be leaked. Both of these are pretty low-probability scenarios, but a bug is a bug, so patch all the way back. Per bug #10976 from Arthur O'Dwyer.
*	Fix REASSIGN OWNED for text search objects	Alvaro Herrera	2014-07-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Trying to reassign objects owned by a user that had text search dictionaries or configurations used to fail with: ERROR: unexpected classid 3600 or ERROR: unexpected classid 3602 Fix by adding cases for those object types in a switch in pg_shdepend.c. Both REASSIGN OWNED and text search objects go back all the way to 8.1, so backpatch to all supported branches. In 9.3 the alter-owner code was made generic, so the required change in recent branches is pretty simple; however, for 9.2 and older ones we need some additional reshuffling to enable specifying objects by OID rather than name. Text search templates and parsers are not owned objects, so there's no change required for them. Per bug #9749 reported by Michal Novotný
*	Reset master xmin when hot_standby_feedback disabled.	Simon Riggs	2014-07-15
\| \| \| \| \| \|	If walsender has xmin of standby then ensure we reset the value to 0 when we change from hot_standby_feedback=on to hot_standby_feedback=off.
*	Fix bug with whole-row references to append subplans.	Tom Lane	2014-07-11
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	ExecEvalWholeRowVar incorrectly supposed that it could "bless" the source TupleTableSlot just once per query. But if the input is coming from an Append (or, perhaps, other cases?) more than one slot might be returned over the query run. This led to "record type has not been registered" errors when a composite datum was extracted from a non-blessed slot. This bug has been there a long time; I guess it escaped notice because when dealing with subqueries the planner tends to expand whole-row Vars into RowExprs, which don't have the same problem. It is possible to trigger the problem in all active branches, though, as illustrated by the added regression test.
*	Don't assume a subquery's output is unique if there's a SRF in its tlist.	Tom Lane	2014-07-08
\| \| \| \| \| \| \| \| \| \| \| \| \|	While the x output of "select x from t group by x" can be presumed unique, this does not hold for "select x, generate_series(1,10) from t group by x", because we may expand the set-returning function after the grouping step. (Perhaps that should be re-thought; but considering all the other oddities involved with SRFs in targetlists, it seems unlikely we'll change it.) Put a check in query_is_distinct_for() so it's not fooled by such cases. Back-patch to all supported branches. David Rowley
*	Add some errdetail to checkRuleResultList().	Tom Lane	2014-07-02
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This function wasn't originally thought to be really user-facing, because converting a table to a view isn't something we expect people to do manually. So not all that much effort was spent on the error messages; in particular, while the code will complain that you got the column types wrong it won't say exactly what they are. But since we repurposed the code to also check compatibility of rule RETURNING lists, it's definitely user-facing. It now seems worthwhile to add errdetail messages showing exactly what the conflict is when there's a mismatch of column names or types. This is prompted by bug #10836 from Matthias Raffelsieper, which might have been forestalled if the error message had reported the wrong column type as being "record". Per Alvaro's advice, back-patch to branches before 9.4, but resist the temptation to rephrase any existing strings there. Adding new strings is not really a translation degradation; anyway having the info presented in English is better than not having it at all.
*	Back-patch "Fix EquivalenceClass processing for nested append relations".	Tom Lane	2014-06-26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When we committed a87c729153e372f3731689a7be007bc2b53f1410, we somehow failed to notice that it didn't merely improve plan quality for expression indexes; there were very closely related cases that failed outright with "could not find pathkey item to sort". The failing cases seem to be those where the planner was already capable of selecting a MergeAppend plan, and there was inheritance involved: the lack of appropriate eclass child members would prevent prepare_sort_from_pathkeys() from succeeding on the MergeAppend's child plan nodes for inheritance child tables. Accordingly, back-patch into 9.1 through 9.3, along with an extra regression test case covering the problem. Per trouble report from Michael Glaesemann.
*	Don't allow foreign tables with OIDs.	Heikki Linnakangas	2014-06-24
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The syntax doesn't let you specify "WITH OIDS" for foreign tables, but it was still possible with default_with_oids=true. But the rest of the system, including pg_dump, isn't prepared to handle foreign tables with OIDs properly. Backpatch down to 9.1, where foreign tables were introduced. It's possible that there are databases out there that already have foreign tables with OIDs. There isn't much we can do about that, but at least we can prevent them from being created in the future. Patch by Etsuro Fujita, reviewed by Hadi Moshayedi.
*	Avoid leaking memory while evaluating arguments for a table function.	Tom Lane	2014-06-19
\| \| \| \| \| \| \| \| \| \| \| \| \|	ExecMakeTableFunctionResult evaluated the arguments for a function-in-FROM in the query-lifespan memory context. This is insignificant in simple cases where the function relation is scanned only once; but if the function is in a sub-SELECT or is on the inside of a nested loop, any memory consumed during argument evaluation can add up quickly. (The potential for trouble here had been foreseen long ago, per existing comments; but we'd not previously seen a complaint from the field about it.) To fix, create an additional temporary context just for this purpose. Per an example from MauMau. Back-patch to all active branches.
*	Fix ancient encoding error in hungarian.stop.	Tom Lane	2014-06-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When we grabbed this file off the Snowball project's website, we mistakenly supposed that it was in LATIN1 encoding, but evidently it was actually in LATIN2. This resulted in ő (o-double-acute, U+0151, which is code 0xF5 in LATIN2) being misconverted into õ (o-tilde, U+00F5), as complained of in bug #10589 from Zoltán Sörös. We'd have messed up u-double-acute too, but there aren't any of those in the file. Other characters used in the file have the same codes in LATIN1 and LATIN2, which no doubt helped hide the problem for so long. The error is not only ours: the Snowball project also was confused about which encoding is required for Hungarian. But dealing with that will require source-code changes that I'm not at all sure we'll wish to back-patch. Fixing the stopword file seems reasonably safe to back-patch however.
*	Add defenses against running with a wrong selection of LOBLKSIZE.	Tom Lane	2014-06-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It's critical that the backend's idea of LOBLKSIZE match the way data has actually been divided up in pg_largeobject. While we don't provide any direct way to adjust that value, doing so is a one-line source code change and various people have expressed interest recently in changing it. So, just as with TOAST_MAX_CHUNK_SIZE, it seems prudent to record the value in pg_control and cross-check that the backend's compiled-in setting matches the on-disk data. Also tweak the code in inv_api.c so that fetches from pg_largeobject explicitly verify that the length of the data field is not more than LOBLKSIZE. Formerly we just had Asserts() for that, which is no protection at all in production builds. In some of the call sites an overlength data value would translate directly to a security-relevant stack clobber, so it seems worth one extra runtime comparison to be sure. In the back branches, we can't change the contents of pg_control; but we can still make the extra checks in inv_api.c, which will offer some amount of protection against running with the wrong value of LOBLKSIZE.
*	Fix longstanding bug in HeapTupleSatisfiesVacuum().	Andres Freund	2014-06-04
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	HeapTupleSatisfiesVacuum() didn't properly discern between DELETE_IN_PROGRESS and INSERT_IN_PROGRESS for rows that have been inserted in the current transaction and deleted in a aborted subtransaction of the current backend. At the very least that caused problems for CLUSTER and CREATE INDEX in transactions that had aborting subtransactions producing rows, leading to warnings like: WARNING: concurrent delete in progress within table "..." possibly in an endless, uninterruptible, loop. Instead of treating InProgress xmins the same as IsCurrent ones, treat them as being distinct like the other visibility routines. As implemented this separatation can cause a behaviour change for rows that have been inserted and deleted in another, still running, transaction. HTSV will now return INSERT_IN_PROGRESS instead of DELETE_IN_PROGRESS for those. That's both, more in line with the other visibility routines and arguably more correct. The latter because a INSERT_IN_PROGRESS will make callers look at/wait for xmin, instead of xmax. The only current caller where that's possibly worse than the old behaviour is heap_prune_chain() which now won't mark the page as prunable if a row has concurrently been inserted and deleted. That's harmless enough. As a cautionary measure also insert a interrupt check before the gotos in IndexBuildHeapScan() that lead to the uninterruptible loop. There are other possible causes, like a row that several sessions try to update and all fail, for repeated loops and the cost of doing so in the retry case is low. As this bug goes back all the way to the introduction of subtransactions in 573a71a5da backpatch to all supported releases. Reported-By: Sandro Santilli
*	Set the process latch when processing recovery conflict interrupts.	Andres Freund	2014-06-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Because RecoveryConflictInterrupt() didn't set the process latch anything using the latter to wait for events didn't get notified about recovery conflicts. Most latch users are never the target of recovery conflicts, which explains the lack of reports about this until now. Since 9.3 two possible affected users exist though: The sql callable pg_sleep() now uses latches to wait and background workers are expected to use latches in their main loop. Both would currently wait until the end of WaitLatch's timeout. Fix by adding a SetLatch() to RecoveryConflictInterrupt(). It'd also be possible to fix the issue by having each latch user set set_latch_on_sigusr1. That seems failure prone and though, as most of these callsites won't often receive recovery conflicts and thus will likely only be tested against normal query cancels et al. It'd also be unnecessarily verbose. Backpatch to 9.1 where latches were introduced. Arguably 9.3 would be sufficient, because that's where pg_sleep() was converted to waiting on the latch and background workers got introduced; but there could be user level code making use of the latch pre 9.3.
*	Revert "Fix bogus %name-prefix option syntax in all our Bison files."	Tom Lane	2014-05-28
\| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit 4c5fde4e288983f30dae09a7eea8e6a9e6145477. It turns out that the %name-prefix syntax without "=" does not work at all in pre-2.4 Bison. We are not prepared to make such a large jump in minimum required Bison version just to suppress a warning message in a version hardly any developers are using yet. When 3.0 gets more popular, we'll figure out a way to deal with this. In the meantime, BISONFLAGS=-Wno-deprecated is recommendable for anyone using 3.0 who doesn't want to see the warning.
*	Fix bogus %name-prefix option syntax in all our Bison files.	Tom Lane	2014-05-28
\| \| \| \| \| \| \| \| \| \| \| \|	%name-prefix doesn't use an "=" sign according to the Bison docs, but it silently accepted one anyway, until Bison 3.0. This was originally a typo of mine in commit 012abebab1bc72043f3f670bf32e91ae4ee04bd2, and we seem to have slavishly copied the error into all the other grammar files. Per report from Vik Fearing; analysis by Peter Eisentraut. Back-patch to all active branches, since somebody might try to build a back branch with up-to-date tools.
*	Ensure cleanup in case of early errors in streaming base backups	Magnus Hagander	2014-05-28
\| \| \| \| \| \| \| \| \| \|	Move the code that sends the initial status information as well as the calculation of paths inside the ENSURE_ERROR_CLEANUP block. If this code failed, we would "leak" a counter of number of concurrent backups, thereby making the system always believe it was in backup mode. This could happen if the sending failed (which it probably never did given that the small amount of data to send would never cause a flush). It is very low risk, but all operations after do_pg_start_backup should be protected.