postgresql - postgresql mirror

	Commit message (Collapse)	Author	Age
*	Fix Xmax freeze conditions	Alvaro Herrera	2013-02-08
\| \| \| \| \| \| \|	I broke this in 0ac5ad5134; previously, freezing a tuple marked with an IS_MULTI xmax was not necessary. Per brokenness report from Jeff Janes.
*	Fix another typo in a comment	Magnus Hagander	2013-02-08
\| \| \| \|	Noted by Thom Brown
*	Fix typo in comment	Magnus Hagander	2013-02-08
\| \| \| \|	Etsuro Fujita
*	Fix performance issue in EXPLAIN (ANALYZE, TIMING OFF).	Tom Lane	2013-02-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit af7914c6627bcf0b0ca614e9ce95d3f8056602bf, which added the TIMING option to EXPLAIN, had an oversight: if the TIMING option is disabled then control in InstrStartNode() goes through an elog(DEBUG2) call, which typically does nothing but takes a noticeable amount of time to do it. Tweak the logic to avoid that. In HEAD, also change the elog(DEBUG2)'s in instrument.c to elog(ERROR). It's not very clear why they weren't like that to begin with, but this episode shows that not complaining more vociferously about misuse is likely to do little except allow bugs to remain hidden. While at it, adjust some code that was making possibly-dangerous assumptions about flag bits being in the rightmost byte of the instrument_options word. Problem reported by Pavel Stehule (via Tomas Vondra).
*	Repair bugs in GiST page splitting code for multi-column indexes.	Tom Lane	2013-02-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When considering a non-last column in a multi-column GiST index, gistsplit.c tries to improve on the split chosen by the opclass-specific pickSplit function by considering penalties for the next column. However, there were two bugs in this code: it failed to recompute the union keys for the leftmost index columns, even though these might well change after reassigning tuples; and it included the old union keys in the recomputation for the columns it did recompute, so that those keys couldn't get smaller even if they should. The first problem could result in an invalid index in which searches wouldn't find index entries that are in fact present; the second would make the index less efficient to search. Both of these errors were caused by misuse of gistMakeUnionItVec, whose API was designed in a way that just begged such errors to be made. There is no situation in which it's safe or useful to compute the union keys for a subset of the index columns, and there is no caller that wants any previous union keys to be included in the computation; so the undocumented choice to treat the union keys as in/out rather than pure output parameters is a waste of code as well as being dangerous. Hence, rather than just making a minimal patch, I've changed the API of gistMakeUnionItVec to remove the "startkey" parameter (it now always processes all index columns) and treat the attr/isnull arrays as purely output parameters. In passing, also get rid of a couple of unnecessary and dangerous uses of static variables in gistutil.c. It's remarkable that the one in gistMakeUnionKey hasn't given us portability troubles before now, because in addition to posing a re-entrancy hazard, it was unsafely assuming that a static char[] array would have at least Datum alignment. Per investigation of a trouble report from Tomas Vondra. (There are also some bugs in contrib/btree_gist to be fixed, but that seems like material for a separate patch.) Back-patch to all supported branches.
*	Fix possible failure to send final transaction counts to stats collector.	Tom Lane	2013-02-07
\| \| \| \| \| \| \| \| \| \| \| \| \|	Normally, we suppress sending a tabstats message to the collector unless there were some actual table stats to send. However, during backend exit we should force out the message if there are any transaction commit/abort counts to send, else the session's last few commit/abort counts will never get reported at all. We had logic for this, but the short-circuit test at the top of pgstat_report_stat() ignored the "force" flag, with the consequence that session-ending transactions that touched no database-local tables would not get counted. Seems to be an oversight in my commit 641912b4d17fd214a5e5bae4e7bb9ddbc28b144b, which added the "force" flag. That was back in 8.3, so back-patch to all supported versions.
*	Rely only on checkpoint 1 at end of recovery.	Simon Riggs	2013-02-07
\| \| \| \| \| \| \|	Searching for checkpoint 2 (previous) is not correct in all cases. Bug report from Heikki Linnakangas
*	Enable building with Microsoft Visual Studio 2012.	Andrew Dunstan	2013-02-06
\| \| \| \| \| \|	Backpatch to release 9.2 Brar Piening and Noah Misch, reviewed by Craig Ringer.
*	Split out list of XLog resource managers	Alvaro Herrera	2013-02-06
\| \| \| \| \| \| \| \| \| \|	The new rmgrlist.h header, containing all necessary data about built-in resource managers, allows other pieces of code to access them. In particular, this allows a future pg_xlogdump program to extract rm_desc function pointers, without having to keep a duplicate list of them.
*	Improve error message wording	Alvaro Herrera	2013-02-06
\| \| \| \| \| \|	The wording changes applied in 0ac5ad513 were universally disliked. Per gripe from Andrew Dunstan
*	Prevent execution of enum_recv() from SQL.	Tom Lane	2013-02-04
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This function was misdeclared to take cstring when it should take internal. This at least allows crashing the server, and in principle an attacker might be able to use the function to examine the contents of server memory. The correct fix is to adjust the system catalog contents (and fix the regression tests that should have caught this but failed to). However, asking users to correct the catalog contents in existing installations is a pain, so as a band-aid fix for the back branches, install a check in enum_recv() to make it throw error if called with a cstring argument. We will later revert this in HEAD in favor of correcting the catalogs. Our thanks to Sumit Soni (via Secunia SVCRP) for reporting this issue. Security: CVE-2013-0255
*	Reset vacuum_defer_cleanup_age to PGC_SIGHUP.	Simon Riggs	2013-02-04
\| \| \| \|	Revert commit 84725aa5efe11688633b553e58113efce4181f2e
*	Reset master xmin when hot_standby_feedback disabled.	Simon Riggs	2013-02-04
\| \| \| \| \| \|	If walsender has xmin of standby then ensure we reset the value to 0 when we change from hot_standby_feedback=on to hot_standby_feedback=off.
*	Perform line wrapping and indenting by default in ruleutils.c.	Tom Lane	2013-02-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch changes pg_get_viewdef() and allied functions so that PRETTY_INDENT processing is always enabled. Per discussion, only the PRETTY_PAREN processing (that is, stripping of "unnecessary" parentheses) poses any real forward-compatibility risk, so we may as well make dump output look as nice as we safely can. Also, set the default wrap length to zero (i.e, wrap after each SELECT or FROM list item), since there's no very principled argument for the former default of 80-column wrapping, and most people seem to agree this way looks better. Marko Tiikkaja, reviewed by Jeevan Chalke, further hacking by Tom Lane
*	Mark vacuum_defer_cleanup_age as PGC_POSTMASTER.	Simon Riggs	2013-02-02
\| \| \| \|	Following bug analysis of #7819 by Tom Lane
*	Adjust COPY FREEZE error message to be more accurate and consistent.	Bruce Momjian	2013-02-02
\| \| \| \|	Per suggestions from Noah and Tom.
*	Fix typo in freeze_table_age implementation	Alvaro Herrera	2013-02-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The original code used freeze_min_age instead of freeze_table_age. The main consequence of this mistake is that lowering freeze_min_age would cause full-table scans to occur much more frequently, which causes serious issues because the number of writes required is much larger. That feature (freeze_min_age) is supposed to affect only how soon tuples are frozen; some pages should still be skipped due to the visibility map. Backpatch to 8.4, where the freeze_table_age feature was introduced. Report and patch from Andres Freund
*	Fill tuple before HeapSatisfiesHOTAndKeyUpdate	Alvaro Herrera	2013-02-01
\| \| \| \| \| \| \| \| \| \| \| \|	Failing to do this results in almost all updates to system catalogs being non-HOT updates, because the OID column would differ (not having been set for the new tuple), which is an indexed column. While at it, make sure to set the tableoid early in both old and new tuples as well. This isn't of much consequence, since that column is seldom (never?) indexed. Report and patch from Andres Freund.
*	Add CREATE RECURSIVE VIEW syntax	Peter Eisentraut	2013-01-31
\| \| \| \| \| \| \| \|	This is specified in the SQL standard. The CREATE RECURSIVE VIEW specification is transformed into a normal CREATE VIEW statement with a WITH RECURSIVE clause. reviewed by Abhijit Menon-Sen and Stephen Frost
*	Restrict infomask bits to set on multixacts	Alvaro Herrera	2013-01-31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We must only set the bit(s) for the strongest lock held in the tuple; otherwise, a multixact containing members with exclusive lock and key-share lock will behave as though only a share lock is held. This bug was introduced in commit 0ac5ad5134, somewhere along development, when we allowed a singleton FOR SHARE lock to be implemented without a MultiXact by using a multi-bit pattern. I overlooked that GetMultiXactIdHintBits() needed to be tweaked as well. Previously, we could have the bits for FOR KEY SHARE and FOR UPDATE simultaneously set and it wouldn't cause a problem. Per report from digoal@126.com
*	Switch timelines if we crash soon after promotion.	Simon Riggs	2013-01-31
\| \| \| \| \| \| \| \| \|	Previous patch to skip checkpoints at end of recovery didn't correctly perform crash recovery, fumbling the timeline switch. Now we record the minRecoveryPointTLI of the newly selected timeline, so that we crash recover to the correct timeline. Bug report from Fujii Masao, investigated by me.
*	Reject nonzero day fields in AT TIME ZONE INTERVAL functions.	Tom Lane	2013-01-31
\| \| \| \| \| \| \| \| \| \|	It's not sensible for an interval that's used as a time zone value to be larger than a day. When we changed the interval type to contain a separate day field, check_timezone() was adjusted to reject nonzero day values, but timetz_izone(), timestamp_izone(), and timestamptz_izone() evidently were overlooked. While at it, make the error messages for these three cases consistent.
*	Fix plpgsql's reporting of plan-time errors in possibly-simple expressions.	Tom Lane	2013-01-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	exec_simple_check_plan and exec_eval_simple_expr attempted to call GetCachedPlan directly. This meant that if an error was thrown during planning, the resulting context traceback would not include the line normally contributed by _SPI_error_callback. This is already inconsistent, but just to be really odd, a re-execution of the very same expression would show the additional context line, because we'd already have cached the plan and marked the expression as non-simple. The problem is easy to demonstrate in 9.2 and HEAD because planning of a cached plan doesn't occur at all until GetCachedPlan is done. In earlier versions, it could only be an issue if initial planning had succeeded, then a replan was forced (already somewhat improbable for a simple expression), and the replan attempt failed. Since the issue is mainly cosmetic in older branches anyway, it doesn't seem worth the risk of trying to fix it there. It is worth fixing in 9.2 since the instability of the context printout can affect the results of GET STACKED DIAGNOSTICS, as per a recent discussion on pgsql-novice. To fix, introduce a SPI function that wraps GetCachedPlan while installing the correct callback function. Use this instead of calling GetCachedPlan directly from plpgsql. Also introduce a wrapper function for extracting a SPI plan's CachedPlanSource list. This lets us stop including spi_priv.h in pl_exec.c, which was never a very good idea from a modularity standpoint. In passing, fix a similar inconsistency that could occur in SPI_cursor_open, which was also calling GetCachedPlan without setting up a context callback.
*	Fix grammar for subscripting or field selection from a sub-SELECT result.	Tom Lane	2013-01-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Such cases should work, but the grammar failed to accept them because of our ancient precedence hacks to convince bison that extra parentheses around a sub-SELECT in an expression are unambiguous. (Formally, they are ambiguous, but we don't especially care whether they're treated as part of the sub-SELECT or part of the expression. Bison cares, though.) Fix by adding a redundant-looking production for this case. This is a fine example of why fixing shift/reduce conflicts via precedence declarations is more dangerous than it looks: you can easily cause the parser to reject cases that should work. This has been wrong since commit 3db4056e22b0c6b2adc92543baf8408d2894fe91 or maybe before, and apparently some people have been working around it by inserting no-op casts. That method introduces a dump/reload hazard, as illustrated in bug #7838 from Jan Mate. Hence, back-patch to all active branches.
*	Provide database object names as separate fields in error messages.	Tom Lane	2013-01-29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch addresses the problem that applications currently have to extract object names from possibly-localized textual error messages, if they want to know for example which index caused a UNIQUE_VIOLATION failure. It adds new error message fields to the wire protocol, which can carry the name of a table, table column, data type, or constraint associated with the error. (Since the protocol spec has always instructed clients to ignore unrecognized field types, this should not create any compatibility problem.) Support for providing these new fields has been added to just a limited set of error reports (mainly, those in the "integrity constraint violation" SQLSTATE class), but we will doubtless add them to more calls in future. Pavel Stehule, reviewed and extensively revised by Peter Geoghegan, with additional hacking by Tom Lane.
*	Skip truncating ON COMMIT DELETE ROWS temp tables, if the transaction hasn't	Heikki Linnakangas	2013-01-29
\| \| \| \| \| \| \| \|	touched any temporary tables. We could try harder, and keep track of whether we've inserted to any temp tables, rather than accessed them, and which temp tables have been inserted to. But this is dead simple, and already covers many interesting scenarios.
*	Fast promote mode skips checkpoint at end of recovery.	Simon Riggs	2013-01-29
\| \| \| \| \| \| \| \| \| \| \|	pg_ctl promote -m fast will skip the checkpoint at end of recovery so that we can achieve very fast failover when the apply delay is low. Write new WAL record XLOG_END_OF_RECOVERY to allow us to switch timeline correctly for downstream log readers. If we skip synchronous end of recovery checkpoint we request a normal spread checkpoint so that the window of re-recovery is low. Simon Riggs and Kyotaro Horiguchi, with input from Fujii Masao. Review by Heikki Linnakangas
*	REASSIGN OWNED: handle shared objects, too	Alvaro Herrera	2013-01-28
\| \| \| \| \| \| \| \| \| \| \|	Give away ownership of shared objects (databases, tablespaces) along with local objects, per original code intention. Try to make the documentation clearer, too. Per discussion about DROP OWNED's brokenness, in bug #7748. This is not backpatched because it'd require some refactoring of the ALTER/SET OWNER code for databases and tablespaces.
*	DROP OWNED: don't try to drop tablespaces/databases	Alvaro Herrera	2013-01-28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	My "fix" for bugs #7578 and #6116 on DROP OWNED at fe3b5eb08a1 not only misstated that it applied to REASSIGN OWNED (which it did not affect), but it also failed to fix the problems fully, because I didn't test the case of owned shared objects. Thus I created a new bug, reported by Thomas Kellerer as #7748, which would cause DROP OWNED to fail with a not-for-user-consumption error message. The code would attempt to drop the database, which not only fails to work because the underlying code does not support that, but is a pretty dangerous and undesirable thing to be doing as well. This patch fixes that bug by having DROP OWNED only attempt to process shared objects when grants on them are found, ignoring ownership. Backpatch to 8.3, which is as far as the previous bug was backpatched.
*	Make LATERAL implicit for functions in FROM.	Tom Lane	2013-01-26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The SQL standard does not have general functions-in-FROM, but it does allow UNNEST() there (see the <collection derived table> production), and the semantics of that are defined to include lateral references. So spec compliance requires allowing lateral references within UNNEST() even without an explicit LATERAL keyword. Rather than making UNNEST() a special case, it seems best to extend this flexibility to any function-in-FROM. We'll still allow LATERAL to be written explicitly for clarity's sake, but it's now a noise word in this context. In theory this change could result in a change in behavior of existing queries, by allowing what had been an outer reference in a function-in-FROM to be captured by an earlier FROM-item at the same level. However, all pre-9.3 PG releases have a bug that causes them to match variable references to earlier FROM-items in preference to outer references (and then throw an error). So no previously-working query could contain the type of ambiguity that would risk a change of behavior. Per a suggestion from Andrew Gierth, though I didn't use his patch.
*	Update comments in new DROP IF EXISTS code; commit message update	Bruce Momjian	2013-01-26
\| \| \| \| \| \| \|	DROP IF EXISTS with a missing schema in commit 7e2322dff30c04d90c0602d2b5ae24b4881db88b applies not only to tables, but to DROP IF EXISTS with missing schemas for indexes, views, sequences, and foreign tables. Yeah!
*	Update LookupExplicitNamespace() comments; commit message update	Bruce Momjian	2013-01-26
\| \| \| \| \|	Also, commit 7e2322dff30c04d90c0602d2b5ae24b4881db88b affected DROP TABLE IF EXISTS, not CREATE TABLE IF EXISTS.
*	Issue ERROR if FREEZE mode can't be honored by COPY	Bruce Momjian	2013-01-26
\| \| \| \| \| \|	Previously non-honored FREEZE mode was ignored. This also issues an appropriate error message based on the cause of the failure, per suggestion from Tom. Additional regression test case added.
*	Allow CREATE TABLE IF EXIST so succeed if the schema is nonexistent	Bruce Momjian	2013-01-26
\| \| \| \| \| \|	Previously, CREATE TABLE IF EXIST threw an error if the schema was nonexistent. This was done by passing 'missing_ok' to the function that looks up the schema oid.
*	Change plan caching to honor, not resist, changes in search_path.	Tom Lane	2013-01-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In the initial implementation of plan caching, we saved the active search_path when a plan was first cached, then reinstalled that path anytime we needed to reparse or replan. The idea of that was to try to reselect the same referenced objects, in somewhat the same way that views continue to refer to the same objects in the face of schema or name changes. Of course, that analogy doesn't bear close inspection, since holding the search_path fixed doesn't cope with object drops or renames. Moreover sticking with the old path seems to create more surprises than it avoids. So instead of doing that, consider that the cached plan depends on search_path, and force reparse/replan if the active search_path is different than it was when we last saved the plan. This gets us fairly close to having "transparency" of plan caching, in the sense that the cached statement acts the same as if you'd just resubmitted the original query text for another execution. There are still some corner cases where this fails though: a new object added in the search path schema(s) might capture a reference in the query text, but we'd not realize that and force a reparse. We might try to fix that in the future, but for the moment it looks too expensive and complicated.
*	Add some randomness to the choice of which GiST page to insert to.	Heikki Linnakangas	2013-01-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When descending the tree for an insert, and there are multiple equally good pages we could insert to, make the choice in random. Previously, we would always choose the tuple with lowest offset number. That meant that when two non-leaf pages overlap - in the extreme case they might have exactly the same key - all but the first such page went unused. That wasn't optimal for space usage; if you deleted some tuples from the non-first pages, the space would never be reused. With this patch, the other pages are sometimes chosen too, although there's still a heavy bias towards low-offset tuples, so that we don't lose cache locality when doing a lot of inserts with similar keys. Original idea by Alexander Korotkov, although this patch version was written by me and copy-edited by Tom Lane.
*	Fix concat() and format() to handle VARIADIC-labeled arguments correctly.	Tom Lane	2013-01-25
\| \| \| \| \| \| \| \|	Previously, the VARIADIC labeling was effectively ignored, but now these functions act as though the array elements had all been given as separate arguments. Pavel Stehule
*	Fix SPI documentation for new handling of ExecutorRun's count parameter.	Tom Lane	2013-01-24
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since 9.0, the count parameter has only limited the number of tuples actually returned by the executor. It doesn't affect the behavior of INSERT/UPDATE/DELETE unless RETURNING is specified, because without RETURNING, the ModifyTable plan node doesn't return control to execMain.c for each tuple. And we only check the limit at the top level. While this behavioral change was unintentional at the time, discussion of bug #6572 led us to the conclusion that we prefer the new behavior anyway, and so we should just adjust the docs to match rather than change the code. Accordingly, do that. Back-patch as far as 9.0 so that the docs match the code in each branch.
*	Fix rare missing cancellations in Hot Standby.	Simon Riggs	2013-01-24
\| \| \| \| \| \| \| \| \| \| \| \|	The machinery around XLOG_HEAP2_CLEANUP_INFO failed to correctly pass through the necessary information on latestRemovedXid, avoiding cancellations in some infrequent concurrent update/cleanup scenarios. Backpatchable fix to 9.0 Detailed bug report and fix by Noah Misch, backpatchable version by me.
*	Also fix rotation of csvlog on Windows.	Heikki Linnakangas	2013-01-24
\| \| \| \|	Backpatch to 9.2, like the previous fix.
*	Fix failure to rotate postmaster log file for size reasons on Windows.	Tom Lane	2013-01-23
\| \| \| \| \| \| \| \| \| \| \| \|	When we eliminated "unnecessary" wakeups of the syslogger process, we broke size-based logfile rotation on Windows, because on that platform data transfer is done in a separate thread. While non-Windows platforms would recheck the output file size after every log message, Windows only did so when the control thread woke up for some other reason, which might be quite infrequent. Per bug #7814 from Tsunezumi. Back-patch to 9.2 where the problem was introduced. Jeff Janes
*	Improve concurrency of foreign key locking	Alvaro Herrera	2013-01-23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch introduces two additional lock modes for tuples: "SELECT FOR KEY SHARE" and "SELECT FOR NO KEY UPDATE". These don't block each other, in contrast with already existing "SELECT FOR SHARE" and "SELECT FOR UPDATE". UPDATE commands that do not modify the values stored in the columns that are part of the key of the tuple now grab a SELECT FOR NO KEY UPDATE lock on the tuple, allowing them to proceed concurrently with tuple locks of the FOR KEY SHARE variety. Foreign key triggers now use FOR KEY SHARE instead of FOR SHARE; this means the concurrency improvement applies to them, which is the whole point of this patch. The added tuple lock semantics require some rejiggering of the multixact module, so that the locking level that each transaction is holding can be stored alongside its Xid. Also, multixacts now need to persist across server restarts and crashes, because they can now represent not only tuple locks, but also tuple updates. This means we need more careful tracking of lifetime of pg_multixact SLRU files; since they now persist longer, we require more infrastructure to figure out when they can be removed. pg_upgrade also needs to be careful to copy pg_multixact files over from the old server to the new, or at least part of multixact.c state, depending on the versions of the old and new servers. Tuple time qualification rules (HeapTupleSatisfies routines) need to be careful not to consider tuples with the "is multi" infomask bit set as being only locked; they might need to look up MultiXact values (i.e. possibly do pg_multixact I/O) to find out the Xid that updated a tuple, whereas they previously were assured to only use information readily available from the tuple header. This is considered acceptable, because the extra I/O would involve cases that would previously cause some commands to block waiting for concurrent transactions to finish. Another important change is the fact that locking tuples that have previously been updated causes the future versions to be marked as locked, too; this is essential for correctness of foreign key checks. This causes additional WAL-logging, also (there was previously a single WAL record for a locked tuple; now there are as many as updated copies of the tuple there exist.) With all this in place, contention related to tuples being checked by foreign key rules should be much reduced. As a bonus, the old behavior that a subtransaction grabbing a stronger tuple lock than the parent (sub)transaction held on a given tuple and later aborting caused the weaker lock to be lost, has been fixed. Many new spec files were added for isolation tester framework, to ensure overall behavior is sane. There's probably room for several more tests. There were several reviewers of this patch; in particular, Noah Misch and Andres Freund spent considerable time in it. Original idea for the patch came from Simon Riggs, after a problem report by Joel Jacobson. Most code is from me, with contributions from Marti Raudsepp, Alexander Shulgin, Noah Misch and Andres Freund. This patch was discussed in several pgsql-hackers threads; the most important start at the following message-ids: AANLkTimo9XVcEzfiBR-ut3KVNDkjm2Vxh+t8kAmWjPuv@mail.gmail.com 1290721684-sup-3951@alvh.no-ip.org 1294953201-sup-2099@alvh.no-ip.org 1320343602-sup-2290@alvh.no-ip.org 1339690386-sup-8927@alvh.no-ip.org 4FE5FF020200002500048A3D@gw.wicourts.gov 4FEAB90A0200002500048B7D@gw.wicourts.gov
*	Fix more issues with cascading replication and timeline switches.	Heikki Linnakangas	2013-01-23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When a standby server follows the master using WAL archive, and it chooses a new timeline (recovery_target_timeline='latest'), it only fetches the timeline history file for the chosen target timeline, not any other history files that might be missing from pg_xlog. For example, if the current timeline is 2, and we choose 4 as the new recovery target timeline, the history file for timeline 3 is not fetched, even if it's part of this server's history. That's enough for the standby itself - the history file for timeline 4 includes timeline 3 as well - but if a cascading standby server wants to recover to timeline 3, it needs the history file. To fix, when a new recovery target timeline is chosen, try to copy any missing history files from the archive to pg_xlog between the old and new target timeline. A second similar issue was with the WAL files. When a standby recovers from archive, and it reaches a segment that contains a switch to a new timeline, recovery fetches only the WAL file labelled with the new timeline's ID. The file from the new timeline contains a copy of the WAL from the old timeline up to the point where the switch happened, and recovery recovers it from the new file. But in streaming replication, walsender only tries to read it from the old timeline's file. To fix, change walsender to read it from the new file, so that it behaves the same as recovery in that sense, and doesn't try to open the possibly nonexistent file with the old timeline's ID.
*	Fix a few small bugs in yesterday's event trigger patch.	Robert Haas	2013-01-22
\| \| \| \|	Dimitri Fontaine
*	Add infrastructure for storing a VARIADIC ANY function's VARIADIC flag.	Tom Lane	2013-01-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Originally we didn't bother to mark FuncExprs with any indication whether VARIADIC had been given in the source text, because there didn't seem to be any need for it at runtime. However, because we cannot fold a VARIADIC ANY function's arguments into an array (since they're not necessarily all the same type), we do actually need that information at runtime if VARIADIC ANY functions are to respond unsurprisingly to use of the VARIADIC keyword. Add the missing field, and also fix ruleutils.c so that VARIADIC ANY function calls are dumped properly. Extracted from a larger patch that also fixes concat() and format() (the only two extant VARIADIC ANY functions) to behave properly when VARIADIC is specified. This portion seems appropriate to review and commit separately. Pavel Stehule
*	Add ddl_command_end support for event triggers.	Robert Haas	2013-01-21
\| \| \| \|	Dimitri Fontaine, with slight changes by me
*	Refactor ALTER some-obj RENAME implementation	Alvaro Herrera	2013-01-21
\| \| \| \| \| \| \| \| \|	Remove duplicate implementations of catalog munging and miscellaneous privilege checks. Instead rely on already existing data in objectaddress.c to do the work. Author: KaiGai Kohei, changes by me Reviewed by: Robert Haas, Álvaro Herrera, Dimitri Fontaine
*	Fix error-checking typo in check_TSCurrentConfig().	Tom Lane	2013-01-20
\| \| \| \| \| \|	The code failed to detect an out-of-memory failure. Xi Wang
*	Fix an O(N^2) performance issue for sessions modifying many relations.	Tom Lane	2013-01-20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	AtEOXact_RelationCache() scanned the entire relation cache at the end of any transaction that created a new relation or assigned a new relfilenode. Thus, clients such as pg_restore had an O(N^2) performance problem that would start to be noticeable after creating 10000 or so tables. Since typically only a small number of relcache entries need any cleanup, we can fix this by keeping a small list of their OIDs and doing hash_searches for them. We fall back to the full-table scan if the list overflows. Ideally, the maximum list length would be set at the point where N hash_searches would cost just less than the full-table scan. Some quick experimentation says that point might be around 50-100; I (tgl) conservatively set MAX_EOXACT_LIST = 32. For the case that we're worried about here, which is short single-statement transactions, it's unlikely there would ever be more than about a dozen list entries anyway; so it's probably not worth being too tense about the value. We could avoid the hash_searches by instead keeping the target relcache entries linked into a list, but that would be noticeably more complicated and bug-prone because of the need to maintain such a list in the face of relcache entry drops. Since a relcache entry can only need such cleanup after a somewhat-heavyweight filesystem operation, trying to save a hash_search per cleanup doesn't seem very useful anyway --- it's the scan over all the not-needing-cleanup entries that we wish to avoid here. Jeff Janes, reviewed and tweaked a bit by Tom Lane
*	Protect against SnapshotNow race conditions in pg_tablespace scans.	Tom Lane	2013-01-18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use of SnapshotNow is known to expose us to race conditions if the tuple(s) being sought could be updated by concurrently-committing transactions. CREATE DATABASE and DROP DATABASE are particularly exposed because they do heavyweight filesystem operations during their scans of pg_tablespace, so that the scans run for a very long time compared to most. Furthermore, the potential consequences of a missed or twice-visited row are nastier than average: * createdb() could fail with a bogus "file already exists" error, or silently fail to copy one or more tablespace's worth of files into the new database. * remove_dbtablespaces() could miss one or more tablespaces, thus failing to free filesystem space for the dropped database. * check_db_file_conflict() could likewise miss a tablespace, leading to an OID conflict that could result in data loss either immediately or in future operations. (This seems of very low probability, though, since a duplicate database OID would be unlikely to start with.) Hence, it seems worth fixing these three places to use MVCC snapshots, even though this will someday be superseded by a generic solution to SnapshotNow race conditions. Back-patch to all active branches. Stephen Frost and Tom Lane