postgresql - postgresql mirror

	Commit message	Author	Age
*	Move BKP_REMOVABLE bit from individual WAL records to WAL page headers.	Tom Lane	2011-12-12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Removing this bit from xl_info allows us to restore the old limit of four (not three) separate pages touched by a WAL record, which is needed for the upcoming SP-GiST feature, and will likely be useful elsewhere in future. When we implemented XLR_BKP_REMOVABLE in 2007, we had to do it like that because no special WAL-visible action was taken when starting a backup. However, now we force a segment switch when starting a backup, so a compressing WAL archiver (such as pglesslog) that uses the state shown in the current page header will not be fooled as to removability of backup blocks. The only downside is that the archiver will not return to compressing mode for up to one WAL page after the backup is over, which is a small price to pay for getting back the extra xl_info bit. In any case the archiver could look for XLOG_BACKUP_END records if it thought it was worth the trouble to do so. Bump XLOG_PAGE_MAGIC since this is effectively a change in WAL format.
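The bit arithmetic behind "four (not three)" can be sketched as follows. This is a minimal illustration that assumes an 8-bit xl_info whose low four bits are available for per-record backup-block flags; the constant names and values are illustrative and are not copied from the PostgreSQL headers.

```python
# Assumed layout: the low 4 bits of an 8-bit xl_info field are available
# for per-record backup-block flags (illustrative values only).
INFO_LOW_BITS = 0x0F

def backup_block_slots(reserved_mask: int) -> int:
    """Count the low bits left for backup blocks after reserving some."""
    usable = INFO_LOW_BITS & ~reserved_mask
    return bin(usable).count("1")

# Old scheme: one bit is spent on a per-record "removable" flag.
OLD_REMOVABLE_BIT = 0x01
old_slots = backup_block_slots(OLD_REMOVABLE_BIT)

# New scheme: the flag lives in the WAL page header, so nothing is reserved.
new_slots = backup_block_slots(0x00)

print(old_slots, new_slots)
```

Moving the flag to the page header trades per-record precision for one extra bit, which is why the archiver may lag by up to one WAL page when leaving backup mode.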
*	Revert the behavior of inet/cidr functions to not unpack the arguments.	Heikki Linnakangas	2011-12-12
\| \| \| \| \| \| \| \| \| \| \|	I forgot to change the functions to use the PG_GETARG_INET_PP() macro, when I changed DatumGetInetP() to unpack the datum, like Datum*P macros usually do. Also, I screwed up the definition of the PG_GETARG_INET_PP() macro, and didn't notice because it wasn't used. This fixes the memory leak when sorting inet values, as reported by Jochen Erwied and debugged by Andres Freund. Backpatch to 8.3, like the previous patch that broke it.
*	Miscellaneous cleanup to silence compiler warnings seen on Mingw.	Andrew Dunstan	2011-12-10
\| \| \| \| \|	Remove some dead code, conditionally declare some items or call some code, and fix one or two declarations.
*	Add ALTER FOREIGN DATA WRAPPER / RENAME and ALTER SERVER / RENAME	Peter Eisentraut	2011-12-09
*	Don't set reachedMinRecoveryPoint during crash recovery. In crash recovery,	Heikki Linnakangas	2011-12-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	we don't reach consistency before replaying all of the WAL. Rename the variable to reachedConsistency, to make its intention clearer. In master, that was an active bug because of the recent patch to immediately PANIC if a reference to a missing page is found in WAL after reaching consistency, as Tom Lane's test case demonstrated. In 9.1 and 9.0, the only consequence was a misleading "consistent recovery state reached at %X/%X" message in the log at the beginning of crash recovery (the database is not consistent at that point yet). In 8.4, the log message was not printed in crash recovery, even though there was a similar reachedMinRecoveryPoint local variable that was also set early. So, backpatch to 9.1 and 9.0.
*	Cancel running query if it is detected that the connection to the client is	Heikki Linnakangas	2011-12-09
\| \| \| \| \| \| \|	lost. The only way we detect that at the moment is when write() fails when we try to write to the socket. Florian Pflug with small changes by me, reviewed by Greg Jaskiewicz.
*	Add const qualifiers to node inspection functions	Peter Eisentraut	2011-12-07
\| \| \| \|	Thomas Munro
*	Fix corner cases in readlink() usage.	Tom Lane	2011-12-07
\| \| \| \| \| \|	Make sure all calls are protected by HAVE_READLINK, and get the buffer overflow tests right. Be a bit more paranoid about string length in _tarWriteHeader(), too.
*	Better error reporting if the link target is too long	Magnus Hagander	2011-12-07
\| \| \| \| \|	This situation won't set errno, so using %m will give an incorrect error message.
*	Avoid using readlink() on platforms that don't support it	Magnus Hagander	2011-12-07
\| \| \| \| \| \| \|	We don't have any such platforms now, but might in the future. Also, detect cases when a tablespace symlink points to a path that is longer than we can handle, and give a warning.
*	Remove spclocation field from pg_tablespace	Magnus Hagander	2011-12-07
\| \| \| \| \| \| \| \|	Instead, add a function pg_tablespace_location(oid) used to return the same information, and do this by reading the symbolic link. Doing it this way makes it possible to relocate a tablespace when the database is down by simply changing the symbolic link.
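The idea of deriving the location from the symbolic link, rather than from a stored catalog column, can be sketched as below. This is not the server's implementation: the helper name, the empty-string result for a non-link, and the demo directory names are all assumptions made for illustration.

```python
import os
import tempfile

def tablespace_location(data_dir: str, oid: int) -> str:
    """Return the target of the tablespace symlink, or "" if it is not a link.

    Assumes the link lives at <data_dir>/pg_tblspc/<oid>.
    """
    link = os.path.join(data_dir, "pg_tblspc", str(oid))
    if not os.path.islink(link):
        return ""
    return os.readlink(link)

def demo() -> tuple:
    """Relocate a tablespace by re-pointing the link while 'the server is down'."""
    with tempfile.TemporaryDirectory() as root:
        data_dir = os.path.join(root, "data")
        os.makedirs(os.path.join(data_dir, "pg_tblspc"))
        old_target = os.path.join(root, "disk1")
        new_target = os.path.join(root, "disk2")
        os.mkdir(old_target)
        os.mkdir(new_target)

        link = os.path.join(data_dir, "pg_tblspc", "16384")  # hypothetical OID
        os.symlink(old_target, link)
        before = tablespace_location(data_dir, 16384)

        # Changing the link is the whole relocation; no catalog row to update.
        os.remove(link)
        os.symlink(new_target, link)
        after = tablespace_location(data_dir, 16384)
        return (os.path.basename(before), os.path.basename(after))
```

Because the link is the single source of truth in this scheme, there is no stored copy of the path that could go stale.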
*	Create a "sort support" interface API for faster sorting.	Tom Lane	2011-12-07
\| \| \| \| \| \| \| \| \| \| \| \|	This patch creates an API whereby a btree index opclass can optionally provide non-SQL-callable support functions for sorting. In the initial patch, we only use this to provide a directly-callable comparator function, which can be invoked with a bit less overhead than the traditional SQL-callable comparator. While that should be of value in itself, the real reason for doing this is to provide a datatype-extensible framework for more aggressive optimizations, as in Peter Geoghegan's recent work. Robert Haas and Tom Lane
*	Typo fixes for commit 2ad36c4e44c8b513f6155656e1b7a8d26715bb94.	Robert Haas	2011-12-06
\| \| \| \|	Noted during post-commit review by Noah Misch.
*	Remove troublesome Asserts in cost_mergejoin().	Tom Lane	2011-12-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	While logically correct, these two Asserts could fail depending on the vagaries of floating-point arithmetic. In particular, on machines with floating-point registers wider than standard "double" values, it was possible for the compiler to compare a rounded-to-double value already stored in memory with an unrounded long double value still in a register. Given the preceding checks, these assertions aren't adding much, so let's just get rid of them rather than try to find a compiler-proof fix. Per report from Pavel Stehule. Given the lack of previous complaints, and the fact that only developers would be likely to trip over it, I'm only going to change this in HEAD, even though the code has been like this for a long time.
*	During recovery, if we reach consistent state and still have entries in the	Heikki Linnakangas	2011-12-02
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	invalid-page hash table, PANIC immediately. Immediate PANIC is much better than waiting for end-of-recovery, which is what we did before, because the end-of-recovery might not come until months later if this is a standby server. Also refrain from creating a restartpoint if there are invalid-page entries in the hash table. Restarting recovery from such a restartpoint would not see the invalid references, and wouldn't be able to cross-check them when consistency is reached. That wouldn't matter when things are going smoothly, but the more sanity checks you have the better. Fujii Masao
*	Fix getTypeIOParam to support type record[].	Tom Lane	2011-12-01
\| \| \| \| \| \| \| \| \| \| \| \| \|	Since record[] uses array_in, it needs to have its element type passed as typioparam. In HEAD and 9.1, this fix essentially reverts commit 9bc933b2125a5358722490acbc50889887bf7680, which was a hack that is no longer needed since domains don't set their typelem anymore. Before that, adjust the logic so that only domains are excluded from being treated like arrays, rather than assuming that only base types should be included. Add a regression test to demonstrate the need for this. Per report from Maxim Boguk. Back-patch to 8.4, where type record[] was added.
*	Improve table locking behavior in the face of concurrent DDL.	Robert Haas	2011-11-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In the previous coding, callers were faced with an awkward choice: look up the name, do permissions checks, and then lock the table; or look up the name, lock the table, and then do permissions checks. The first choice was wrong because the results of the name lookup and permissions checks might be out-of-date by the time the table lock was acquired, while the second allowed a user with no privileges to interfere with access to a table by users who do have privileges (e.g. if a malicious backend queues up for an AccessExclusiveLock on a table on which AccessShareLock is already held, further attempts to access the table will be blocked until the AccessExclusiveLock is obtained and the malicious backend's transaction rolls back). To fix, allow callers of RangeVarGetRelid() to pass a callback which gets executed after performing the name lookup but before acquiring the relation lock. If the name lookup is retried (because invalidation messages are received), the callback will be re-executed as well, so we get the best of both worlds. RangeVarGetRelid() is renamed to RangeVarGetRelidExtended(); callers not wishing to supply a callback can continue to invoke it as RangeVarGetRelid(), which is now a macro. Since the only caller that uses nowait = true now passes a callback anyway, the RangeVarGetRelid() macro defaults nowait as well. The callback can also be used for supplemental locking - for example, REINDEX INDEX needs to acquire the table lock before the index lock to reduce deadlock possibilities. There's a lot more work to be done here to fix all the cases where this can be a problem, but this commit provides the general infrastructure and fixes the following specific cases: REINDEX INDEX, REINDEX TABLE, LOCK TABLE, and DROP TABLE/INDEX/SEQUENCE/VIEW/FOREIGN TABLE. Per discussion with Noah Misch and Alvaro Herrera.
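The lookup, callback, lock, and retry sequence described above can be sketched with a toy catalog. Everything here (the Catalog class, lookup_and_lock, the concurrent_ddl hook) is hypothetical scaffolding for illustration and does not mirror the real RangeVarGetRelidExtended() signature.

```python
from typing import Callable, Dict, List, Optional

class Catalog:
    """Toy catalog: names map to OIDs, and a counter tracks invalidations."""

    def __init__(self, names: Dict[str, int]) -> None:
        self.names = dict(names)
        self.inval_count = 0
        self.locked: List[int] = []

    def repoint(self, name: str, new_oid: int) -> None:
        # Simulates concurrent DDL that changes what a name refers to.
        self.names[name] = new_oid
        self.inval_count += 1

def lookup_and_lock(
    catalog: Catalog,
    name: str,
    callback: Optional[Callable[[str, int], None]] = None,
    concurrent_ddl: Optional[Callable[[Catalog], None]] = None,
) -> int:
    while True:
        seen = catalog.inval_count
        oid = catalog.names[name]          # 1. name lookup
        if callback is not None:
            callback(name, oid)            # 2. permission check, before locking
        if concurrent_ddl is not None:     # test hook: DDL lands while we wait
            concurrent_ddl(catalog)
            concurrent_ddl = None
        catalog.locked.append(oid)         # 3. acquire the lock
        if catalog.inval_count == seen:
            return oid                     # lookup is still valid
        catalog.locked.remove(oid)         # 4. stale: unlock and retry

cat = Catalog({"t": 1})
checked: List[int] = []
result = lookup_and_lock(
    cat, "t",
    callback=lambda n, o: checked.append(o),
    concurrent_ddl=lambda c: c.repoint("t", 2),
)
print(result, checked)
```

A callback that raises (for example on a failed permission check) aborts before any lock is taken, which is the property that keeps an unprivileged user from queueing for a lock.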
*	Tweak previous patch to ensure edata->filename always gets initialized.	Tom Lane	2011-11-30
\| \| \| \| \| \|	On a platform that isn't supplying __FILE__, previous coding would either crash or give a stale result for the filename string. Not sure how likely that is, but the original code catered for it, so let's keep doing so.
*	Strip file names reported in error messages in vpath builds	Peter Eisentraut	2011-11-30
\| \| \| \| \| \| \|	In vpath builds, the __FILE__ macro that is used in verbose error reports contains the full absolute file name, which makes the error messages excessively verbose. So keep only the base name, thus matching the behavior of non-vpath builds.
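A minimal sketch of the base-name stripping, assuming '/' is the only directory separator that matters; the function name and sample path are made up for illustration.

```python
def strip_directory(filename: str) -> str:
    """Keep only the part after the last '/', if there is one."""
    slash = filename.rfind("/")
    return filename[slash + 1:] if slash >= 0 else filename

# A vpath build might see an absolute path; a normal build sees a bare name.
print(strip_directory("/home/build/pg/src/backend/utils/error/elog.c"))
print(strip_directory("elog.c"))
```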
*	Prevent autovacuum transactions from running in serializable mode.	Tom Lane	2011-11-29
\| \| \| \| \| \| \| \| \| \| \| \| \|	Force the transaction isolation level to READ COMMITTED in autovacuum worker and launcher processes. There is no benefit to using a higher isolation level, and doing so could result in delaying foreground transactions (or maybe even causing unnecessary serialization failures?). Noted by Dan Ports. Also, make sure we disable zero_damaged_pages and statement_timeout in the autovac launcher, not only workers. Now that the launcher can run transactions, these settings could affect its behavior, and it seems like the same arguments apply to the launcher as the workers.
*	When a row fails a not-null constraint, show row's contents in errdetail.	Tom Lane	2011-11-29
\| \| \| \|	Simple extension of previous patch for CHECK constraints.
*	When a row fails a CHECK constraint, show row's contents in errdetail.	Tom Lane	2011-11-29
\| \| \| \| \| \| \| \| \| \| \| \|	This should make it easier to identify which row is problematic when an insert or update is processing many rows. The formatting is similar to that for unique-index violation messages, except that we limit field widths to 64 bytes since otherwise the message could get unreasonably long. (In particular, there's currently no attempt to quote or escape field values that contain commas etc.) Jan Kundrát, reviewed by Royce Ausburn, somewhat rewritten by me.
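The row description can be approximated as below. Only the 64-byte limit and the lack of quoting come from the commit message; the function name, the "null" spelling, and the "..." truncation marker are assumptions of this sketch.

```python
MAX_FIELD_BYTES = 64  # limit stated in the commit message

def describe_row(values) -> str:
    """Build a parenthesized, comma-separated description of a row."""
    parts = []
    for value in values:
        if value is None:
            text = "null"
        else:
            text = str(value)
            encoded = text.encode("utf-8")
            if len(encoded) > MAX_FIELD_BYTES:
                # Clip on a byte budget; errors="ignore" drops a multibyte
                # character that the cut would otherwise split in half.
                clipped = encoded[:MAX_FIELD_BYTES].decode("utf-8", errors="ignore")
                text = clipped + "..."
        parts.append(text)
    # No quoting or escaping: a value containing a comma is emitted as-is.
    return "(" + ", ".join(parts) + ")"

print(describe_row([1, None, "a,b"]))
```

The unescaped comma in the last field shows the ambiguity the commit message warns about.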
*	Make some minor formatting improvements to what pgindent did.	Tom Lane	2011-11-28
\| \| \| \| \| \|	Moving the code two full tab stops to the right requires rethinking of cosmetic code layout choices, which pgindent isn't really able to do for us. Whitespace and comment adjustments only, no code changes.
*	Disallow deletion of CurrentExtensionObject while running extension script.	Tom Lane	2011-11-28
\| \| \| \| \| \| \| \| \| \|	While the deletion in itself wouldn't break things, any further creation of objects in the script would result in dangling pg_depend entries being added by recordDependencyOnCurrentExtension(). An example from Phil Sorber convinced me that this is just barely likely enough to be worth expending a couple lines of code to defend against. The resulting error message might be confusing, but it's better than leaving corrupted catalog contents for the user to deal with.
*	Pgindent clauses.c, per request from Tom.	Bruce Momjian	2011-11-28
*	Convert eval_const_expressions's long series of IsA tests into a switch.	Tom Lane	2011-11-28
\| \| \| \| \| \| \| \| \|	This function has now grown enough cases that a switch seems appropriate. This results in a measurable speed improvement on some platforms, and should certainly not hurt. The code's in need of a pgindent run now, though. Andres Freund
*	Ensure that whole-row junk Vars are always of composite type.	Tom Lane	2011-11-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The EvalPlanQual machinery assumes that whole-row Vars generated for the outputs of non-table RTEs will be of composite types. However, for the case where the RTE is a function call returning a scalar type, we were doing the wrong thing, as a result of sharing code with a parser case where the function's scalar output is wanted. (Or at least, that's what that case has done historically; it does seem a bit inconsistent.) To fix, extend makeWholeRowVar's API so that it can support both use-cases. This fixes Belinda Cussen's report of crashes during concurrent execution of UPDATEs involving joins to the result of UNNEST() --- in READ COMMITTED mode, we'd run the EvalPlanQual machinery after a conflicting row update commits, and it was expecting to get a HeapTuple not a scalar datum from the "wholerowN" variable referencing the function RTE. Back-patch to 9.0 where the current EvalPlanQual implementation appeared. In 9.1 and up, this patch also fixes failure to attach the correct collation to the Var generated for a scalar-result case. An example: regression=# select upper(x.*) from textcat('ab', 'cd') x; ERROR: could not determine which collation to use for upper() function
*	Use IEEE infinity, not 1e10, for null-and-not-null case in gistpenalty().	Tom Lane	2011-11-27
\| \| \| \| \| \| \|	Use of a randomly chosen large value was never exactly graceful, and now that there are penalty functions that are intentionally using infinity, it doesn't seem like a good idea for null-vs-not-null to be using something less.
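Why a large finite constant is fragile can be shown with a toy chooser that picks the entry with the smallest penalty. The chooser and the sample values are hypothetical; only the 1e10 constant and the switch to infinity come from the commit message.

```python
import math

OLD_NULL_MISMATCH_PENALTY = 1e10  # the previous hard-coded value

def pick_entry(penalties) -> int:
    """Return the index of the cheapest entry (first one wins on ties)."""
    return min(range(len(penalties)), key=lambda i: penalties[i])

# Entry 0: an opclass penalty function reports a very large finite cost.
# Entry 1: a null-vs-not-null mismatch.
with_constant = pick_entry([1e12, OLD_NULL_MISMATCH_PENALTY])
with_infinity = pick_entry([1e12, math.inf])
print(with_constant, with_infinity)
```

With the constant, the mismatch looks cheaper than any opclass cost above 1e10; with infinity it can never be strictly preferred.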
*	Improve GiST range-contained-by searches by adding a flag for empty ranges.	Tom Lane	2011-11-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In the original implementation, a range-contained-by search had to scan the entire index because an empty range could be lurking anywhere. Improve that by adding a flag to upper GiST entries that says whether the represented subtree contains any empty ranges. Also, make a simple mod to the penalty function to discourage empty ranges from getting pushed into subtrees without any. This needs more work, and the picksplit function should be taught about it too, but that code can be improved without causing an on-disk compatibility break; so we'll leave it for another day. Since we're breaking on-disk compatibility of range values anyway, I took the opportunity to reorganize the range flags bits; the unused RANGE_xB_NULL bits are now adjacent, which might open the door for using them in some other way later. In passing, remove the GiST range opclass entry for <>, which doesn't seem like it can really be indexed usefully. Alexander Korotkov, with some editorializing by Tom
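The pruning that the flag enables can be sketched with a toy tree. This is not GiST: the Node layout, the half-open integer ranges, and the assumption that the query itself is non-empty are all simplifications made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Range = Optional[Tuple[int, int]]  # None is the empty range; else [lo, hi)

@dataclass
class Node:
    bounds: Range            # union of the non-empty ranges stored below
    contains_empty: bool     # True if any empty range is stored below
    children: List["Node"] = field(default_factory=list)
    leaves: List[Range] = field(default_factory=list)

def contained_by(r: Range, query: Tuple[int, int]) -> bool:
    if r is None:
        return True  # the empty range is contained by every range
    return query[0] <= r[0] and r[1] <= query[1]

def overlaps(a: Range, b: Tuple[int, int]) -> bool:
    return a is not None and a[0] < b[1] and b[0] < a[1]

def search(node: Node, query: Tuple[int, int], visited: List[Node]) -> List[Range]:
    visited.append(node)
    hits = [r for r in node.leaves if contained_by(r, query)]
    for child in node.children:
        # Without the flag, every child would have to be visited because an
        # empty range could be hiding anywhere.
        if child.contains_empty or overlaps(child.bounds, query):
            hits.extend(search(child, query, visited))
    return hits

a = Node(bounds=(0, 10), contains_empty=False, leaves=[(1, 3), (5, 9)])
b = Node(bounds=(100, 200), contains_empty=False, leaves=[(100, 150)])
c = Node(bounds=(300, 400), contains_empty=True, leaves=[None, (300, 310)])
root = Node(bounds=(0, 400), contains_empty=True, children=[a, b, c])

visited: List[Node] = []
hits = search(root, (0, 10), visited)
print(hits, len(visited))
```

Subtree b is skipped because it neither overlaps the query nor holds an empty range, while c must still be visited for its empty entry.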
*	Make GiST index searches smarter about queries against empty ranges.	Tom Lane	2011-11-26
\| \| \| \| \| \| \|	In the cases where the result of the called proc is negated, we should explicitly test both inputs for empty, to ensure we'll never return "true" for an unsatisfiable query. In other cases we can rely on the called proc to say the right thing.
*	Take fillfactor into account in the new COPY bulk heap insert code.	Heikki Linnakangas	2011-11-26
\| \| \| \|	Jeff Janes
*	Improve logging of autovacuum I/O activity	Alvaro Herrera	2011-11-25
\| \| \| \| \| \| \| \| \|	This adds some I/O stats to the logging of autovacuum (when the operation takes long enough that log_autovacuum_min_duration causes it to be logged), so that it is easier to tune. Notably, it adds buffer I/O counts (hits, misses, dirtied) and read and write rate. Authors: Greg Smith and Noah Misch
*	Fix erroneous replay of GIN_UPDATE_META_PAGE WAL records.	Tom Lane	2011-11-25
\| \| \| \| \| \| \| \| \| \| \| \| \|	A simple thinko in ginRedoUpdateMetapage, namely failing to increment a loop counter, led to inserting records into the last pending-list page in the wrong order (the opposite of that intended). So far as I can tell, this would not upset the code that eventually flushes pending items into the main part of the GIN index. But it did break the code that searched the pending list for matches, resulting in transient failure to find matching entries during index lookups, as illustrated in bug #6307 from Maksym Boguk. Back-patch to 8.4 where the incorrect code was introduced.
*	Move "hot" members of PGPROC into a separate PGXACT array.	Robert Haas	2011-11-25
\| \| \| \| \| \| \| \| \| \| \| \|	This speeds up snapshot-taking and reduces ProcArrayLock contention. Also, the PGPROC (and PGXACT) structures used by two-phase commit are now allocated as part of the main array, rather than in a separate array, and we keep ProcArray sorted in pointer order. These changes are intended to minimize the number of cache lines that must be pulled in to take a snapshot, and testing shows a substantial increase in performance on both read and write workloads at high concurrencies. Pavan Deolasee, Heikki Linnakangas, Robert Haas
*	Fix unsupported options in CREATE TABLE ... AS EXECUTE.	Tom Lane	2011-11-24
\| \| \| \| \| \| \| \| \| \| \|	The WITH [NO] DATA option was not supported, nor the ability to specify replacement column names; the former limitation wasn't even documented, as per recent complaint from Naoya Anzai. Fix by moving the responsibility for supporting these options into the executor. It actually takes less code this way ... catversion bump due to change in representation of IntoClause, which might affect stored rules.
*	Adjust range_adjacent to support different canonicalization rules.	Tom Lane	2011-11-23
\| \| \| \| \| \| \| \| \| \| \|	The original coding would not work for discrete ranges in which the canonicalization rule is to produce symmetric boundaries (either [] or () style), as noted by Jeff Davis. Florian Pflug pointed out that we could fix that by invoking the canonicalization function to see if the range "between" the two given ranges normalizes to empty. This implementation of Florian's idea is a tad slower than the original code, but only in the case where there actually is a canonicalization function --- if not, it's essentially the same logic as before.
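Florian's idea can be sketched for integer ranges as follows. The IntRange type, the choice of [) as the canonical form, and the helper names are assumptions of this sketch rather than the server's code.

```python
from typing import NamedTuple, Optional

class IntRange(NamedTuple):
    lower: int
    lower_inc: bool
    upper: int
    upper_inc: bool

def canonicalize(r: IntRange) -> Optional[IntRange]:
    """Normalize to [lower, upper); return None if the range is empty."""
    lower = r.lower if r.lower_inc else r.lower + 1
    upper = r.upper + 1 if r.upper_inc else r.upper
    if lower >= upper:
        return None
    return IntRange(lower, True, upper, False)

def bounds_adjacent(a: IntRange, b: IntRange) -> bool:
    """True if a's upper bound meets b's lower bound with no gap or overlap."""
    if a.upper > b.lower:
        return False
    if a.upper == b.lower:
        # Same value: adjacent only if exactly one side includes it.
        return a.upper_inc != b.lower_inc
    # Build the range "between" the two bounds by flipping inclusivity,
    # then ask the canonicalization rule whether anything is left in it.
    between = IntRange(a.upper, not a.upper_inc, b.lower, not b.lower_inc)
    return canonicalize(between) is None

def adjacent(a: IntRange, b: IntRange) -> bool:
    return bounds_adjacent(a, b) or bounds_adjacent(b, a)

# Symmetric [] style: [1,3] and [4,6] have no integer between them.
print(adjacent(IntRange(1, True, 3, True), IntRange(4, True, 6, True)))
```

Comparing bound values alone would miss the [1,3] / [4,6] case, since 3 and 4 differ; the emptiness test on the in-between range is what catches it.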
*	Creator of a range type must have permission to call support functions.	Tom Lane	2011-11-23
\| \| \| \| \| \| \| \| \| \| \| \|	Since range types can be created by non-superusers, we need to consider their permissions. Ideally we'd check this when the type is used, not when it's created, but that seems like much more trouble than it's worth. The existing restriction that the support functions be immutable already prevents most cases where an unauthorized call to a function might be thought a security issue, and the fact that the user has no access to the results of the system's calls to subtype_diff closes off the other plausible reason for concern. So this check is basically pro-forma, but let's make it anyway.
*	Remove user-selectable ANALYZE option for range types.	Tom Lane	2011-11-23
\| \| \| \| \| \| \| \| \|	It's not clear that a per-datatype typanalyze function would be any more useful than a generic typanalyze for ranges. What is clear is that letting unprivileged users select typanalyze functions is a crash risk or worse. So remove the option from CREATE TYPE AS RANGE, and instead put in a generic typanalyze function for ranges. The generic function does nothing as yet, but hopefully we'll improve that before 9.2 release.
*	Remove zero- and one-argument range constructor functions.	Tom Lane	2011-11-22
\| \| \| \| \| \| \| \| \| \| \| \|	Per discussion, the zero-argument forms aren't really worth the catalog space (just write 'empty' instead). The one-argument forms have some use, but they also have a serious problem with looking too much like functional cast notation; to the point where in many real use-cases, the parser would misinterpret what was wanted. Committing this as a separate patch, with the thought that we might want to revert part or all of it if we can think of some way around the cast ambiguity.
*	Improve implementation of range-contains-element tests.	Tom Lane	2011-11-22
\| \| \| \| \| \| \| \| \| \| \| \|	Implement these tests directly instead of constructing a singleton range and then applying range-contains. This saves a range serialize/deserialize cycle as well as a couple of redundant bound-comparison steps, and adds very little code on net. Remove elem_contained_by_range from the GiST opclass: it doesn't belong there because there is no way to use it in an index clause (where the indexed column would have to be on the left). Its commutator is in the opclass, and that's what counts.
*	Check for INSERT privileges in SELECT INTO / CREATE TABLE AS.	Robert Haas	2011-11-22
\| \| \| \| \| \| \| \| \| \| \| \|	In the normal course of events, this matters only if ALTER DEFAULT PRIVILEGES has been used to revoke default INSERT permission. Whether or not the new behavior is more or less likely to be what the user wants when dealing only with the built-in privilege facilities is arguable, but it's clearly better when using a loadable module such as sepgsql that may use the hook in ExecCheckRTPerms to enforce additional permissions checks. KaiGai Kohei, reviewed by Albe Laurenz
*	Still more review for range-types patch.	Tom Lane	2011-11-22
\| \| \| \| \| \| \| \| \| \|	Per discussion, relax the range input/construction rules so that the only hard error is lower bound > upper bound. Cases where the lower bound is <= upper bound, but the range nonetheless normalizes to empty, are now permitted. Fix core dump in range_adjacent when bounds are infinite. Marginal cleanup of regression test cases, some more code commenting.
*	Continue to allow VACUUM to mark last block of index dirty	Simon Riggs	2011-11-22
\| \| \| \| \|	even when there is no work to do. Further analysis required. Revert of patch c1458cc495ff800cd176a1c2e56d8b62680d9b71
*	More code review for rangetypes patch.	Tom Lane	2011-11-21
\| \| \| \| \| \| \| \| \| \| \|	Fix up some infelicitous coding in DefineRange, and add some missing error checks. Rearrange operator strategy number assignments for GiST anyrange opclass so that they don't make such a mess of opr_sanity's table of operator names associated with different strategy numbers. Assign hopefully-temporary selectivity estimators to range operators that didn't have one --- poor as the estimates are, they're still a lot better than the default 0.5 estimate, and they'll shut up the opr_sanity test that wants to see selectivity estimators on all built-in operators.
*	Further code review for range types patch.	Tom Lane	2011-11-20
\| \| \| \| \|	Fix some bugs in coercion logic and pg_dump; more comment cleanup; minor cosmetic improvements.
*	Avoid floating-point underflow while tracking buffer allocation rate.	Tom Lane	2011-11-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When the system is idle for a while after activity, the "smoothed_alloc" state variable in BgBufferSync converges slowly to zero. With standard IEEE float arithmetic this results in several iterations with denormalized values, which causes kernel traps and annoying log messages on some poorly-designed platforms. There's no real need to track such small values of smoothed_alloc, so we can prevent the kernel traps by forcing it to zero as soon as it's too small to be interesting for our purposes. This issue is purely cosmetic, since the iterations don't happen fast enough for the kernel traps to pose any meaningful performance problem, but still it seems worth shutting up the log messages. The kernel log messages were previously reported by a number of people, but kudos to Greg Matthews for tracking down exactly where they were coming from.
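The effect of the clamp can be reproduced with a small decay loop. The smoothing factor, the threshold, and the starting value are hypothetical, and Python floats are doubles rather than C floats, so this only illustrates the shape of the problem.

```python
import sys

SMOOTHING_SAMPLES = 16   # assumed smoothing factor
NEGLIGIBLE = 1e-6        # hypothetical "too small to matter" threshold

def decay(smoothed: float, recent_alloc: float, clamp: bool) -> float:
    """One smoothing step toward recent_alloc, optionally clamped to zero."""
    smoothed += (recent_alloc - smoothed) / SMOOTHING_SAMPLES
    if clamp and smoothed < NEGLIGIBLE:
        smoothed = 0.0
    return smoothed

def count_subnormal_steps(clamp: bool, steps: int = 20000) -> int:
    """Count iterations where the state is a nonzero denormalized value."""
    smoothed = 1000.0
    subnormal = 0
    for _ in range(steps):
        smoothed = decay(smoothed, 0.0, clamp)
        if 0.0 < smoothed < sys.float_info.min:
            subnormal += 1
    return subnormal

print(count_subnormal_steps(clamp=False) > 0)
print(count_subnormal_steps(clamp=True))
```

With the clamp, the value jumps to exactly zero long before it reaches the denormalized range, so no iteration ever computes with a denormal.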
*	Avoid marking buffer dirty when VACUUM has no work to do.	Simon Riggs	2011-11-18
\| \| \| \| \| \| \|	When wal_level = 'hot_standby' we touched the last page of the relation during a VACUUM, even if nothing else had happened. That would alter the LSN of the last block and set the mtime of the relation file unnecessarily. Noted by Thom Brown.
*	Further consolidation of DROP statement handling.	Robert Haas	2011-11-17
\| \| \| \| \| \| \| \| \| \| \|	This gets rid of an impressive amount of duplicative code, with only minimal behavior changes. DROP FOREIGN DATA WRAPPER now requires object ownership rather than superuser privileges, matching the documentation we already have. We also eliminate the historical warning about dropping a built-in function as unuseful. All operations are now performed in the same order for all object types handled by dropcmds.c. KaiGai Kohei, with minor revisions by me
*	Extend the unknowns-are-same-as-known-inputs type resolution heuristic.	Tom Lane	2011-11-17
\| \| \| \| \| \| \| \| \| \| \| \| \|	For a very long time, one of the parser's heuristics for resolving ambiguous operator calls has been to assume that unknown-type literals are of the same type as the other input (if it's known). However, this was only used in the first step of quickly checking for an exact-types match, and thus did not help in resolving matches that require coercion, such as matches to polymorphic operators. As we add more polymorphic operators, this becomes more of a problem. This patch adds another use of the same heuristic as a last-ditch check before failing to resolve an ambiguous operator or function call. In particular this will let us define the range inclusion operator in a less limited way (to come in a follow-on patch).
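A toy resolver shows where the last-ditch check helps. The type names, the single polymorphic rule for "anyrange", and the omission of the parser's intermediate resolution steps are all simplifications assumed for this sketch.

```python
from typing import List, Optional, Tuple

UNKNOWN = "unknown"
RANGE_TYPES = {"int4range", "numrange"}  # toy set of range types

def accepts(declared: str, actual: str) -> bool:
    """Toy coercion rule: exact match, or polymorphic anyrange."""
    if declared == actual:
        return True
    return declared == "anyrange" and actual in RANGE_TYPES

def resolve(
    candidates: List[Tuple[str, str]], left: str, right: str
) -> Optional[Tuple[str, str]]:
    if left == UNKNOWN and right == UNKNOWN:
        return None  # no known input to borrow a type from
    # Heuristic: an unknown-type literal is assumed to have the other
    # input's type.
    known = right if left == UNKNOWN else left
    l = known if left == UNKNOWN else left
    r = known if right == UNKNOWN else right

    # First step: quick check for an exact-types match.
    for cand in candidates:
        if cand == (l, r):
            return cand

    # (Intermediate resolution steps are omitted in this sketch.)

    # Last-ditch: reuse the same assumption, but allow coercion, and
    # succeed only if exactly one candidate fits.
    matches = [c for c in candidates if accepts(c[0], l) and accepts(c[1], r)]
    return matches[0] if len(matches) == 1 else None

operators = [("anyrange", "anyrange"), ("text", "text")]
print(resolve(operators, "int4range", UNKNOWN))
```

The exact-match step alone cannot pick the polymorphic operator, because no candidate is declared as (int4range, int4range).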
*	Fix range_cmp_bounds for the case of equal-valued exclusive bounds.	Tom Lane	2011-11-17
\| \| \| \| \| \|	Also improve its comments and related regression tests. Jeff Davis, with some further adjustments by Tom