postgresql - postgresql mirror

	Commit message	Author	Date
*	Fix multicolumn GIN's wrong results with fastupdate enabled.	Teodor Sigaev	2009-11-13
	User-defined consistent functions believe the check array contains at least one true element, which was not true when scanning the pending list. Per report from Yury Don <yura@vpcit.ru>.
*	Dept of second thoughts: after studying index_getnext() a bit more I realize	Tom Lane	2009-11-01
	that it can scribble on scan->xs_ctup.t_self while following HOT chains, so we can't rely on that to stay valid between hashgettuple() calls. Introduce a private variable in HashScanOpaque, instead.
*	Fix two serious bugs introduced into hash indexes by the 8.4 patch that made	Tom Lane	2009-11-01
	hash indexes keep entries sorted by hash value. First, the original plans for concurrency assumed that insertions would happen only at the end of a page, which is no longer true; this could cause scans to transiently fail to find index entries in the presence of concurrent insertions. We can compensate by teaching scans to re-find their position after re-acquiring read locks. Second, neither the bucket split nor the bucket compaction logic had been fixed to preserve hashvalue ordering, so application of either of those processes could lead to permanent corruption of an index, in the sense that searches might fail to find entries that are present. This patch fixes the split and compaction logic to preserve hashvalue ordering, but it cannot do anything about pre-existing corruption. We will need to recommend reindexing all hash indexes in the 8.4.2 release notes. To buy back the performance loss hereby induced in split and compaction, fix them to use PageIndexMultiDelete instead of retail PageIndexDelete operations. We might later want to do something with qsort'ing the page contents rather than doing a binary search for each insertion, but that seemed more invasive than I cared to risk in a back-patch. Per bug #5157 from Jeff Janes and subsequent investigation.
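The re-find step can be illustrated with a simplified model. The Python sketch below is not PostgreSQL code. It assumes a page is a plain list of (hash value, TID) pairs kept in hash order, and the helpers `page_insert` and `refind_position` are hypothetical. It shows why a saved offset goes stale once insertions can land mid-page, and how a scan can locate its last-returned entry again by binary-searching on the hash value and then matching the TID.

```python
import bisect

def page_insert(page, hashval, tid):
    """Insert (hashval, tid) so the page stays ordered by hash value.

    Uses a binary search on the hash keys; an entry whose hash equals existing
    ones goes after them. Returns the offset where the entry landed."""
    keys = [h for h, _ in page]
    pos = bisect.bisect_right(keys, hashval)
    page.insert(pos, (hashval, tid))
    return pos

def refind_position(page, hashval, tid):
    """Re-locate the entry a scan last returned, after re-acquiring the page lock.

    Binary-search to the first entry with this hash value, then walk the run of
    equal hashes looking for the matching TID. Returns its offset, or None."""
    keys = [h for h, _ in page]
    i = bisect.bisect_left(keys, hashval)
    while i < len(page) and page[i][0] == hashval:
        if page[i][1] == tid:
            return i
        i += 1
    return None
```

A real implementation also has to cope with the entry having moved to a different page; the sketch deliberately ignores that case.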
*	Make sure that GIN fast-insert and regular code paths enforce the same	Tom Lane	2009-10-02
	tuple size limit. Improve the error message for index-tuple-too-large so that it includes the actual size, the limit, and the index name. Sync with the btree occurrences of the same error. Back-patch to 8.4 because it appears that the out-of-sync problem is occurring in the field. Teodor and Tom
*	Fix incorrect arguments for gist_box_penalty call. The bug could be observed	Teodor Sigaev	2009-09-18
	only for secondary page splits (i.e., for non-first columns of the index). Patch by Paul Ramsey <pramsey@opengeo.org>
*	Fix two distinct errors in creation of GIN_INSERT_LISTPAGE xlog records.	Tom Lane	2009-09-15
	In practice these mistakes were always masked when full_page_writes was on, because XLogInsert would always choose to log the full page, and then ginRedoInsertListPage wouldn't try to do anything. But with full_page_writes off a WAL replay failure was certain. The GIN_INSERT_LISTPAGE record type could probably be eliminated entirely in favor of using XLOG_HEAP_NEWPAGE, but I refrained from doing that now since it would have required a significantly more invasive patch. In passing do a little bit of code cleanup, including making the accounting for free space on GIN list pages more precise. (This wasn't a bug as the errors were always in the conservative direction.) Per report from Simon. Back-patch to 8.4 which contains the identical code.
*	Don't error out if recycling or removing an old WAL segment fails at the end	Heikki Linnakangas	2009-09-13
	of checkpoint. Although the checkpoint has been written to WAL at that point already, so that all data is safe, and we'll retry removing the WAL segment at the next checkpoint, if such a failure persists we won't be able to remove any other old WAL segments either and will eventually run out of disk space. It's better to treat the failure as non-fatal, and move on to clean any other WAL segment and continue with any other end-of-checkpoint cleanup. We don't normally expect any such failures, but on Windows it can happen with some anti-virus or backup software that locks files without the FILE_SHARE_DELETE flag. Also, the loop in pgrename() to retry when the file is locked was broken. If a file is locked on Windows, you get ERROR_SHARE_VIOLATION, not ERROR_ACCESS_DENIED, at least on modern versions. Fix that, although I left the check for ERROR_ACCESS_DENIED in there as well (presumably it was correct in some environment), and added ERROR_LOCK_VIOLATION to be consistent with similar checks in pgwin32_open(). Reduce the timeout on the loop from 30s to 10s, on the grounds that since it's been broken, we've effectively had a timeout of 0s and no-one has complained, so a smaller timeout is actually closer to the old behavior. A longer timeout would mean that if recycling a WAL file fails because it's locked for some reason, InstallXLogFileSegment() will hold ControlFileLock for longer, potentially blocking other backends, so a long timeout isn't totally harmless. While we're at it, set errno correctly in pgrename(). Backpatch to 8.2, which is the oldest version supported on Windows. The xlog.c changes would make sense on other platforms and thus on older versions as well, but since there are no such locking issues on other platforms, it's not worth it.
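The corrected retry loop can be sketched as follows. This is a Python model, not the actual pgrename() C code. The rename attempt is passed in as a callable that returns 0 or a Win32 error code, the numeric codes are the standard winerror.h values (the Windows header spells the sharing error ERROR_SHARING_VIOLATION), and the 100 x 100 ms schedule is an assumption chosen to match the 10 s budget described above.

```python
import time

# Standard Win32 error codes (winerror.h).
ERROR_ACCESS_DENIED = 5
ERROR_SHARING_VIOLATION = 32
ERROR_LOCK_VIOLATION = 33

# Errors meaning "another process has the file open or locked; try again later".
RETRYABLE = {ERROR_ACCESS_DENIED, ERROR_SHARING_VIOLATION, ERROR_LOCK_VIOLATION}

def rename_with_retry(try_rename, sleep=time.sleep, delay=0.1, max_retries=100):
    """Retry a rename while the file is locked.

    try_rename() returns 0 on success or a Win32 error code. Gives up on the
    first non-retryable error, or after max_retries sleeps (100 x 0.1 s = 10 s).
    Returns 0 on success, otherwise the last error code seen."""
    retries = 0
    while True:
        err = try_rename()
        if err == 0:
            return 0
        if err not in RETRYABLE or retries >= max_retries:
            return err
        retries += 1
        sleep(delay)
```

Injecting `sleep` keeps the sketch testable. The short overall budget matters because, as noted above, the caller may be holding ControlFileLock while it waits.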
*	On Windows, when a file is deleted and another process still has an open	Heikki Linnakangas	2009-09-10
	file handle on it, the file goes into "pending deletion" state where it still shows up in directory listing, but isn't accessible otherwise. That confuses RemoveOldXLogFiles(), making it think that the file hasn't been archived yet, while it actually was, and it was deleted along with the .done file. Fix that by renaming the file with ".deleted" extension before deleting it. Also check the return value of rename() and unlink(), so that if the removal fails for any reason (e.g., another process is holding the file locked), we don't delete the .done file until the WAL file is really gone. Backpatch to 8.2, which is the oldest version supported on Windows.
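The rename-then-delete sequence can be sketched as follows. This Python version assumes the pg_xlog layout with an archive_status subdirectory holding the .done markers. `remove_old_segment` is a hypothetical helper, not RemoveOldXLogFiles() itself. Any OSError from rename() or unlink() leaves the .done file in place, so the segment is never treated as un-archived merely because removal failed.

```python
import os

def remove_old_segment(xlog_dir, segname):
    """Remove WAL segment `segname`, then its archive_status/<seg>.done marker.

    The segment is first renamed to <seg>.deleted: on Windows, a file that
    another process still holds open lingers in "pending deletion" state, and
    renaming it keeps it from showing up under the original segment name.
    Returns True only if the segment is really gone; otherwise keeps .done."""
    seg_path = os.path.join(xlog_dir, segname)
    tmp_path = seg_path + ".deleted"
    done_path = os.path.join(xlog_dir, "archive_status", segname + ".done")
    try:
        os.rename(seg_path, tmp_path)
        os.unlink(tmp_path)
    except OSError:
        # Leave the .done marker alone: the segment may still be on disk.
        return False
    if os.path.exists(done_path):
        os.unlink(done_path)
    return True
```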
*	Fix handling of autovacuum reloptions.	Alvaro Herrera	2009-08-27
	In the original coding, setting a single reloption would cause default values to be used for all the other reloptions. This is a problem particularly for autovacuum reloptions. Itagaki Takahiro
*	In the checkpoint written at the end of archive recovery, the WAL page header	Heikki Linnakangas	2009-08-27
	was incorrectly initialized with timeline ID 0. That rendered the WAL page unrecoverable, making a subsequent archive recovery stop at that point. ThisTimeLineID needs to be initialized before calling AdvanceXLInsertBuffer(). This fixes bug #5011 reported by James Bardin. Backpatch to 8.4, as the bug was introduced by the changes to use of bgwriter for writing the end-of-archive-recovery checkpoint. Patch by Tom Lane.
*	Fix a violation of WAL coding rules in the recent patch to include an	Tom Lane	2009-08-24
	"all tuples visible" flag in heap page headers. The flag update must be applied before calling XLogInsert, but heap_update and the tuple moving routines in VACUUM FULL were ignoring this rule. A crash and replay could therefore leave the flag incorrectly set, causing rows to appear visible in seqscans when they should not be. This might explain recent reports of data corruption from Jeff Ross and others. In passing, do a bit of editorialization on comments in visibilitymap.c.
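Why the ordering matters can be seen in a toy model in which XLogInsert may log a full-page image of the buffer, and replay then restores that image instead of running the redo routine. This Python sketch is an illustration under those assumptions only: all names are hypothetical, and the decision to take an image is forced here rather than derived from the page LSN. It is not PostgreSQL's WAL machinery.

```python
import copy

def xlog_insert(wal, page, redo, full_page_image):
    """Toy XLogInsert: a full-page image snapshots the page as it is *now*."""
    wal.append({"fpi": copy.deepcopy(page) if full_page_image else None,
                "redo": redo})

def replay(wal, page):
    """Toy redo: restore a full-page image if present, else run the redo action."""
    for rec in wal:
        if rec["fpi"] is not None:
            page.clear()
            page.update(copy.deepcopy(rec["fpi"]))
        else:
            rec["redo"](page)
    return page

def clear_all_visible(page):
    page["all_visible"] = False

def update_then_crash(modify_before_logging):
    """Clear the flag and log it, in either order; return the page after replay."""
    on_disk = {"all_visible": True}   # last version written before the crash
    page = dict(on_disk)              # shared-buffer copy being modified
    wal = []
    if modify_before_logging:
        clear_all_visible(page)
        xlog_insert(wal, page, clear_all_visible, full_page_image=True)
    else:
        xlog_insert(wal, page, clear_all_visible, full_page_image=True)
        clear_all_visible(page)       # too late: the image already holds True
    return replay(wal, dict(on_disk))
```

With the wrong order, replay restores an image in which the flag is still set, which matches the symptom described above.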
*	Document that LocalSetXLogInsertAllowed can be re-executed.	Tom Lane	2009-08-08
	Per comment from Simon.
*	rm_cleanup functions need to be allowed to write WAL entries. This oversight	Tom Lane	2009-08-07
	appears to explain the recent reports of "PANIC: cannot make new WAL entries during recovery".
*	Cleanup and code review for the patch that made bgwriter active during	Tom Lane	2009-06-26
	archive recovery. Invent a separate state variable and inquiry function for XLogInsertAllowed() to clarify some tests and make the management of writing the end-of-recovery checkpoint less klugy. Fix several places that were incorrectly testing InRecovery when they should be looking at RecoveryInProgress or XLogInsertAllowed (because they will now be executed in the bgwriter not startup process). Clarify handling of bad LSNs passed to XLogFlush during recovery. Use a spinlock for setting/testing SharedRecoveryInProgress. Improve quite a lot of comments. Heikki and Tom
*	Fix some serious bugs in archive recovery, now that bgwriter is active	Heikki Linnakangas	2009-06-25
	during it: When bgwriter is active, the startup process can't perform mdsync() correctly because it won't see the fsync requests accumulated in bgwriter's private pendingOpsTable. Therefore make bgwriter responsible for the end-of-recovery checkpoint as well, when it's active. When bgwriter is active (= archive recovery), the startup process must not accumulate fsync requests to its own pendingOpsTable, since bgwriter won't see them there when it performs restartpoints. Make startup process drop its pendingOpsTable when bgwriter is launched to avoid that. Update minimum recovery point one last time when leaving archive recovery. It won't be updated by the end-of-recovery checkpoint because XLogFlush() sees us as out of recovery already. This fixes bug #4879 reported by Fujii Masao.
*	The code to unlink dropped relations in FinishPreparedTransaction() was	Heikki Linnakangas	2009-06-25
	acting as if it ran inside WAL recovery, but it doesn't. I must've copy-pasted this from a redo-function in the relation forks patch. Noticed by Tom Lane while he was looking through callers of smgrdounlink().
*	Correct grammar in picksplit debug messages	Peter Eisentraut	2009-06-24
*	Fix a few errors in comments. Patch by Fujii Masao, plus the one in	Heikki Linnakangas	2009-06-18
	visibilitymap.c by me.
*	8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef list	Bruce Momjian	2009-06-11
	provided by Andrew.
*	Improve capitalization and punctuation in recently added GiST message.	Peter Eisentraut	2009-06-10
*	Keep rs_startblock the same during heap_rescan, so that a rescan of a SeqScan	Tom Lane	2009-06-10
	node starts from the same place as the first scan did. This avoids surprising behavior of scrollable and WITH HOLD cursors, as seen in Mark Kirkwood's bug report of yesterday. It's not entirely clear whether a rescan should be forced to drop out of the syncscan mode, but for the moment I left the code behaving the same on that point. Any change there would only be a performance and not a correctness issue, anyway. Back-patch to 8.3, since the unstable behavior was created by the syncscan patch.
*	Improve the IndexVacuumInfo/IndexBulkDeleteResult API to allow somewhat sane	Tom Lane	2009-06-06
	behavior in cases where we don't know the heap tuple count accurately; in particular partial vacuum, but this also makes the API a bit more useful for ANALYZE. This patch adds "estimated_count" flags to both structs so that an approximate count can be flagged as such, and adjusts the logic so that approximate counts are not used for updating pg_class.reltuples. This fixes my previous complaint that VACUUM was putting ridiculous values into pg_class.reltuples for indexes. The actual impact of that bug is limited, because the planner only pays attention to reltuples for an index if the index is partial; which probably explains why beta testers hadn't noticed a degradation in plan quality from it. But it needs to be fixed. The whole thing is a bit messy and should be redesigned in future, because reltuples now has the potential to drift quite far away from reality when a long period elapses with no non-partial vacuums. But this is as good as it's going to get for 8.4.
*	Fix a serious bug introduced into GIN in 8.4: now that MergeItemPointers()	Tom Lane	2009-06-06
	is supposed to remove duplicate heap TIDs, we have to be sure to reduce the tuple size and posting-item count accordingly in addItemPointersToTuple(). Failing to do so resulted in the effective injection of garbage TIDs into the index contents, i.e., whatever happened to be in the memory palloc'd for the new tuple. I'm not sure that this fully explains the index corruption reported by Tatsuo Ishii, but the test case I'm using no longer fails.
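The invariant can be shown with a plain merge of two sorted TID lists. In the Python sketch below, TIDs are modeled as (block, offset) tuples and `merge_item_pointers` is a hypothetical analogue of MergeItemPointers(), not its actual code. When duplicates are dropped, the merged list is shorter than the sum of the inputs, so the tuple size and posting-item count must be derived from the actual result rather than from len(a) + len(b).

```python
def merge_item_pointers(a, b):
    """Merge two sorted lists of heap TIDs, keeping a single copy of duplicates."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            out.append(a[i])
            i += 1
        elif a[i] > b[j]:
            out.append(b[j])
            j += 1
        else:                      # same TID in both inputs: emit it once
            out.append(a[i])
            i += 1
            j += 1
    out.extend(a[i:])              # at most one of these two is non-empty
    out.extend(b[j:])
    return out
```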
*	Only recycle normal files in pg_xlog as WAL segments. pg_standby creates	Heikki Linnakangas	2009-06-02
	symbolic links with the -l option, and as Fujii Masao pointed out we ended up overwriting files in the archive directory before this patch. Patch by Aidan Van Dyk, Fujii Masao and me. Backpatch to 8.3, where pg_standby was introduced.
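The check itself is small. A minimal Python sketch, assuming a POSIX system; `is_recyclable` is a hypothetical helper, not the xlog.c code. The key point is to use lstat() rather than stat(), so that the link itself is examined instead of the archived file it points to.

```python
import os
import stat

def is_recyclable(path):
    """True only for a regular file; symbolic links and other file types are skipped.

    os.lstat() does not follow symlinks, so a link that points at a regular
    file in the archive directory is still rejected."""
    return stat.S_ISREG(os.lstat(path).st_mode)
```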
*	When archiving is enabled, rotate the last WAL segment at shutdown so that	Heikki Linnakangas	2009-05-28
	all transactions are archived. Original patch by Guillaume Smet.
*	Use more-portable coding for the check on handing out the last available	Tom Lane	2009-05-24
	relopt_kind value in add_reloption_kind(). Per Zdenek Kotala.
*	Fix bug #4814 (wrong subscript in consistent-function call), and add some	Tom Lane	2009-05-19
	minimal regression test coverage for matchPartialInPendingList().
*	Fix all the server-side SIGQUIT handlers (grumble ... why so many identical	Tom Lane	2009-05-15
	copies?) to ensure they really don't run proc_exit/shmem_exit callbacks, as was intended. I broke this behavior recently by installing atexit callbacks without thinking about the one case where we truly don't want to run those callback functions. Noted in an example from Dave Page.
*	Include recovery_end_command in recovery.conf.sample.	Tom Lane	2009-05-14
	Per suggestion of Jaime Casanova.
*	Improve a couple of comments.	Tom Lane	2009-05-14
*	Add recovery_end_command option to recovery.conf. recovery_end_command	Heikki Linnakangas	2009-05-14
	is run at the end of archive recovery, providing a chance to do external cleanup. Modify pg_standby so that it no longer removes the trigger file; that is now to be done using recovery_end_command. Provide a "smart" failover mode in pg_standby, where we don't fail over immediately, but only after recovering all unapplied WAL from the archive. That gives you zero data loss assuming all WAL was archived before failover, which is what most users of pg_standby actually want. recovery_end_command by Simon Riggs, pg_standby changes by Fujii Masao and myself.
*	Rewrite xml.c's memory management (yet again). Give up on the idea of	Tom Lane	2009-05-13
	redirecting libxml's allocations into a Postgres context. Instead, just let it use malloc directly, and add PG_TRY blocks as needed to be sure we release libxml data structures in error recovery code paths. This is ugly but seems much more likely to play nicely with third-party uses of libxml, as seen in recent trouble reports about using Perl XML facilities in pl/perl and bug #4774 about contrib/xml2. I left the code for allocation redirection in place, but it's only built/used if you #define USE_LIBXMLCONTEXT. This is because I found it useful to corral libxml's allocations in a palloc context when hunting for libxml memory leaks, and we're surely going to have more of those in the future with this type of approach. But we don't want it turned on in a normal build because it breaks exactly what we need to fix. I have not re-indented most of the code sections that are now wrapped by PG_TRY(); that's for ease of review. pgindent will fix it. This is a pre-existing bug in 8.3, but I don't dare back-patch this change until it's gotten a reasonable amount of field testing.
*	Fix LOCK TABLE to eliminate the race condition that could make it give weird	Tom Lane	2009-05-12
	errors when tables are concurrently dropped. To do this we must take lock on each relation before we check its privileges. The old code was trying to do that the other way around, which is a bit pointless when there are lots of other commands that lock relations before checking privileges. I did keep it checking each relation's privilege before locking the next relation, which is a detail that ALTER TABLE isn't too picky about.
*	Request XLOG switch before writing checkpoint in pg_start_backup(). Otherwise	Heikki Linnakangas	2009-05-07
	you can end up with an unrecoverable backup if you start a new base backup right after finishing archive recovery. In that scenario, the redo pointer of the checkpoint that pg_start_backup() writes points to the XLOG segment where the timeline-changing end-of-archive-recovery checkpoint is. The beginning of that segment contains pages with the old timeline ID, and we don't accept that in recovery unless we find a history file covering the old timeline ID. If you omit pg_xlog from the base backup and clear the archive directory before starting the backup, there will be no such history file available. The bug is present in all versions since PITR was introduced in 8.0, but I'm back-patching only back to 8.2. Earlier versions didn't have XLOG switch records, making this fix infeasible. Given the lack of reports until now, it doesn't seem worthwhile to spend more effort to fix 8.0 and 8.1. Per report and suggestion by Mikael Krantz.
*	Insert CHECK_FOR_INTERRUPTS() calls into btree and hash index scans at the	Tom Lane	2009-05-05
	points where we step right or left to the next page. This should ensure reasonable response time to a query cancel request during an unsuccessful index scan, as seen in recent gripe from Marc Cousin. It's a bit trickier than it might seem at first glance, because CHECK_FOR_INTERRUPTS() is a no-op if executed while holding a buffer lock. So we have to do it just at the point where we've dropped one page lock and not yet acquired the next. Remove CHECK_FOR_INTERRUPTS calls at the top level of btgetbitmap and hashgetbitmap, since they're pointless given the added checks. I think that GIST is okay already --- at least, there's a CHECK_FOR_INTERRUPTS at a plausible-looking place in gistnext(). I don't claim to know GIN well enough to try to poke it for this, if indeed it has a problem at all. This is a pre-existing issue, but in view of the lack of prior complaints I'm not going to risk back-patching.
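The placement rule can be modeled in a few lines. In this Python sketch, which is a toy and not PostgreSQL code, `Backend`, `QueryCanceled` and `step_right` are hypothetical, and a page is reduced to its right-link. Interrupt checks are ignored while a lock is held, so the only check that can act is the one placed after one page's lock is released and before the next page is locked.

```python
class QueryCanceled(Exception):
    pass

class Backend:
    """Toy backend: while any lock is held, interrupt processing is held off."""
    def __init__(self):
        self.locks_held = 0
        self.cancel_pending = False

    def check_for_interrupts(self):
        # No-op under a lock, like CHECK_FOR_INTERRUPTS() under a buffer lock.
        if self.cancel_pending and self.locks_held == 0:
            self.cancel_pending = False
            raise QueryCanceled()

def step_right(backend, right_links, start):
    """Follow right-links from page `start`; right_links maps page -> next page or None."""
    visited = []
    blkno = start
    while blkno is not None:
        backend.locks_held += 1            # lock the page
        visited.append(blkno)
        next_blkno = right_links[blkno]
        backend.check_for_interrupts()     # ignored: a lock is still held
        backend.locks_held -= 1            # release before moving on
        backend.check_for_interrupts()     # effective: no lock held here
        blkno = next_blkno
    return visited
```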
*	Update comment for _bt_relandgetbuf.	Tom Lane	2009-05-05
*	Change the default value of max_prepared_transactions to zero, and add	Tom Lane	2009-04-23
	documentation warnings against setting it nonzero unless active use of prepared transactions is intended and a suitable transaction manager has been installed. This should help to prevent the type of scenario we've seen several times now where a prepared transaction is forgotten and eventually causes severe maintenance problems (or even anti-wraparound shutdown). The only real reason we had the default be nonzero in the first place was to support regression testing of the feature. To still be able to do that, tweak pg_regress to force a nonzero value during "make check". Since we cannot force a nonzero value in "make installcheck", add a variant regression test "expected" file that shows the results that will be obtained when max_prepared_transactions is zero. Also, extend the HINT messages for transaction wraparound warnings to mention the possibility that old prepared transactions are causing the problem. All per today's discussion.
*	After archive recovery, mark the last WAL segment from the parent timeline	Heikki Linnakangas	2009-04-22
	ready for archival. It was marked at the next checkpoint anyway, but waiting for the next checkpoint is an unnecessary delay. Fujii Masao
*	Add an optional parameter to pg_start_backup() that specifies whether to do	Tom Lane	2009-04-07
	the checkpoint in immediate or lazy mode. This is to address complaints that pg_start_backup() takes a long time even when there's no need to minimize its I/O consumption.
*	Fix 'all at one page' bug in picksplit method of R-tree emulation. Add defense	Teodor Sigaev	2009-04-06
	against buggy user-defined picksplit methods to GiST.
*	Fix infinite loop while checking for partial matches in the pending list.	Teodor Sigaev	2009-04-05
	Improve comments. GIN-indexable operators should now be strict. Per Tom's questions/suggestions.
*	Remove the recently added node types ReloptElem and OptionDefElem in favor	Tom Lane	2009-04-04
	of adding optional namespace and action fields to DefElem. Having three node types that do essentially the same thing bloats the code and leads to errors of confusion, such as in yesterday's bug report from Khee Chin.
*	Disallow setting fillfactor for TOAST tables.	Alvaro Herrera	2009-04-04
	To implement this without almost duplicating the reloption table, treat relopt_kind as a bitmask instead of an integer value. This decreases the range of allowed values, but it's not clear that there's a need for that many values anyway. This patch also makes heap_reloptions explicitly a no-op for relation kinds other than heap and TOAST tables. Patch by ITAGAKI Takahiro with minor edits from me. (In particular I removed the bit about adding relation kind to an error message, which I intend to commit separately.)
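Treating relopt_kind as a bitmask lets a single table entry record every relation kind an option applies to. The Python sketch below illustrates the idea; the names, bit positions and option table are illustrative assumptions, not the exact 8.4 definitions. It also shows why the range shrinks: each kind consumes one bit, so a 32-bit mask can describe at most 32 kinds.

```python
from enum import IntFlag

class ReloptKind(IntFlag):
    # Hypothetical bit assignments: one bit per relation kind.
    HEAP = 1 << 0
    TOAST = 1 << 1
    BTREE = 1 << 2
    HASH = 1 << 3
    GIN = 1 << 4
    GIST = 1 << 5

# Hypothetical option table: each option lists, as a mask, the kinds it applies to.
RELOPTS = {
    "fillfactor": ReloptKind.HEAP | ReloptKind.BTREE | ReloptKind.HASH | ReloptKind.GIST,
    "autovacuum_enabled": ReloptKind.HEAP | ReloptKind.TOAST,
}

def option_allowed(name, kind):
    """True if option `name` exists and its mask includes relation kind `kind`."""
    mask = RELOPTS.get(name)
    return mask is not None and bool(mask & kind)
```

With this layout, disallowing fillfactor for TOAST tables is just a matter of leaving the TOAST bit out of that option's mask, while options shared by heap and TOAST tables need only one entry.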
*	Revert DTrace patch from Robert Lor	Bruce Momjian	2009-04-02
*	Add support for additional DTrace probes.	Bruce Momjian	2009-04-02
	Robert Lor
*	Fix an oversight in the support for storing/retrieving "minimal tuples" in	Tom Lane	2009-03-30
	TupleTableSlots. We have functions for retrieving a minimal tuple from a slot after storing a regular tuple in it, or vice versa; but these were implemented by converting the internal storage from one format to the other. The problem with that is it invalidates any pass-by-reference Datums that were already fetched from the slot, since they'll be pointing into the just-freed version of the tuple. The known problem cases involve fetching both a whole-row variable and a pass-by-reference value from a slot that is fed from a tuplestore or tuplesort object. The added regression tests illustrate some simple cases, but there may be other failure scenarios traceable to the same bug. Note that the added tests probably only fail on unpatched code if it's built with --enable-cassert; otherwise the bug leads to fetching from freed memory, which will not have been overwritten without additional conditions. Fix by allowing a slot to contain both formats simultaneously, which turns out not to complicate the logic much at all; if anything, it seems less contorted than before. Back-patch to 8.2, where minimal tuples were introduced.
*	Adjust the APIs for GIN opclass support functions to allow the extractQuery()	Tom Lane	2009-03-25
	method to pass extra data to the consistent() and comparePartial() methods. This is the core infrastructure needed to support the soon-to-appear contrib/btree_gin module. The APIs are still upward compatible with the definitions used in 8.3 and before, although not with the previous 8.4devel function definitions. catversion bump for changes in pg_proc entries (although these are just cosmetic, since GIN doesn't actually look at the function signature before calling it...) Teodor Sigaev and Oleg Bartunov
*	Install a search tree depth limit in GIN bulk-insert operations, to prevent	Tom Lane	2009-03-24
	them from degrading badly when the input is sorted or nearly so. In this scenario the tree is unbalanced to the point of becoming a mere linked list, so insertions become O(N^2). The easiest and most safely back-patchable solution is to stop growing the tree sooner, i.e., limit the growth of N. We might later consider a rebalancing tree algorithm, but it's not clear that the benefit would be worth the cost and complexity. Per report from Sergey Burladyan and an earlier complaint from Heikki. Back-patch to 8.2; older versions didn't have GIN indexes.
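The degradation and the remedy can be sketched with a plain, unbalanced binary search tree. This Python model illustrates the idea only and is not GIN's accumulator code; `Accumulator` and its `max_depth` parameter are hypothetical. Keys inserted in sorted order each become the right child of the previous key, so depth grows linearly and each insertion walks the whole chain. Capping the depth forces the accumulated keys to be flushed (in order) before the chain grows long.

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

class Accumulator:
    """Unbalanced BST that is flushed once it gets deeper than max_depth."""
    def __init__(self, max_depth):
        self.max_depth = max_depth
        self.root = None
        self.depth = 0        # depth of the deepest node inserted so far
        self.flushed = []     # sorted batches handed to the (imaginary) index

    def insert(self, key):
        depth = 1
        if self.root is None:
            self.root = Node(key)
        else:
            node = self.root
            while True:
                depth += 1
                if key < node.key:
                    if node.left is None:
                        node.left = Node(key)
                        break
                    node = node.left
                else:
                    if node.right is None:
                        node.right = Node(key)
                        break
                    node = node.right
        self.depth = max(self.depth, depth)
        if self.depth > self.max_depth:
            self.flush()

    def flush(self):
        """Emit all keys in sorted order (iterative in-order walk), then reset."""
        out, stack, node = [], [], self.root
        while stack or node:
            while node:
                stack.append(node)
                node = node.left
            node = stack.pop()
            out.append(node.key)
            node = node.right
        if out:
            self.flushed.append(out)
        self.root, self.depth = None, 0
```

Sorted input trips the limit repeatedly, while shuffled input of the same size may never reach it.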
*	Implement "fastupdate" support for GIN indexes, in which we try to accumulate	Tom Lane	2009-03-24
	multiple index entries in a holding area before adding them to the main index structure. This helps because bulk insert is (usually) significantly faster than retail insert for GIN. This patch also removes GIN support for amgettuple-style index scans. The API defined for amgettuple is difficult to support with fastupdate, and the previously committed partial-match feature didn't really work with it either. We might eventually figure a way to put back amgettuple support, but it won't happen for 8.4. catversion bumped because of change in GIN's pg_am entry, and because the format of GIN indexes changed on-disk (there's a metapage now, and possibly a pending list). Teodor Sigaev
*	Const-ify the parse table passed to fillRelOptions. The previous coding	Tom Lane	2009-03-23
	meant it had to be built on-the-fly at each entry to default_reloptions.