postgresql - postgresql mirror

	Commit message	Author	Age
*	On GiST page split, release the locks on child pages before recursing up.	Heikki Linnakangas	2012-05-11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When inserting the downlinks for a split gist page, we used to hold the locks on the child pages until the insertion into the parent - and recursively its parent if it had to be split too - were all completed. Change that so that the locks on child pages are released after the insertion in the immediate parent is done, before recursing further up the tree. This reduces the number of lwlocks that are held simultaneously. Holding many locks is bad for concurrency, and in extreme cases you can even hit the limit of 100 simultaneously held lwlocks in a backend. If you're really unlucky, you can hit the limit while in a critical section, which brings down the whole system. This fixes bug #6629 reported by Tom Forbes. Backpatch to 9.1. The page splitting code was rewritten in 9.1, and the old code did not have this problem.
*	Throw error sooner for unlogged GiST indexes.	Tom Lane	2012-02-08
\| \| \| \| \| \|	Throwing an error only after we've built the main index fork is pretty unfriendly when the table already contains data. Per gripe from Jay Levitt.
*	gistendscan() forgot to free so->giststate.	Tom Lane	2011-09-16
\| \| \| \| \| \| \| \| \| \|	This oversight led to a massive memory leak --- upwards of 10KB per tuple --- during creation-time verification of an exclusion constraint based on a GIST index. In most other scenarios it'd just be a leak of 10KB that would be recovered at end of query, so not too significant; though perhaps the leak would be noticeable in a situation where a GIST index was being used in a nestloop inner indexscan. In any case, it's a real leak of long standing, so patch all supported branches. Per report from Harald Fuchs.
*	Fix two ancient bugs in GiST code to re-find a parent after page split:	Heikki Linnakangas	2011-07-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	First, when following a right-link, we incorrectly marked the current page as the parent of the right sibling. In reality, the parent of the right page is the same as the parent of the current page (or some page to the right of it, gistFindCorrectParent() will sort that out). Secondly, when we follow a right-link, we must prepend, not append, the right page to our list of pages to visit. That's because we assume that once we hit a leaf page in the list, all the rest are leaf pages too, and give up. To hit these bugs, you need concurrent actions and several unlucky accidents. Another backend must split the root page while you're in the process of splitting a lower-level page. Furthermore, while you scan the internal nodes to re-find the parent, another backend needs to again split some more internal pages. Even then, the bugs don't necessarily manifest as user-visible errors or index corruption. While we're at it, make the error reporting a bit better if gistFindPath() fails to re-find the parent. It used to be an assertion, but an elog() seems more appropriate. Backpatch to all supported branches.
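The prepend-versus-append point in the commit above can be illustrated with a small work list. This is only a sketch, not the code in gistFindPath(): the `VisitItem` type and the `visit_append()` / `visit_prepend()` helpers are hypothetical names invented for the example. It assumes child pages are appended to the list and a right sibling, which sits at the same tree level as the current page, is pushed to the front so that it is visited before any deeper pages.

```c
#include <stdlib.h>

/* Hypothetical work-list node; blkno stands in for a page number. */
typedef struct VisitItem
{
    int               blkno;
    struct VisitItem *next;
} VisitItem;

/* Allocate one list node, aborting on allocation failure. */
static VisitItem *
make_item(int blkno, VisitItem *next)
{
    VisitItem *item = malloc(sizeof(VisitItem));

    if (item == NULL)
        abort();
    item->blkno = blkno;
    item->next = next;
    return item;
}

/* Append: for child pages, which are one level below the current page. */
VisitItem *
visit_append(VisitItem *head, int blkno)
{
    VisitItem *item = make_item(blkno, NULL);
    VisitItem *tail;

    if (head == NULL)
        return item;
    for (tail = head; tail->next != NULL; tail = tail->next)
        ;
    tail->next = item;
    return head;
}

/*
 * Prepend: for a right sibling reached via a right-link.  It is at the
 * same level as the current page, so it must come before the deeper
 * pages that are already queued.
 */
VisitItem *
visit_prepend(VisitItem *head, int blkno)
{
    return make_item(blkno, head);
}
```

With two child pages already queued, prepending the right sibling puts it at the head of the list, so a walk that stops at the first leaf page still sees every internal page first.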
*	Message style and spelling improvements	Peter Eisentraut	2011-06-22
\|
*	Pgindent run before 9.1 beta2.	Bruce Momjian	2011-06-09
\|
*	Protect GIST logic that assumes penalty values can't be negative.	Tom Lane	2011-05-31
\| \| \| \| \| \| \| \| \| \|	Apparently sane-looking penalty code might return small negative values, for example because of roundoff error. This will confuse places like gistchoose(). Prevent problems by clamping negative penalty values to zero. (Just to be really sure, I also made it force NaNs to zero.) Back-patch to all supported branches. Alexander Korotkov
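The clamping described above might look roughly like the following. This is a minimal sketch rather than the committed code; `clamp_penalty()` is a hypothetical helper name, and the real change lives inside the GiST penalty-calling path.

```c
#include <math.h>

/*
 * Hypothetical helper: sanitize a penalty value returned by an opclass
 * penalty function.  Negative values (for example from roundoff error)
 * and NaNs are both forced to zero.
 */
float
clamp_penalty(float penalty)
{
    /* NaN compares false against everything, so test for it explicitly. */
    if (isnan(penalty) || penalty < 0.0f)
        return 0.0f;
    return penalty;
}
```

Callers that compare penalties, such as code choosing the best subtree for an insertion, can then assume every value is a non-negative number.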
*	Spell checking and markup refinement	Peter Eisentraut	2011-05-19
\|
*	Make GIN and GIST pass the index collation to all their support functions.	Tom Lane	2011-04-22
\| \| \| \| \| \| \|	Experimentation with contrib/btree_gist shows that the majority of the GIST support functions potentially need collation information. Safest policy seems to be to pass it to all of them, instead of making assumptions about which ones could possibly need it.
*	Pass collations to functions in FunctionCallInfoData, not FmgrInfo.	Tom Lane	2011-04-12
\| \| \| \| \| \| \| \| \| \| \|	Since collation is effectively an argument, not a property of the function, FmgrInfo is really the wrong place for it; and this becomes critical in cases where a cached FmgrInfo is used for varying purposes that might need different collation settings. Fix by passing it in FunctionCallInfoData instead. In particular this allows a clean fix for bug #5970 (record_cmp not working). This requires touching a bit more code than the original method, but nobody ever thought that collations would not be an invasive patch...
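To see why collation belongs with the per-call data, consider a cached function descriptor that is reused with two different collations. The sketch below uses invented types (`SketchFmgrInfo`, `SketchCallInfo`) and invented collation constants; they only mimic the roles of FmgrInfo and FunctionCallInfoData and are not the real fmgr structures.

```c
#include <ctype.h>
#include <string.h>

typedef unsigned int SketchOid;

/* Hypothetical collation identifiers, for illustration only. */
#define SKETCH_COLL_BINARY 1u
#define SKETCH_COLL_NOCASE 2u

struct SketchCallInfo;
typedef int (*SketchFn) (const struct SketchCallInfo *fcinfo);

/* Cached lookup data: describes the function, not any particular call. */
typedef struct SketchFmgrInfo
{
    SketchFn fn_addr;
} SketchFmgrInfo;

/* Per-call data: arguments plus the collation to use for this call. */
typedef struct SketchCallInfo
{
    const SketchFmgrInfo *flinfo;
    SketchOid             collation;
    const char           *arg0;
    const char           *arg1;
} SketchCallInfo;

/* A comparison function that reads the collation from the call info. */
int
sketch_text_cmp(const SketchCallInfo *fcinfo)
{
    const char *a = fcinfo->arg0;
    const char *b = fcinfo->arg1;

    if (fcinfo->collation == SKETCH_COLL_BINARY)
        return strcmp(a, b);

    /* Case-insensitive comparison for the "nocase" collation. */
    for (;; a++, b++)
    {
        int ca = tolower((unsigned char) *a);
        int cb = tolower((unsigned char) *b);

        if (ca != cb)
            return (ca < cb) ? -1 : 1;
        if (ca == 0)
            return 0;
    }
}

/* Invoke a cached function with a collation chosen by the caller. */
int
sketch_call(const SketchFmgrInfo *flinfo, SketchOid collation,
            const char *x, const char *y)
{
    SketchCallInfo fcinfo = {flinfo, collation, x, y};

    return flinfo->fn_addr(&fcinfo);
}
```

The same cached `SketchFmgrInfo` can serve both calls because nothing collation-specific is stored in it.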
*	Clean up most -Wunused-but-set-variable warnings from gcc 4.6	Peter Eisentraut	2011-04-11
\| \| \| \| \| \|	This warning is new in gcc 4.6 and part of -Wall. This patch cleans up most of the noise, but there are still some warnings that are trickier to remove.
*	pgindent run before PG 9.1 beta 1.	Bruce Momjian	2011-04-10
\|
*	Clean up cruft around collation initialization for tupdescs and scankeys.	Tom Lane	2011-03-26
\| \| \| \| \|	I found actual bugs in GiST and plpgsql; the rest of this is cosmetic but meant to decrease the odds of future bugs of omission.
*	Fix crash in the new GiST insertion code, when an update splits the root page.	Heikki Linnakangas	2011-01-09
\| \| \| \|	This bug was exercised by contrib/intarray/bench, as noted by Tom Lane.
*	Stamp copyrights for year 2011.	Bruce Momjian	2011-01-01
\|
*	Support unlogged tables.	Robert Haas	2010-12-29
\| \| \| \| \| \| \|	The contents of an unlogged table are not WAL-logged; thus, they are not available on standby servers and are truncated whenever the database system enters recovery. Indexes on unlogged tables are also unlogged. Unlogged GiST indexes are not currently supported.
*	Rewrite the GiST insertion logic so that we don't need the post-recovery	Heikki Linnakangas	2010-12-23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	cleanup stage to finish incomplete inserts or splits anymore. There were two reasons for the cleanup step:
1. When a new tuple was inserted to a leaf page, the downlink in the parent needed to be updated to contain (ie. to be consistent with) the new key. Updating the parent in turn might require recursively updating the parent of the parent. We now handle that by updating the parent while traversing down the tree, so that when we insert the leaf tuple, all the parents are already consistent with the new key, and the tree is consistent at every step.
2. When a page is split, we need to insert the downlink for the new right page(s), and update the downlink for the original page to not include keys that moved to the right page(s). We now handle that by setting a new flag, F_FOLLOW_RIGHT, on the non-rightmost pages in the split. When that flag is set, scans always follow the rightlink, regardless of the NSN mechanism used to detect concurrent page splits. That way the tree is consistent right after split, even though the downlink is still missing. This is very similar to the way B-tree splits are handled. When the downlink is inserted in the parent, the flag is cleared. To keep the insertion algorithm simple, when an insertion sees an incomplete split, indicated by the F_FOLLOW_RIGHT flag, it finishes the split before doing anything else.
These changes allow removing the whole "invalid tuple" mechanism, but I retained the scan code to still follow invalid tuples correctly. While we don't create any such tuples anymore, we want to handle them gracefully in case you pg_upgrade a GiST index that has them. If we encounter any on an insert, though, we just throw an error saying that you need to REINDEX.
The issue that got me into doing this is that if you did a checkpoint while an insert or split was in progress, and the checkpoint finishes quickly so that there is no WAL record related to the insert between RedoRecPtr and the checkpoint record, recovery from that checkpoint would not know to finish the incomplete insert. IOW, we have the same issue we solved with the rm_safe_restartpoint mechanism during normal operation too. It's highly unlikely to happen in practice, and this fix is far too large to backpatch, so we're just going to live with it in previous versions, but this refactoring fixes it going forward. With this patch, you don't get the annoying 'index "FOO" needs VACUUM or REINDEX to finish crash recovery' notices anymore if you crash at an unfortunate moment.
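The scan-side rule for F_FOLLOW_RIGHT described in this commit can be sketched as a single predicate. The struct, the flag bit value, and the function name below are hypothetical, and the NSN comparison (page NSN newer than the LSN the scan saw on the parent) is an assumption about how the existing split-detection mechanism works rather than something the commit message spells out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit standing in for F_FOLLOW_RIGHT. */
#define SKETCH_FOLLOW_RIGHT 0x01

/* Hypothetical, simplified view of the fields a scan looks at. */
typedef struct SketchPage
{
    uint16_t flags;   /* page flag bits */
    uint64_t nsn;     /* set when the page was last split (assumed) */
} SketchPage;

/*
 * Decide whether a scan that reached this page through a downlink must
 * also visit the page's right sibling.
 */
bool
scan_must_follow_rightlink(const SketchPage *page, uint64_t parent_lsn)
{
    /* Incomplete split: the downlink for the right page is still missing. */
    if (page->flags & SKETCH_FOLLOW_RIGHT)
        return true;

    /* Assumed NSN check: the page was split after we read the parent. */
    return page->nsn > parent_lsn;
}
```

The flag check comes first because it has to hold regardless of the NSN comparison, which is what keeps the tree searchable between the split and the downlink insertion.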
*	Generalize concept of temporary relations to "relation persistence".	Robert Haas	2010-12-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit replaces pg_class.relistemp with pg_class.relpersistence; and also modifies the RangeVar node type to carry relpersistence rather than istemp. It also removes rd_istemp from RelationData and instead performs the correct computation based on relpersistence. For clarity, we add three new macros: RelationNeedsWAL(), RelationUsesLocalBuffers(), and RelationUsesTempNamespace(), so that we can clarify the purpose of each check that previously depended on rd_istemp. This is intended as infrastructure for the upcoming unlogged tables patch, as well as for future possible work on global temporary tables.
*	Fix two small bugs in new gistget.c logic.	Tom Lane	2010-12-04
\| \| \| \| \| \| \| \| \| \| \| \| \|	1. Complain, rather than silently doing nothing, if an "invalid" tuple is found on a leaf page. Per off-list discussion with Heikki. 2. Fix oversight in code that removes a GISTSearchItem from the search queue: we have to reset lastHeap if this was the last heap item in the parent GISTSearchTreeItem. Otherwise subsequent additions will do the wrong thing. This was probably masked in early testing because in typical cases the parent item would now be completely empty and would be deleted on next call. You'd need a queued non-leaf page at exactly the same distance as a heap tuple to expose the bug.
*	Add external documentation for KNNGIST.	Tom Lane	2010-12-03
\|
*	Put back gistgettuple's check for backwards scan request.	Tom Lane	2010-12-03
\| \| \| \| \|	On reflection it's a bad idea for the KNNGIST patch to have removed that. We don't want it silently returning incorrect answers.
*	KNNGIST, otherwise known as order-by-operator support for GIST.	Tom Lane	2010-12-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit represents a rather heavily editorialized version of Teodor's builtin_knngist_itself-0.8.2 and builtin_knngist_proc-0.8.1 patches. I redid the opclass API to add a separate Distance method instead of turning the Consistent method into an illogical mess, fixed some bit-rot in the rbtree interfaces, and generally worked over the code style and comments. There's still no non-code documentation to speak of, but I'll work on that separately. Some contrib-module changes are also yet to come (right now, point <-> point is the only KNN-ified operator). Teodor Sigaev and Tom Lane
*	Create core infrastructure for KNNGIST.	Tom Lane	2010-12-02
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a heavily revised version of builtin_knngist_core-0.9. The ordering operators are no longer mixed in with actual quals, which would have confused not only humans but significant parts of the planner. Instead, ordering operators are carried separately throughout planning and execution. Since the API for ambeginscan and amrescan functions had to be changed anyway, this commit takes the opportunity to rationalize that a bit. RelationGetIndexScan no longer forces a premature index_rescan call; instead, callers of index_beginscan must call index_rescan too. Aside from making the AM-side initialization logic a bit less peculiar, this has the advantage that we do not make a useless extra am_rescan call when there are runtime key values. AMs formerly could not assume that the key values passed to amrescan were actually valid; now they can. Teodor Sigaev and Tom Lane
*	Remove useless whitespace at end of lines	Peter Eisentraut	2010-11-23
\|
*	The GiST scan algorithm uses LSNs to detect concurrent page splits, but	Heikki Linnakangas	2010-11-16
\| \| \| \| \| \| \| \| \| \| \| \| \|	temporary indexes are not WAL-logged. We used a constant LSN for temporary indexes, on the assumption that we don't need to worry about concurrent page splits in temporary indexes because they're only visible to the current session. But that assumption is wrong: it's possible to insert rows and split pages in the same session while a scan is in progress, for example by opening a cursor, fetching some rows, and INSERTing new rows before fetching some more. Fix by generating fake increasing LSNs, used in place of real LSNs in temporary GiST indexes.
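A fake-LSN generator of the kind this fix describes can be as small as a session-local counter. The sketch below is an illustration under that assumption; `FakeLSN` and `next_fake_lsn()` are hypothetical names, and the starting value of 1 is an arbitrary choice for the example.

```c
#include <stdint.h>

/* Hypothetical stand-in for a WAL position. */
typedef uint64_t FakeLSN;

/* Session-local counter; only this backend can see a temporary index. */
static FakeLSN fake_lsn_counter = 1;

/*
 * Return a value that is strictly greater than every value returned
 * before it in this session.  A page split stamps pages with such a
 * value in place of a real LSN.
 */
FakeLSN
next_fake_lsn(void)
{
    return fake_lsn_counter++;
}
```

Because each value is larger than the previous one, a scan that remembered an earlier value can still tell that a split happened afterwards, which a constant LSN cannot express.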
*	Cleanup various comparisons with the constant "true".	Robert Haas	2010-11-14
\| \| \| \|	Itagaki Takahiro, with slight modifications.
*	Remove cvs keywords from all files.	Magnus Hagander	2010-09-20
\|
*	Typo fix. Kevin Grittner.	Robert Haas	2010-04-14
\|
*	pgindent run for 9.0	Bruce Momjian	2010-02-26
\|
*	Remove some more dead VACUUM-FULL-only code.	Tom Lane	2010-02-08
\|
*	Remove old-style VACUUM FULL (which was known for a little while as	Tom Lane	2010-02-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	VACUUM FULL INPLACE), along with a boatload of subsidiary code and complexity. Per discussion, the use case for this method of vacuuming is no longer large enough to justify maintaining it; not to mention that we don't wish to invest the work that would be needed to make it play nicely with Hot Standby. Aside from the code directly related to old-style VACUUM FULL, this commit removes support for certain WAL record types that could only be generated within VACUUM FULL, redirect-pointer removal in heap_page_prune, and nontransactional generation of cache invalidation sinval messages (the last being the sticking point for Hot Standby). We still have to retain all code that copes with finding HEAP_MOVED_OFF and HEAP_MOVED_IN flag bits on existing tuples. This can't be removed as long as we want to support in-place update from pre-9.0 databases.
*	Add point_ops opclass for GiST.	Teodor Sigaev	2010-01-14
\|
*	Update copyright for the year 2010.	Bruce Momjian	2010-01-02
\|
*	Support "x IS NOT NULL" clauses as indexscan conditions. This turns out	Tom Lane	2010-01-01
\| \| \| \| \| \| \| \| \| \| \|	to be just a minor extension of the previous patch that made "x IS NULL" indexable, because we can treat the IS NOT NULL condition as if it were "x < NULL" or "x > NULL" (depending on the index's NULLS FIRST/LAST option), just like IS NULL is treated like "x = NULL". Aside from any possible usefulness in its own right, this is an important improvement for index-optimized MAX/MIN aggregates: it is now reliably possible to get a column's min or max value cheaply, even when there are a lot of nulls cluttering the interesting end of the index.
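The mapping from IS NOT NULL to a one-sided comparison against NULL can be written down as a tiny decision function. This is a sketch with hypothetical names (`SketchNotNullBound`, `not_null_bound()`), not planner or index AM code; it assumes that with NULLS FIRST the nulls sort before every value, and with NULLS LAST after every value.

```c
#include <stdbool.h>

/* Which pseudo-comparison against NULL selects the non-null entries. */
typedef enum SketchNotNullBound
{
    SKETCH_LESS_THAN_NULL,      /* treat "x IS NOT NULL" as "x < NULL" */
    SKETCH_GREATER_THAN_NULL    /* treat "x IS NOT NULL" as "x > NULL" */
} SketchNotNullBound;

/*
 * With NULLS FIRST, nulls occupy the start of the index, so the non-null
 * entries are those sorting after NULL.  With NULLS LAST it is the reverse.
 */
SketchNotNullBound
not_null_bound(bool nulls_first)
{
    return nulls_first ? SKETCH_GREATER_THAN_NULL : SKETCH_LESS_THAN_NULL;
}
```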
*	Fix wrong WAL info value generated when gistContinueInsert() performs an	Tom Lane	2009-12-24
\| \| \| \| \| \| \| \|	index page split. This would result in index corruption, or even more likely an error during WAL replay, if we were unlucky enough to crash during end-of-recovery cleanup after having completed an incomplete GIST insertion. Yoichi Hirai
*	Allow read only connections during recovery, known as Hot Standby.	Simon Riggs	2009-12-19
\| \| \| \| \| \| \| \| \| \| \| \|	Enabled by recovery_connections = on (default) and forcing archive recovery using a recovery.conf. Recovery processing now emulates the original transactions as they are replayed, providing full locking and MVCC behaviour for read only queries. Recovery must enter consistent state before connections are allowed, so there is a delay, typically short, before connections succeed. Replay of recovering transactions can conflict and in some cases deadlock with queries during recovery; these result in query cancellation after max_standby_delay seconds have expired. Infrastructure changes have minor effects on normal running, though introduce four new types of WAL record. New test mode "make standbycheck" allows regression tests of static command behaviour on a standby server while in recovery. Typical and extreme dynamic behaviours have been checked via code inspection and manual testing. Few port specific behaviours have been utilised, though primary testing has been on Linux only so far. This commit is the basic patch. Additional changes will follow in this release to enhance some aspects of behaviour, notably improved handling of conflicts, deadlock detection and query cancellation. Changes to VACUUM FULL are also required. Simon Riggs, with significant and lengthy review by Heikki Linnakangas, including streamlined redesign of snapshot creation and two-phase commit. Important contributions from Florian Pflug, Mark Kirkwood, Merlin Moncure, Greg Stark, Gianni Ciolli, Gabriele Bartolini, Hannu Krosing, Robert Haas, Tatsuo Ishii, Hiroyuki Yamada plus support and feedback from many other community members.
*	Remove very ancient tuple-counting infrastructure (IncrRetrieved() and	Tom Lane	2009-10-08
\| \| \| \| \| \| \| \| \|	friends). This code has all been ifdef'd out for many years, and doesn't seem to have any prospect of becoming any more useful in the future. EXPLAIN ANALYZE is what people use in practice, and I think if we did want process-wide counters we'd be more likely to put in dtrace events for that than try to resurrect this code. Get rid of it so as to have one less detail to worry about while refactoring execMain.c.
*	Fix incorrect arguments for gist_box_penalty call. The bug could be observed	Teodor Sigaev	2009-09-18
\| \| \| \| \| \|	only for secondary page split (i.e. for non-first columns of index). Patch by Paul Ramsey <pramsey@opengeo.org>
*	Support deferrable uniqueness constraints.	Tom Lane	2009-07-29
\| \| \| \| \| \| \| \| \| \|	The current implementation fires an AFTER ROW trigger for each tuple that looks like it might be non-unique according to the index contents at the time of insertion. This works well as long as there aren't many conflicts, but won't scale to massive unique-key reassignments. Improving that case is a TODO item. Dean Rasheed
*	Correct grammar in picksplit debug messages	Peter Eisentraut	2009-06-24
\|
*	8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef list	Bruce Momjian	2009-06-11
\| \| \| \|	provided by Andrew.
*	Improve capitalization and punctuation in recently added GiST message.	Peter Eisentraut	2009-06-10
\|
*	Improve the IndexVacuumInfo/IndexBulkDeleteResult API to allow somewhat sane	Tom Lane	2009-06-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	behavior in cases where we don't know the heap tuple count accurately; in particular partial vacuum, but this also makes the API a bit more useful for ANALYZE. This patch adds "estimated_count" flags to both structs so that an approximate count can be flagged as such, and adjusts the logic so that approximate counts are not used for updating pg_class.reltuples. This fixes my previous complaint that VACUUM was putting ridiculous values into pg_class.reltuples for indexes. The actual impact of that bug is limited, because the planner only pays attention to reltuples for an index if the index is partial; which probably explains why beta testers hadn't noticed a degradation in plan quality from it. But it needs to be fixed. The whole thing is a bit messy and should be redesigned in future, because reltuples now has the potential to drift quite far away from reality when a long period elapses with no non-partial vacuums. But this is as good as it's going to get for 8.4.
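The effect of the "estimated_count" flag on pg_class.reltuples can be sketched as follows. The struct and function here are hypothetical simplifications; only the flag name and the rule that approximate counts are not used to update reltuples come from the commit message.

```c
#include <stdbool.h>

/* Hypothetical, trimmed-down view of an index vacuum result. */
typedef struct SketchBulkDeleteResult
{
    double num_index_tuples;   /* tuples counted (or estimated) */
    bool   estimated_count;    /* true if the count is only approximate */
} SketchBulkDeleteResult;

/*
 * Return the value to store as reltuples: keep the old value when the
 * new count is only an estimate, otherwise adopt the exact count.
 */
double
updated_reltuples(double old_reltuples, const SketchBulkDeleteResult *stats)
{
    if (stats->estimated_count)
        return old_reltuples;
    return stats->num_index_tuples;
}
```

This also shows the drift the commit warns about: if every vacuum is partial, the stored value is never refreshed.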
*	Fix 'all at one page bug' in picksplit method of R-tree emulation. Add defense	Teodor Sigaev	2009-04-06
\| \| \| \|	against buggy user-defined picksplit methods to GiST.
*	Implement "fastupdate" support for GIN indexes, in which we try to accumulate	Tom Lane	2009-03-24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	multiple index entries in a holding area before adding them to the main index structure. This helps because bulk insert is (usually) significantly faster than retail insert for GIN. This patch also removes GIN support for amgettuple-style index scans. The API defined for amgettuple is difficult to support with fastupdate, and the previously committed partial-match feature didn't really work with it either. We might eventually figure a way to put back amgettuple support, but it won't happen for 8.4. catversion bumped because of change in GIN's pg_am entry, and because the format of GIN indexes changed on-disk (there's a metapage now, and possibly a pending list). Teodor Sigaev
*	Add a new option to RestoreBkpBlocks() to indicate if a cleanup lock should	Heikki Linnakangas	2009-01-20
\| \| \| \| \| \| \| \| \|	be used instead of the normal exclusive lock, and make WAL redo functions responsible for calling RestoreBkpBlocks(). They know better what kind of a lock they need. At the moment, this just moves things around with no functional change, but makes the hot standby patch that's under review cleaner.
*	Change the reloptions machinery to use a table-based parser, and provide	Alvaro Herrera	2009-01-05
\| \| \| \| \| \| \| \|	a more complete framework for writing custom option processing routines by user-defined access methods. Catalog version bumped due to the general API changes, which are going to affect user-defined "amoptions" routines.
*	Update copyright for 2009.	Bruce Momjian	2009-01-01
\|
*	Initialize GISTScanOpaque->qual_ok even if there are no conditions.	Teodor Sigaev	2008-12-04
\|
*	Rethink the way FSM truncation works. Instead of WAL-logging FSM	Heikki Linnakangas	2008-11-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	truncations in FSM code, call FreeSpaceMapTruncateRel from smgr_redo. To make that cleaner from a modularity point of view, move the WAL-logging one level up to RelationTruncate, and move RelationTruncate and all the related WAL-logging to a new src/backend/catalog/storage.c file. Introduce new RelationCreateStorage and RelationDropStorage functions that are used instead of calling smgrcreate/smgrscheduleunlink directly. Move the pending rel deletion stuff from smgrcreate/smgrscheduleunlink to the new functions. This leaves smgr.c as a thin wrapper around md.c; all the transactional stuff is now in storage.c. This will make it easier to add new forks with similar truncation logic, like the visibility map.