aboutsummaryrefslogtreecommitdiff
path: root/src/backend/access/heap/pruneheap.c
Commit message (Collapse)AuthorAge
* Fix rare missing cancellations in Hot Standby.Simon Riggs2013-01-24
| | | | | | | | | | | | The machinery around XLOG_HEAP2_CLEANUP_INFO failed to correctly pass through the necessary information on latestRemovedXid, avoiding cancellations in some infrequent concurrent update/cleanup scenarios. Backpatchable fix to 9.0 Detailed bug report and fix by Noah Misch, backpatchable version by me.
* Improve concurrency of foreign key lockingAlvaro Herrera2013-01-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces two additional lock modes for tuples: "SELECT FOR KEY SHARE" and "SELECT FOR NO KEY UPDATE". These don't block each other, in contrast with already existing "SELECT FOR SHARE" and "SELECT FOR UPDATE". UPDATE commands that do not modify the values stored in the columns that are part of the key of the tuple now grab a SELECT FOR NO KEY UPDATE lock on the tuple, allowing them to proceed concurrently with tuple locks of the FOR KEY SHARE variety. Foreign key triggers now use FOR KEY SHARE instead of FOR SHARE; this means the concurrency improvement applies to them, which is the whole point of this patch. The added tuple lock semantics require some rejiggering of the multixact module, so that the locking level that each transaction is holding can be stored alongside its Xid. Also, multixacts now need to persist across server restarts and crashes, because they can now represent not only tuple locks, but also tuple updates. This means we need more careful tracking of lifetime of pg_multixact SLRU files; since they now persist longer, we require more infrastructure to figure out when they can be removed. pg_upgrade also needs to be careful to copy pg_multixact files over from the old server to the new, or at least part of multixact.c state, depending on the versions of the old and new servers. Tuple time qualification rules (HeapTupleSatisfies routines) need to be careful not to consider tuples with the "is multi" infomask bit set as being only locked; they might need to look up MultiXact values (i.e. possibly do pg_multixact I/O) to find out the Xid that updated a tuple, whereas they previously were assured to only use information readily available from the tuple header. This is considered acceptable, because the extra I/O would involve cases that would previously cause some commands to block waiting for concurrent transactions to finish. Another important change is the fact that locking tuples that have previously been updated causes the future versions to be marked as locked, too; this is essential for correctness of foreign key checks. This causes additional WAL-logging, also (there was previously a single WAL record for a locked tuple; now there are as many as updated copies of the tuple there exist.) With all this in place, contention related to tuples being checked by foreign key rules should be much reduced. As a bonus, the old behavior that a subtransaction grabbing a stronger tuple lock than the parent (sub)transaction held on a given tuple and later aborting caused the weaker lock to be lost, has been fixed. Many new spec files were added for isolation tester framework, to ensure overall behavior is sane. There's probably room for several more tests. There were several reviewers of this patch; in particular, Noah Misch and Andres Freund spent considerable time in it. Original idea for the patch came from Simon Riggs, after a problem report by Joel Jacobson. Most code is from me, with contributions from Marti Raudsepp, Alexander Shulgin, Noah Misch and Andres Freund. This patch was discussed in several pgsql-hackers threads; the most important start at the following message-ids: AANLkTimo9XVcEzfiBR-ut3KVNDkjm2Vxh+t8kAmWjPuv@mail.gmail.com 1290721684-sup-3951@alvh.no-ip.org 1294953201-sup-2099@alvh.no-ip.org 1320343602-sup-2290@alvh.no-ip.org 1339690386-sup-8927@alvh.no-ip.org 4FE5FF020200002500048A3D@gw.wicourts.gov 4FEAB90A0200002500048B7D@gw.wicourts.gov
* Update copyrights for 2013Bruce Momjian2013-01-01
| | | | | Fully update git head, and update back branches in ./COPYRIGHT and legal.sgml files.
* Split tuple struct defs from htup.h to htup_details.hAlvaro Herrera2012-08-30
| | | | | | | | | | | | This reduces unnecessary exposure of other headers through htup.h, which is very widely included by many files. I have chosen to move the function prototypes to the new file as well, because that means htup.h no longer needs to include tupdesc.h. In itself this doesn't have much effect in indirect inclusion of tupdesc.h throughout the tree, because it's also required by execnodes.h; but it's something to explore in the future, and it seemed best to do the htup.h change now while I'm busy with it.
* Split heapam_xlog.h from heapam.hAlvaro Herrera2012-08-28
| | | | | | | | | | | | The heapam XLog functions are used by other modules, not all of which are interested in the rest of the heapam API. With this, we let them get just the XLog stuff in which they are interested and not pollute them with unrelated includes. Also, since heapam.h no longer requires xlog.h, many files that do include heapam.h no longer get xlog.h automatically, including a few headers. This is useful because heapam.h is getting pulled in by execnodes.h, which is in turn included by a lot of files.
* Delete inaccurate C comment about FSM and adding pages, per Robert Haas.Bruce Momjian2012-08-16
|
* Update copyright notices for year 2012.Bruce Momjian2012-01-01
|
* Remove unnecessary #include references, per pgrminclude script.Bruce Momjian2011-09-01
|
* Stamp copyrights for year 2011.Bruce Momjian2011-01-01
|
* Generalize concept of temporary relations to "relation persistence".Robert Haas2010-12-13
| | | | | | | | | | | | | | | This commit replaces pg_class.relistemp with pg_class.relpersistence; and also modifies the RangeVar node type to carry relpersistence rather than istemp. It also removes removes rd_istemp from RelationData and instead performs the correct computation based on relpersistence. For clarity, we add three new macros: RelationNeedsWAL(), RelationUsesLocalBuffers(), and RelationUsesTempNamespace(), so that we can clarify the purpose of each check that previous depended on rd_istemp. This is intended as infrastructure for the upcoming unlogged tables patch, as well as for future possible work on global temporary tables.
* Reduce spurious Hot Standby conflicts from never-visible records.Simon Riggs2010-12-09
| | | | | | | | | | | | Hot Standby conflicts only with tuples that were visible at some point. So ignore tuples from aborted transactions or for tuples updated/deleted during the inserting transaction when generating the conflict transaction ids. Following detailed analysis and test case by Noah Misch. Original report covered btree delete records, correctly observed by Heikki Linnakangas that this applies to other cases also. Fix covers all sources of cleanup records via common code.
* Remove cvs keywords from all files.Magnus Hagander2010-09-20
|
* pgindent run for 9.0, second runBruce Momjian2010-07-06
|
* Further reductions in Hot Standby conflict processing. TheseSimon Riggs2010-04-22
| | | | | | | | | come from the realistion that HEAP2_CLEAN records don't always remove user visible data, so conflict processing for them can be skipped. Confirm validity using Assert checks, clarify circumstances under which we log heap_cleanup_info records. Tuning arises from bug fixing of earlier safety check failures.
* Fix oversight in collecting values for cleanup_info records.Simon Riggs2010-04-21
| | | | | | | vacuum_log_cleanup_info() now generates log records with a valid latestRemovedXid set in all cases. Also be careful not to zero the value when we do a round of vacuuming part-way through lazy_scan_heap(). Incidentally, this reduces frequency of conflicts in Hot Standby.
* pgindent run for 9.0Bruce Momjian2010-02-26
|
* Remove old-style VACUUM FULL (which was known for a little while asTom Lane2010-02-08
| | | | | | | | | | | | | | | | | VACUUM FULL INPLACE), along with a boatload of subsidiary code and complexity. Per discussion, the use case for this method of vacuuming is no longer large enough to justify maintaining it; not to mention that we don't wish to invest the work that would be needed to make it play nicely with Hot Standby. Aside from the code directly related to old-style VACUUM FULL, this commit removes support for certain WAL record types that could only be generated within VACUUM FULL, redirect-pointer removal in heap_page_prune, and nontransactional generation of cache invalidation sinval messages (the last being the sticking point for Hot Standby). We still have to retain all code that copes with finding HEAP_MOVED_OFF and HEAP_MOVED_IN flag bits on existing tuples. This can't be removed as long as we want to support in-place update from pre-9.0 databases.
* Update copyright for the year 2010.Bruce Momjian2010-01-02
|
* Allow read only connections during recovery, known as Hot Standby.Simon Riggs2009-12-19
| | | | | | | | | | | | Enabled by recovery_connections = on (default) and forcing archive recovery using a recovery.conf. Recovery processing now emulates the original transactions as they are replayed, providing full locking and MVCC behaviour for read only queries. Recovery must enter consistent state before connections are allowed, so there is a delay, typically short, before connections succeed. Replay of recovering transactions can conflict and in some cases deadlock with queries during recovery; these result in query cancellation after max_standby_delay seconds have expired. Infrastructure changes have minor effects on normal running, though introduce four new types of WAL record. New test mode "make standbycheck" allows regression tests of static command behaviour on a standby server while in recovery. Typical and extreme dynamic behaviours have been checked via code inspection and manual testing. Few port specific behaviours have been utilised, though primary testing has been on Linux only so far. This commit is the basic patch. Additional changes will follow in this release to enhance some aspects of behaviour, notably improved handling of conflicts, deadlock detection and query cancellation. Changes to VACUUM FULL are also required. Simon Riggs, with significant and lengthy review by Heikki Linnakangas, including streamlined redesign of snapshot creation and two-phase commit. Important contributions from Florian Pflug, Mark Kirkwood, Merlin Moncure, Greg Stark, Gianni Ciolli, Gabriele Bartolini, Hannu Krosing, Robert Haas, Tatsuo Ishii, Hiroyuki Yamada plus support and feedback from many other community members.
* 8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef listBruce Momjian2009-06-11
| | | | provided by Andrew.
* Update copyright for 2009.Bruce Momjian2009-01-01
|
* Clean up the use of some page-header-access macros: principally, useTom Lane2008-07-13
| | | | | | | | | | SizeOfPageHeaderData instead of sizeof(PageHeaderData) in places where that makes the code clearer, and avoid casting between Page and PageHeader where possible. Zdenek Kotala, with some additional cleanup by Heikki Linnakangas. I did not apply the parts of the proposed patch that would have resulted in slightly changing the on-disk format of hash indexes; it seems to me that's not a win as long as there's any chance of having in-place upgrade for 8.4.
* Improve our #include situation by moving pointer types away from theAlvaro Herrera2008-06-19
| | | | | | | corresponding struct definitions. This allows other headers to avoid including certain highly-loaded headers such as rel.h and relscan.h, instead using just relcache.h, heapam.h or genam.h, which are more lightweight and thus cause less unnecessary dependencies.
* Refactor XLogOpenRelation() and XLogReadBuffer() in preparation for relationHeikki Linnakangas2008-06-12
| | | | | | | | | | forks. XLogOpenRelation() and the associated light-weight relation cache in xlogutils.c is gone, and XLogReadBuffer() now takes a RelFileNode as argument, instead of Relation. For functions that still need a Relation struct during WAL replay, there's a new function called CreateFakeRelcacheEntry() that returns a fake entry like XLogOpenRelation() used to.
* Move BufferGetPageSize and BufferGetPage from bufpage.h to bufmgr.h. It isAlvaro Herrera2008-06-08
| | | | | | | | | | more logical that way, and also it reduces the amount of unnecessary includes in bufpage.h, which is widely used. Zdenek Kotala. My previous patch to bufpage.h should also have credited him as author, but I forgot (sorry about that).
* This is the patch replace offnum++ by OffsetNumberNext, to beBruce Momjian2008-05-13
| | | | | | consistent. OffsetNumberNext() has some casting that makes it useful. Fujii Masao
* Put back bufmgr.h in bufpage.h -- it is needed by some macros.Alvaro Herrera2008-05-12
| | | | | Remove #include bufmgr.h from (most?) source files which already include bufpage.h.
* Restructure some header files a bit, in particular heapam.h, by removing someAlvaro Herrera2008-05-12
| | | | | | | | | | | | unnecessary #include lines in it. Also, move some tuple routine prototypes and macros to htup.h, which allows removal of heapam.h inclusion from some .c files. For this to work, a new header file access/sysattr.h needed to be created, initially containing attribute numbers of system columns, for pg_dump usage. While at it, make contrib ltree, intarray and hstore header files more consistent with our header style.
* Move the HTSU_Result enum definition into snapshot.h, to avoid includingAlvaro Herrera2008-03-26
| | | | | | tqual.h into heapam.h. This makes all inclusion of tqual.h explicit. I also sorted alphabetically the includes on some source files.
* Fix heap_page_prune's problem with failing to send cache invalidationTom Lane2008-03-13
| | | | | | | | | | | messages if the calling transaction aborts later on. Collapsing out line pointer redirects is a done deal as soon as we complete the page update, so syscache *must* be notified even if the VACUUM FULL as a whole doesn't complete. To fix, add some functionality to inval.c to allow the pending inval messages to be sent immediately while heap_page_prune is still running. The implementation is a bit chintzy: it will only work in the context of VACUUM FULL. But that's all we need now, and it can always be extended later if needed. Per my trouble report of a week ago.
* Refactor heap_page_prune so that instead of changing item states on-the-fly,Tom Lane2008-03-08
| | | | | | | | | | | | | | | | | it accumulates the set of changes to be made and then applies them. It had to accumulate the set of changes anyway to prepare a WAL record for the pruning action, so this isn't an enormous change; the only new complexity is to not doubly mark tuples that are visited twice in the scan. The main advantage is that we can substantially reduce the scope of the critical section in which the changes are applied, thus avoiding PANIC in foreseeable cases like running out of memory in inval.c. A nice secondary advantage is that it is now far clearer that WAL replay will actually do the same thing that the original pruning did. This commit doesn't do anything about the open problem that CacheInvalidateHeapTuple doesn't have the right semantics for a CTID change caused by collapsing out a redirect pointer. But whatever we do about that, it'll be a good idea to not do it inside a critical section.
* Update copyrights in source tree to 2008.Bruce Momjian2008-01-01
|
* Re-run pgindent with updated list of typedefs. (Updated README shouldBruce Momjian2007-11-15
| | | | avoid this problem in the future.)
* pgindent run for 8.3.Bruce Momjian2007-11-15
|
* Keep heap_page_prune from marking the buffer dirty when it didn'tTom Lane2007-10-24
| | | | | really change anything. Per report from Itagaki Takahiro. Fix by Pavan Deolasee.
* Improve handling of prune/no-prune decisions by storing a page's oldestTom Lane2007-09-21
| | | | | | unpruned XMAX in its header. At the cost of 4 bytes per page, this keeps us from performing heap_page_prune when there's no chance of pruning anything. Seems to be necessary per Heikki's preliminary performance testing.
* HOT updates. When we update a tuple without changing any of its indexedTom Lane2007-09-20
columns, and the new version can be stored on the same heap page, we no longer generate extra index entries for the new version. Instead, index searches follow the HOT-chain links to ensure they find the correct tuple version. In addition, this patch introduces the ability to "prune" dead tuples on a per-page basis, without having to do a complete VACUUM pass to recover space. VACUUM is still needed to clean up dead index entries, however. Pavan Deolasee, with help from a bunch of other people.