postgresql - postgresql mirror

	Commit message	Author	Age
*	Add noreturn attributes to some error reporting functions	Peter Eisentraut	2013-02-12
\|
*	Improve concurrency of foreign key locking	Alvaro Herrera	2013-01-23
\|	This patch introduces two additional lock modes for tuples: "SELECT FOR KEY SHARE" and "SELECT FOR NO KEY UPDATE". These don't block each other, in contrast with the already existing "SELECT FOR SHARE" and "SELECT FOR UPDATE". UPDATE commands that do not modify the values stored in the columns that are part of the key of the tuple now grab a SELECT FOR NO KEY UPDATE lock on the tuple, allowing them to proceed concurrently with tuple locks of the FOR KEY SHARE variety.
\|	Foreign key triggers now use FOR KEY SHARE instead of FOR SHARE; this means the concurrency improvement applies to them, which is the whole point of this patch.
\|	The added tuple lock semantics require some rejiggering of the multixact module, so that the locking level that each transaction is holding can be stored alongside its Xid. Also, multixacts now need to persist across server restarts and crashes, because they can now represent not only tuple locks, but also tuple updates. This means we need more careful tracking of lifetime of pg_multixact SLRU files; since they now persist longer, we require more infrastructure to figure out when they can be removed. pg_upgrade also needs to be careful to copy pg_multixact files over from the old server to the new, or at least part of multixact.c state, depending on the versions of the old and new servers.
\|	Tuple time qualification rules (HeapTupleSatisfies routines) need to be careful not to consider tuples with the "is multi" infomask bit set as being only locked; they might need to look up MultiXact values (i.e. possibly do pg_multixact I/O) to find out the Xid that updated a tuple, whereas they previously were assured to only use information readily available from the tuple header. This is considered acceptable, because the extra I/O would involve cases that would previously cause some commands to block waiting for concurrent transactions to finish.
\|	Another important change is the fact that locking tuples that have previously been updated causes the future versions to be marked as locked, too; this is essential for correctness of foreign key checks. This causes additional WAL-logging, also (there was previously a single WAL record for a locked tuple; now there are as many as there are updated copies of the tuple.)
\|	With all this in place, contention related to tuples being checked by foreign key rules should be much reduced.
\|	As a bonus, the old behavior that a subtransaction grabbing a stronger tuple lock than the parent (sub)transaction held on a given tuple and later aborting caused the weaker lock to be lost, has been fixed.
\|	Many new spec files were added for isolation tester framework, to ensure overall behavior is sane. There's probably room for several more tests.
\|	There were several reviewers of this patch; in particular, Noah Misch and Andres Freund spent considerable time in it. Original idea for the patch came from Simon Riggs, after a problem report by Joel Jacobson. Most code is from me, with contributions from Marti Raudsepp, Alexander Shulgin, Noah Misch and Andres Freund.
\|	This patch was discussed in several pgsql-hackers threads; the most important start at the following message-ids:
\|	AANLkTimo9XVcEzfiBR-ut3KVNDkjm2Vxh+t8kAmWjPuv@mail.gmail.com
\|	1290721684-sup-3951@alvh.no-ip.org
\|	1294953201-sup-2099@alvh.no-ip.org
\|	1320343602-sup-2290@alvh.no-ip.org
\|	1339690386-sup-8927@alvh.no-ip.org
\|	4FE5FF020200002500048A3D@gw.wicourts.gov
\|	4FEAB90A0200002500048B7D@gw.wicourts.gov
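The four tuple lock strengths described in this commit can be summarised as a conflict matrix. The following is a minimal sketch, not code from the patch: the enum and function names are hypothetical, and the full matrix follows PostgreSQL's documented row-level lock conflict table rather than anything stated in the commit message, which only says that FOR KEY SHARE and FOR NO KEY UPDATE do not block each other.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative names only; these are not PostgreSQL's internal identifiers. */
typedef enum {
    ROW_FOR_KEY_SHARE = 0,
    ROW_FOR_SHARE,
    ROW_FOR_NO_KEY_UPDATE,
    ROW_FOR_UPDATE,
    ROW_LOCK_MODE_COUNT
} RowLockMode;

/* row_lock_conflicts[held][requested]: true means the request must wait. */
static const bool row_lock_conflicts[ROW_LOCK_MODE_COUNT][ROW_LOCK_MODE_COUNT] = {
    /* held: FOR KEY SHARE     */ { false, false, false, true },
    /* held: FOR SHARE         */ { false, false, true,  true },
    /* held: FOR NO KEY UPDATE */ { false, true,  true,  true },
    /* held: FOR UPDATE        */ { true,  true,  true,  true },
};

/* Report whether a requested row lock conflicts with one already held. */
static bool row_locks_conflict(RowLockMode held, RowLockMode requested)
{
    assert((int) held >= 0 && held < ROW_LOCK_MODE_COUNT);
    assert((int) requested >= 0 && requested < ROW_LOCK_MODE_COUNT);
    return row_lock_conflicts[held][requested];
}
```

Under this matrix, a foreign key check holding FOR KEY SHARE does not wait for a non-key UPDATE holding FOR NO KEY UPDATE, which is the concurrency gain the commit describes.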
*	Update copyrights for 2013	Bruce Momjian	2013-01-01
\|	Fully update git head, and update back branches in ./COPYRIGHT and legal.sgml files.
*	Fix performance problems with autovacuum truncation in busy workloads.	Kevin Grittner	2012-12-11
\|	In situations where there are over 8MB of empty pages at the end of a table, the truncation work for trailing empty pages takes longer than deadlock_timeout, and there is frequent access to the table by processes other than autovacuum, there was a problem with the autovacuum worker process being canceled by the deadlock checking code. The truncation work done by autovacuum up to that point was lost, and the attempt was tried again by a later autovacuum worker. The attempts could continue indefinitely without making progress, consuming resources and blocking other processes for up to deadlock_timeout each time.
\|	This patch has the autovacuum worker checking whether it is blocking any other thread at 20ms intervals. If such a condition develops, the autovacuum worker will persist the work it has done so far, release its lock on the table, and sleep in 50ms intervals for up to 5 seconds, hoping to be able to re-acquire the lock and try again. If it is unable to get the lock in that time, it moves on and a worker will try to continue later from the point this one left off.
\|	While this patch doesn't change the rules about when and what to truncate, it does cause the truncation to occur sooner, with less blocking, and with the consumption of fewer resources when there is contention for the table's lock. The only user-visible change other than improved performance is that the table size during truncation may change incrementally instead of just once.
\|	This problem exists in all supported versions but is infrequently reported, although some reports of performance problems when autovacuum runs might be caused by this. Initial commit is just the master branch, but this should probably be backpatched once the build farm and general developer usage confirm that there are no surprising effects.
\|	Jan Wieck
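The re-acquire step described above (sleep in 50ms intervals for up to 5 seconds, then give up) can be sketched as a bounded retry loop. This is an illustrative sketch only: `reacquire_lock`, the callback types, and the two test doubles are hypothetical names, and the real worker uses PostgreSQL's own lock and sleep primitives rather than callbacks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define REACQUIRE_SLEEP_MS   50    /* sleep interval from the commit message */
#define REACQUIRE_TIMEOUT_MS 5000  /* "up to 5 seconds" */

typedef bool (*TryLockFn)(void *state);  /* hypothetical: one non-blocking lock attempt */
typedef void (*SleepFn)(int ms);         /* hypothetical: sleep for ms milliseconds */

/*
 * Retry a non-blocking lock attempt, sleeping between attempts, until it
 * succeeds or the time budget is used up. Stores the total time slept in
 * *waited_ms and returns whether the lock was obtained.
 */
static bool reacquire_lock(TryLockFn try_lock, void *state,
                           SleepFn sleep_ms, int *waited_ms)
{
    int waited = 0;

    assert(try_lock != NULL && sleep_ms != NULL && waited_ms != NULL);
    for (;;) {
        if (try_lock(state)) {
            *waited_ms = waited;
            return true;
        }
        if (waited >= REACQUIRE_TIMEOUT_MS) {
            *waited_ms = waited;
            return false;   /* give up; a later worker continues from here */
        }
        sleep_ms(REACQUIRE_SLEEP_MS);
        waited += REACQUIRE_SLEEP_MS;
    }
}

/* Test double: fails until the counter pointed to by state reaches zero. */
static bool succeed_after(void *state)
{
    int *remaining = state;

    if (*remaining == 0)
        return true;
    (*remaining)--;
    return false;
}

/* Test double: pretend to sleep so the example runs instantly. */
static void no_sleep(int ms)
{
    (void) ms;
}
```

Passing the sleep function as a parameter keeps the sketch deterministic; a real caller would supply an actual sleep.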
*	Cleanup VirtualXact at end of Hot Standby.	Simon Riggs	2012-11-29
\|
*	Add a small cache of locks owned by a resource owner in ResourceOwner.	Heikki Linnakangas	2012-06-21
\|	This speeds up reassigning locks to the parent owner, when the transaction holds a lot of locks, but only a few of them belong to the current resource owner. This particularly helps pg_dump when dumping a large number of objects. The cache can hold up to 15 locks in each resource owner. After that, the cache is marked as overflowed, and we fall back to the old method of scanning the whole local lock table. The tradeoff here is that the cache has to be scanned whenever a lock is released, so if the cache is too large, lock release becomes more expensive. 15 seems enough to cover pg_dump, and doesn't have much impact on lock release. Jeff Janes, reviewed by Amit Kapila and Heikki Linnakangas.
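A small fixed-size cache with an overflow marker, as described above, might look like the sketch below. The struct layout and function names are assumptions made for illustration; only the capacity of 15 and the overflow fallback come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define OWNER_LOCK_CACHE_SIZE 15   /* capacity stated in the commit message */

/* Hypothetical per-owner cache of lock pointers. */
typedef struct {
    int         nlocks;
    bool        overflowed;   /* once set, callers must scan the full lock table */
    const void *locks[OWNER_LOCK_CACHE_SIZE];
} OwnerLockCache;

static void cache_init(OwnerLockCache *c)
{
    assert(c != NULL);
    c->nlocks = 0;
    c->overflowed = false;
}

/* Record a lock; mark the cache overflowed when there is no room left. */
static void cache_remember(OwnerLockCache *c, const void *lock)
{
    assert(c != NULL && lock != NULL);
    if (c->overflowed)
        return;
    if (c->nlocks == OWNER_LOCK_CACHE_SIZE) {
        c->overflowed = true;
        return;
    }
    c->locks[c->nlocks++] = lock;
}

/*
 * Drop a lock from the cache. Every release pays for this linear scan,
 * which is why the cache is kept small. Returns false if the cache has
 * overflowed or does not contain the lock.
 */
static bool cache_forget(OwnerLockCache *c, const void *lock)
{
    assert(c != NULL && lock != NULL);
    if (c->overflowed)
        return false;
    for (int i = 0; i < c->nlocks; i++) {
        if (c->locks[i] == lock) {
            c->nlocks--;
            c->locks[i] = c->locks[c->nlocks];  /* fill the hole with the last entry */
            return true;
        }
    }
    return false;
}
```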
*	Run pgindent on 9.2 source tree in preparation for first 9.3	Bruce Momjian	2012-06-10
\|	commit-fest.
*	Overdue code review for transaction-level advisory locks patch.	Tom Lane	2012-05-04
\|	Commit 62c7bd31c8878dd45c9b9b2429ab7a12103f3590 had assorted problems, most visibly that it broke PREPARE TRANSACTION in the presence of session-level advisory locks (which should be ignored by PREPARE), as per a recent complaint from Stephen Rees. More abstractly, the patch made the LockMethodData.transactional flag not merely useless but outright dangerous, because in point of fact that flag no longer tells you anything at all about whether a lock is held transactionally. This fix therefore removes that flag altogether. We now rely entirely on the convention already in use in lock.c that transactional lock holds must be owned by some ResourceOwner, while session holds are never so owned. Setting the locallock struct's owner link to NULL thus denotes a session hold, and there is no redundant marker for that.
\|	PREPARE TRANSACTION now works again when there are session-level advisory locks, and it is also able to transfer transactional advisory locks to the prepared transaction, but for implementation reasons it throws an error if we hold both types of lock on a single lockable object. Perhaps it will be worth improving that someday.
\|	Assorted other minor cleanup and documentation editing, as well.
\|	Back-patch to 9.1, except that in the 9.1 branch I did not remove the LockMethodData.transactional flag for fear of causing an ABI break for any external code that might be examining those structs.
*	Finish rename of FastPathStrongLocks to FastPathStrongRelationLocks.	Robert Haas	2012-04-18
\|	Commit 8e5ac74c1249820ca55481223a95b9124b4a4f95 tried to do this renaming, but I relied on gcc to tell me where I needed to make changes, instead of grep. Noted by Jeff Davis.
*	Tighten up error recovery for fast-path locking.	Robert Haas	2012-04-18
\|	The previous code could cause a backend crash after BEGIN; SAVEPOINT a; LOCK TABLE foo (interrupted by ^C or statement timeout); ROLLBACK TO SAVEPOINT a; LOCK TABLE foo, and might have leaked strong-lock counts in other situations. Report by Zoltán Böszörményi; patch review by Jeff Davis.
*	Update copyright notices for year 2012.	Bruce Momjian	2012-01-01
\|
*	Revert removal of trace_userlocks, because userlocks aren't gone.	Robert Haas	2011-11-10
\|	This reverts commit 0180bd6180511875db046bf8ddcaa633a2952dfd. contrib/userlock is gone, but user-level locking still exists, and is exposed via the pg_advisory* family of functions.
*	Remove all "traces" of trace_userlocks, because userlocks were removed	Bruce Momjian	2011-10-13
\|	in PG 8.2.
*	Create VXID locks "lazily" in the main lock table.	Robert Haas	2011-08-04
\|	Instead of entering them on transaction startup, we materialize them only when someone wants to wait, which will occur only during CREATE INDEX CONCURRENTLY. In Hot Standby mode, the startup process must also be able to probe for conflicting VXID locks, but the lock need never be fully materialized, because the startup process does not use the normal lock wait mechanism. Since most VXID locks never need to touch the lock manager partition locks, this can significantly reduce blocking contention on read-heavy workloads. Patch by me. Review by Jeff Davis.
*	Create a "fast path" for acquiring weak relation locks.	Robert Haas	2011-07-18
\|	When an AccessShareLock, RowShareLock, or RowExclusiveLock is requested on an unshared database relation, and we can verify that no conflicting locks can possibly be present, record the lock in a per-backend queue, stored within the PGPROC, rather than in the primary lock table. This eliminates a great deal of contention on the lock manager LWLocks. This patch also refactors the interface between GetLockStatusData() and pg_lock_status() to be a bit more abstract, so that we don't rely so heavily on the lock manager's internal representation details. The new fast path lock structures don't have a LOCK or PROCLOCK structure to return, so we mustn't depend on that for purposes of listing outstanding locks. Review by Jeff Davis.
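The eligibility test for the fast path can be sketched as a simple predicate. In this sketch the enum mirrors PostgreSQL's table lock mode ordering, but the struct, the function names, and the per-partition counter used to "verify that no conflicting locks can possibly be present" are assumptions for illustration, as is the partition count of 16.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Table lock modes in increasing strength, mirroring PostgreSQL's ordering. */
typedef enum {
    AccessShareLock = 1,
    RowShareLock,
    RowExclusiveLock,
    ShareUpdateExclusiveLock,
    ShareLock,
    ShareRowExclusiveLock,
    ExclusiveLock,
    AccessExclusiveLock
} TableLockMode;

#define STRONG_LOCK_PARTITIONS 16   /* arbitrary size chosen for this sketch */

/* Assumed mechanism: count strong lock requests per hash partition. */
typedef struct {
    unsigned strong_count[STRONG_LOCK_PARTITIONS];
} StrongLockCounts;

/* Record that some backend wants a strong lock on the relation with this hash. */
static void note_strong_lock(StrongLockCounts *counts, unsigned rel_hash)
{
    assert(counts != NULL);
    counts->strong_count[rel_hash % STRONG_LOCK_PARTITIONS]++;
}

/*
 * A lock may take the fast path only if it is one of the three weak modes,
 * the relation is not shared across databases, and no strong lock is
 * registered in the relation's partition.
 */
static bool can_use_fast_path(const StrongLockCounts *counts, unsigned rel_hash,
                              bool is_shared_relation, TableLockMode mode)
{
    assert(counts != NULL);
    if (is_shared_relation || mode > RowExclusiveLock)
        return false;
    return counts->strong_count[rel_hash % STRONG_LOCK_PARTITIONS] == 0;
}
```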
*	Add transaction-level advisory locks.	Itagaki Takahiro	2011-02-18
\|	They share the same locking namespace with the existing session-level advisory locks, but they are automatically released at the end of the current transaction and cannot be released explicitly via unlock functions. Marko Tiikkaja, reviewed by me.
*	Stamp copyrights for year 2011.	Bruce Momjian	2011-01-01
\|
*	Remove cvs keywords from all files.	Magnus Hagander	2010-09-20
\|
*	pgindent run for 9.0	Bruce Momjian	2010-02-26
\|
*	Update copyright for the year 2010.	Bruce Momjian	2010-01-02
\|
*	Allow read only connections during recovery, known as Hot Standby.	Simon Riggs	2009-12-19
\|	Enabled by recovery_connections = on (default) and forcing archive recovery using a recovery.conf. Recovery processing now emulates the original transactions as they are replayed, providing full locking and MVCC behaviour for read only queries. Recovery must enter consistent state before connections are allowed, so there is a delay, typically short, before connections succeed. Replay of recovering transactions can conflict and in some cases deadlock with queries during recovery; these result in query cancellation after max_standby_delay seconds have expired. Infrastructure changes have minor effects on normal running, though introduce four new types of WAL record.
\|	New test mode "make standbycheck" allows regression tests of static command behaviour on a standby server while in recovery. Typical and extreme dynamic behaviours have been checked via code inspection and manual testing. Few port specific behaviours have been utilised, though primary testing has been on Linux only so far.
\|	This commit is the basic patch. Additional changes will follow in this release to enhance some aspects of behaviour, notably improved handling of conflicts, deadlock detection and query cancellation. Changes to VACUUM FULL are also required.
\|	Simon Riggs, with significant and lengthy review by Heikki Linnakangas, including streamlined redesign of snapshot creation and two-phase commit. Important contributions from Florian Pflug, Mark Kirkwood, Merlin Moncure, Greg Stark, Gianni Ciolli, Gabriele Bartolini, Hannu Krosing, Robert Haas, Tatsuo Ishii, Hiroyuki Yamada plus support and feedback from many other community members.
*	A session that does not have any live snapshots does not have to be waited for	Tom Lane	2009-04-04
\|	when we are waiting for old snapshots to go away during a concurrent index build. In particular, this rule lets us avoid waiting for idle-in-transaction sessions. This logic could be improved further if we had some way to wake up when the session we are currently waiting for goes idle-in-transaction. However that would be a significantly more complex/invasive patch, so it'll have to wait for some other day. Simon Riggs, with some improvements by Tom.
*	Update copyright for 2009.	Bruce Momjian	2009-01-01
\|
*	Widen the nLocks counts in local lock tables from int to int64. This	Tom Lane	2008-09-16
\|	forestalls potential overflow when the same table (or other object, but usually tables) is accessed by very many successive queries within a single transaction. Per report from Michael Milligan. Back-patch to 8.0, which is as far back as the patch conveniently applies. There have been no reports of overflow in pre-8.3 releases, but clearly the risk existed all along. (Michael's report suggests that 8.3 may consume lock counts faster than prior releases, but with no test case to look at it's hard to be sure about that. Widening the counts seems a good future-proofing measure in any event.)
*	Restructure some header files a bit, in particular heapam.h, by removing some	Alvaro Herrera	2008-05-12
\|	unnecessary #include lines in it. Also, move some tuple routine prototypes and macros to htup.h, which allows removal of heapam.h inclusion from some .c files. For this to work, a new header file access/sysattr.h needed to be created, initially containing attribute numbers of system columns, for pg_dump usage. While at it, make contrib ltree, intarray and hstore header files more consistent with our header style.
*	lmgr.c:DescribeLockTag was never taught about virtual xids, per Greg Stark.	Tom Lane	2008-01-08
\|	Also a couple of minor tweaks to try to future-proof the code a bit better against future locktag additions.
*	Update copyrights in source tree to 2008.	Bruce Momjian	2008-01-01
\|
*	Re-run pgindent with updated list of typedefs. (Updated README should	Bruce Momjian	2007-11-15
\|	avoid this problem in the future.)
*	pgindent run for 8.3.	Bruce Momjian	2007-11-15
\|
*	Allow an autovacuum worker to be interrupted automatically when it is found	Alvaro Herrera	2007-10-26
\|	to be blocking another process (except when it's working to prevent Xid wraparound problems).
*	Implement lazy XID allocation: transactions that do not modify any database	Tom Lane	2007-09-05
\|	rows will normally never obtain an XID at all. We already did things this way for subtransactions, but this patch extends the concept to top-level transactions. In applications where there are lots of short read-only transactions, this should improve performance noticeably; not so much from removal of the actual XID-assignments, as from reduction of overhead that's driven by the rate of XID consumption. We add a concept of a "virtual transaction ID" so that active transactions can be uniquely identified even if they don't have a regular XID. This is a much lighter-weight concept: uniqueness of VXIDs is only guaranteed over the short term, and no on-disk record is made about them. Florian Pflug, with some editorialization by Tom.
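The idea of lazy XID allocation can be illustrated with a toy model: every transaction gets a cheap virtual ID when it starts, and a regular XID is handed out only on the first write. All type and function names below are hypothetical, and the pairing of a backend ID with a per-backend counter is an assumption about how a virtual ID could be made unique.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INVALID_XID 0u   /* sketch convention: zero means "no XID assigned yet" */

/* Hypothetical virtual transaction ID: backend slot plus a per-backend counter. */
typedef struct {
    int      backend_id;
    uint32_t local_id;
} VirtualXid;

typedef struct {
    VirtualXid vxid;   /* always assigned at transaction start */
    uint32_t   xid;    /* assigned lazily, on first data modification */
} Transaction;

/* Shared counter for regular XIDs; consuming it is the cost being avoided. */
typedef struct {
    uint32_t next_xid;
} XidAllocator;

/* Start a transaction: only the cheap, in-memory virtual ID is assigned. */
static Transaction begin_transaction(int backend_id, uint32_t *next_local_id)
{
    Transaction t;

    assert(next_local_id != NULL);
    t.vxid.backend_id = backend_id;
    t.vxid.local_id = (*next_local_id)++;
    t.xid = INVALID_XID;
    return t;
}

/* Called before the first write: allocate a regular XID if none exists yet. */
static uint32_t xid_for_write(Transaction *t, XidAllocator *alloc)
{
    assert(t != NULL && alloc != NULL);
    if (t->xid == INVALID_XID)
        t->xid = alloc->next_xid++;
    return t->xid;
}
```

In this model a read-only transaction never touches `XidAllocator`, so the XID counter advances only at the rate of writing transactions.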
*	Code review for log_lock_waits patch. Don't try to issue log messages from	Tom Lane	2007-06-19
\|	within a signal handler (this might be safe given the relatively narrow code range in which the interrupt is enabled, but it seems awfully risky); do issue more informative log messages that tell what is being waited for and the exact length of the wait; minor other code cleanup. Greg Stark and Tom Lane
*	Fix trivial misspelling in comment.	Tom Lane	2007-05-30
\|
*	Add GUC log_lock_waits to log long wait times.	Bruce Momjian	2007-03-03
\|	Simon Riggs
*	Update CVS HEAD for 2007 copyright. Back branches are typically not	Bruce Momjian	2007-01-05
\|	back-stamped for this.
*	Update lock comments for concurrent index creation, analyze.	Bruce Momjian	2006-11-23
\|	Walter Cruz
*	pgindent run for 8.2.	Bruce Momjian	2006-10-04
\|
*	Fix pg_locks view to call advisory locks advisory locks, while preserving	Tom Lane	2006-09-22
\|	backward compatibility for anyone using the old userlock code that's now on pgfoundry --- locks from that code still show as 'userlock'.
*	Add built-in userlock manipulation functions to replace the former	Tom Lane	2006-09-18
\|	contrib functionality. Along the way, remove the USER_LOCKS configuration symbol, since it no longer makes any sense to try to compile that out. No user documentation yet ... mmoncure has promised to write some. Thanks to Abhijit Menon-Sen for creating a first draft to work from.
*	Add a function GetLockConflicts() to lock.c to report xacts holding	Tom Lane	2006-08-27
\|	locks that would conflict with a specified lock request, without actually trying to get that lock. Use this instead of the former ad hoc method of doing the first wait step in CREATE INDEX CONCURRENTLY. Fixes problem with undetected deadlock and in many cases will allow the index creation to proceed sooner than it otherwise could've. Per discussion with Greg Stark.
*	Change the relation_open protocol so that we obtain lock on a relation	Tom Lane	2006-07-31
\|	(table or index) before trying to open its relcache entry. This fixes race conditions in which someone else commits a change to the relation's catalog entries while we are in process of doing relcache load. Problems of that ilk have been reported sporadically for years, but it was not really practical to fix until recently --- for instance, the recent addition of WAL-log support for in-place updates helped. Along the way, remove pg_am.amconcurrent: all AMs are now expected to support concurrent update.
*	Convert the lock manager to use the new dynahash.c support for partitioned	Tom Lane	2006-07-23
\|	hash tables, instead of the previous kluge involving multiple hash tables. This partially undoes my patch of last December.
*	Split the buffer mapping table into multiple separately lockable	Tom Lane	2006-07-23
\|	partitions, as per discussion. Passes functionality checks, but I don't have any performance data yet.
*	Update copyright for 2006. Update scripts.	Bruce Momjian	2006-03-05
\|
*	Divide the lock manager's shared state into 'partitions', so as to	Tom Lane	2005-12-11
\|	reduce contention for the former single LockMgrLock. Per my recent proposal. I set it up for 16 partitions, but on a pgbench test this gives only a marginal further improvement over 4 partitions --- we need to test more scenarios to choose the number of partitions.
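Partitioning shared lock state amounts to hashing each lock's identity and using the hash to pick one of a fixed number of independently locked partitions. The sketch below shows only that mapping. The tag layout, the FNV-1a hash, and all names are assumptions for illustration; only the partition count of 16 comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_LOCK_PARTITIONS 16   /* value chosen in the commit message */

/* Hypothetical, simplified lock identity: database and relation identifiers. */
typedef struct {
    uint32_t db_oid;
    uint32_t rel_oid;
} LockTagSketch;

/* FNV-1a over the tag's fields; a stand-in for the real hash function. */
static uint32_t hash_lock_tag(const LockTagSketch *tag)
{
    uint32_t fields[2];
    const unsigned char *bytes = (const unsigned char *) fields;
    uint32_t h = 2166136261u;

    assert(tag != NULL);
    fields[0] = tag->db_oid;
    fields[1] = tag->rel_oid;
    for (size_t i = 0; i < sizeof(fields); i++) {
        h ^= bytes[i];
        h *= 16777619u;
    }
    return h;
}

/* Map a lock tag to the partition whose lock protects its shared state. */
static unsigned lock_partition(const LockTagSketch *tag)
{
    return hash_lock_tag(tag) % NUM_LOCK_PARTITIONS;
}
```

Backends working on tags that fall in different partitions would then take different partition locks instead of contending for a single one.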
*	Simplify lock manager data structures by making a clear separation between	Tom Lane	2005-12-09
\|	the data defining the semantics of a lock method (ie, conflict resolution table and ancillary data, which is all constant) and the hash tables storing the current state. The only thing we give up by this is the ability to use separate hashtables for different lock methods, but there is no need for that anyway. Put some extra fields into the LockMethod definition structs to clean up some other uglinesses, like hard-wired tests for DEFAULT_LOCKMETHOD and USER_LOCKMETHOD. This commit doesn't do anything about the performance issues we were discussing, but it clears away some of the underbrush that's in the way of fixing that.
*	Standard pgindent run for 8.1.	Bruce Momjian	2005-10-15
\|
*	Convert the arithmetic for shared memory size calculation from 'int'	Tom Lane	2005-08-20
\|	to 'Size' (that is, size_t), and install overflow detection checks in it. This allows us to remove the former arbitrary restrictions on NBuffers etc. It won't make any difference on a 32-bit machine, but on a 64-bit machine you could theoretically have terabytes of shared buffers. (How efficiently we could manage 'em remains to be seen.) Similarly, num_temp_buffers, work_mem, and maintenance_work_mem can be set above 2GB on a 64-bit machine. Original patch from Koichi Suzuki, additional work by moi.
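Overflow-checked size arithmetic of the kind described above can be sketched with two small helpers. The names and the boolean-return convention are assumptions for illustration; the commit message says only that overflow detection checks were installed, not how failures are reported.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Add two sizes; return false instead of wrapping around on overflow. */
static bool checked_add_size(size_t a, size_t b, size_t *out)
{
    assert(out != NULL);
    if (b > SIZE_MAX - a)
        return false;
    *out = a + b;
    return true;
}

/* Multiply two sizes; return false instead of wrapping around on overflow. */
static bool checked_mul_size(size_t a, size_t b, size_t *out)
{
    assert(out != NULL);
    if (a != 0 && b > SIZE_MAX / a)
        return false;
    *out = a * b;
    return true;
}
```

A shared memory size estimate would chain such calls, for example multiplying a buffer count by a block size and then adding per-structure overheads, and stop at the first failure.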
*	Two-phase commit. Original patch by Heikki Linnakangas, with additional	Tom Lane	2005-06-17
\|	hacking by Alvaro Herrera and Tom Lane.
*	Simplify shared-memory lock data structures as per recent discussion:	Tom Lane	2005-06-14
\|	it is sufficient to track whether a backend holds a lock or not, and store information about transaction vs. session locks only in the inside-the-backend LocalLockTable. Since there can now be but one PROCLOCK per lock per backend, LockCountMyLocks() is no longer needed, thus eliminating some O(N^2) behavior when a backend holds many locks. Also simplify the LockAcquire/LockRelease API by passing just a 'sessionLock' boolean instead of a transaction ID. The previous API was designed with the idea that per-transaction lock holding would be important for subtransactions, but now that we have subtransactions we know that this is unwanted. While at it, add an 'isTempObject' parameter to LockAcquire to indicate whether the lock is being taken on a temp table. This is not used just yet, but will be needed shortly for two-phase commit.