aboutsummaryrefslogtreecommitdiff
path: root/src/backend/access/heap/heapam.c
Commit message (Collapse)AuthorAge
...
* Fix a couple of bugs in MultiXactId freezingAlvaro Herrera2013-11-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Both heap_freeze_tuple() and heap_tuple_needs_freeze() neglected to look into a multixact to check the members against cutoff_xid. This means that a very old Xid could survive hidden within a multi, possibly outliving its CLOG storage. In the distant future, this would cause clog lookup failures: ERROR: could not access status of transaction 3883960912 DETAIL: Could not open file "pg_clog/0E78": No such file or directory. This mostly was problematic when the updating transaction aborted, since in that case the row wouldn't get pruned away earlier in vacuum and the multixact could possibly survive for a long time. In many cases, data that is inaccessible for this reason way can be brought back heuristically. As a second bug, heap_freeze_tuple() didn't properly handle multixacts that need to be frozen according to cutoff_multi, but whose updater xid is still alive. Instead of preserving the update Xid, it just set Xmax invalid, which leads to both old and new tuple versions becoming visible. This is pretty rare in practice, but a real threat nonetheless. Existing corrupted rows, unfortunately, cannot be repaired in an automated fashion. Existing physical replicas might have already incorrectly frozen tuples because of different behavior than in master, which might only become apparent in the future once pg_multixact/ is truncated; it is recommended that all clones be rebuilt after upgrading. Following code analysis caused by bug report by J Smith in message CADFUPgc5bmtv-yg9znxV-vcfkb+JPRqs7m2OesQXaM_4Z1JpdQ@mail.gmail.com and privately by F-Secure. Backpatch to 9.3, where freezing of MultiXactIds was introduced. Analysis and patch by Andres Freund, with some tweaks by Álvaro.
* Don't TransactionIdDidAbort in HeapTupleGetUpdateXidAlvaro Herrera2013-11-29
| | | | | | | | | | | | | | | | It is dangerous to do so, because some code expects to be able to see what's the true Xmax even if it is aborted (particularly while traversing HOT chains). So don't do it, and instead rely on the callers to verify for abortedness, if necessary. Several race conditions and bugs fixed in the process. One isolation test changes the expected output due to these. This also reverts commit c235a6a589b, which is no longer necessary. Backpatch to 9.3, where this function was introduced. Andres Freund
* Refine our definition of what constitutes a system relation.Robert Haas2013-11-28
| | | | | | | | | | | | | | | | | | | | | | | Although user-defined relations can't be directly created in pg_catalog, it's possible for them to end up there, because you can create them in some other schema and then use ALTER TABLE .. SET SCHEMA to move them there. Previously, such relations couldn't afterwards be manipulated, because IsSystemRelation()/IsSystemClass() rejected all attempts to modify objects in the pg_catalog schema, regardless of their origin. With this patch, they now reject only those objects in pg_catalog which were created at initdb-time, allowing most operations on user-created tables in pg_catalog to proceed normally. This patch also adds new functions IsCatalogRelation() and IsCatalogClass(), which is similar to IsSystemRelation() and IsSystemClass() but with a slightly narrower definition: only TOAST tables of system catalogs are included, rather than *all* TOAST tables. This is currently used only for making decisions about when invalidation messages need to be sent, but upcoming logical decoding patches will find other uses for this information. Andres Freund, with some modifications by me.
* Unbreak buildfarmAlvaro Herrera2013-11-28
| | | | | I removed an intermediate commit before pushing and forgot to test the resulting tree :-(
* Use a more granular approach to follow update chainsAlvaro Herrera2013-11-28
| | | | | | | | Instead of simply checking the KEYS_UPDATED bit, we need to check whether each lock held on the future version of the tuple conflicts with the lock we're trying to acquire. Per bug report #8434 by Tomonari Katsumata
* Compare Xmin to previous Xmax when locking an update chainAlvaro Herrera2013-11-28
| | | | | | | | | | | | Not doing so causes us to traverse an update chain that has been broken by concurrent page pruning. All other code that traverses update chains uses this check as one of the cases in which to stop iterating, so replicate it here too. Failure to do so leads to erroneous CLOG, subtrans or multixact lookups. Per discussion following the bug report by J Smith in CADFUPgc5bmtv-yg9znxV-vcfkb+JPRqs7m2OesQXaM_4Z1JpdQ@mail.gmail.com as diagnosed by Andres Freund.
* Cope with heap_fetch failure while locking an update chainAlvaro Herrera2013-11-28
| | | | | | | | | | | The reason for the fetch failure is that the tuple was removed because it was dead; so the failure is innocuous and can be ignored. Moreover, there's no need for further work and we can return success to the caller immediately. EvalPlanQualFetch is doing something very similar to this already. Report and test case from Andres Freund in 20131124000203.GA4403@alap2.anarazel.de
* Fix bugs in SSI tuple locking.Heikki Linnakangas2013-10-08
| | | | | | | | | | | | | 1. In heap_hot_search_buffer(), the PredicateLockTuple() call is passed wrong offset number. heapTuple->t_self is set to the tid of the first tuple in the chain that's visited, not the one actually being read. 2. CheckForSerializableConflictIn() uses the tuple's t_ctid field instead of t_self to check for exiting predicate locks on the tuple. If the tuple was updated, but the updater rolled back, t_ctid points to the aborted dead tuple. Reported by Hannu Krosing. Backpatch to 9.1.
* Rename various "freeze multixact" variablesAlvaro Herrera2013-09-16
| | | | | | | | | It seems to make more sense to use "cutoff multixact" terminology throughout the backend code; "freeze" is associated with replacing of an Xid with FrozenTransactionId, which is not what we do for MultiXactIds. Andres Freund Some adjustments by Álvaro Herrera
* Adjust HeapTupleSatisfies* routines to take a HeapTuple.Robert Haas2013-07-22
| | | | | | | | | | | | | | | | | | Previously, these functions took a HeapTupleHeader, but upcoming patches for logical replication will introduce new a new snapshot type under which the tuple's TID will be used to lookup (CMIN, CMAX) for visibility determination purposes. This makes that information available. Code churn is minimal since HeapTupleSatisfiesVisibility took the HeapTuple anyway, and deferenced it before calling the satisfies function. Independently of logical replication, this allows t_tableOid and t_self to be cross-checked via assertions in tqual.c. This seems like a useful way to make sure that all callers are setting these values properly, which has been previously put forward as desirable. Andres Freund, reviewed by Álvaro Herrera
* Use an MVCC snapshot, rather than SnapshotNow, for catalog scans.Robert Haas2013-07-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | SnapshotNow scans have the undesirable property that, in the face of concurrent updates, the scan can fail to see either the old or the new versions of the row. In many cases, we work around this by requiring DDL operations to hold AccessExclusiveLock on the object being modified; in some cases, the existing locking is inadequate and random failures occur as a result. This commit doesn't change anything related to locking, but will hopefully pave the way to allowing lock strength reductions in the future. The major issue has held us back from making this change in the past is that taking an MVCC snapshot is significantly more expensive than using a static special snapshot such as SnapshotNow. However, testing of various worst-case scenarios reveals that this problem is not severe except under fairly extreme workloads. To mitigate those problems, we avoid retaking the MVCC snapshot for each new scan; instead, we take a new snapshot only when invalidation messages have been processed. The catcache machinery already requires that invalidation messages be sent before releasing the related heavyweight lock; else other backends might rely on locally-cached data rather than scanning the catalog at all. Thus, making snapshot reuse dependent on the same guarantees shouldn't break anything that wasn't already subtly broken. Patch by me. Review by Michael Paquier and Andres Freund.
* Avoid inconsistent type declarationAlvaro Herrera2013-06-25
| | | | | | | | | Clang 3.3 correctly complains that a variable of type enum MultiXactStatus cannot hold a value of -1, which makes sense. Change the declared type of the variable to int instead, and apply casting as necessary to avoid the warning. Per notice from Andres Freund
* Post-pgindent cleanupStephen Frost2013-06-01
| | | | | | | | | | Make slightly better decisions about indentation than what pgindent is capable of. Mostly breaking out long function calls into one line per argument, with a few other minor adjustments. No functional changes- all whitespace. pgindent ran cleanly (didn't change anything) after. Passes all regressions.
* pgindent run for release 9.3Bruce Momjian2013-05-29
| | | | | This is the first run of the Perl-based pgindent script. Also update pgindent instructions.
* Fix buffer pin leak in heap update redo routine.Heikki Linnakangas2013-03-27
| | | | | | | | | | | | | | In a heap update, if the old and new tuple were on different pages, and the new page no longer existed (because it was subsequently truncated away by vacuum), heap_xlog_update forgot to release the pin on the old buffer. This bug was introduced by the "Fix multiple problems in WAL replay" patch, commit 3bbf668de9f1bc172371681e80a4e769b6d014c8 (on master branch). With full_page_writes=off, this triggered an "incorrect local pin count" error later in replay, if the old page was vacuumed. This fixes bug #7969, reported by Yunong Xiao. Backpatch to 9.0, like the commit that introduced this bug.
* Allow I/O reliability checks using 16-bit checksumsSimon Riggs2013-03-22
| | | | | | | | | | | | | | | | | | | Checksums are set immediately prior to flush out of shared buffers and checked when pages are read in again. Hint bit setting will require full page write when block is dirtied, which causes various infrastructure changes. Extensive comments, docs and README. WARNING message thrown if checksum fails on non-all zeroes page; ERROR thrown but can be disabled with ignore_checksum_failure = on. Feature enabled by an initdb option, since transition from option off to option on is long and complex and has not yet been implemented. Default is not to use checksums. Checksum used is WAL CRC-32 truncated to 16-bits. Simon Riggs, Jeff Davis, Greg Smith Wide input and assistance from many community members. Thank you.
* Remove PageSetTLI and rename pd_tli to pd_checksumSimon Riggs2013-03-18
| | | | | | | | | | | | | | Remove use of PageSetTLI() from all page manipulation functions and adjust README to indicate change in the way we make changes to pages. Repurpose those bytes into the pd_checksum field and explain how that works in comments about page header. Refactoring ahead of actual feature patch which would make use of the checksum field, arriving later. Jeff Davis, with comments and doc changes by Simon Riggs Direction suggested by Robert Haas; many others providing review comments.
* Add a materialized view relations.Kevin Grittner2013-03-03
| | | | | | | | | | | | | | | | | | | | | | A materialized view has a rule just like a view and a heap and other physical properties like a table. The rule is only used to populate the table, references in queries refer to the materialized data. This is a minimal implementation, but should still be useful in many cases. Currently data is only populated "on demand" by the CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW statements. It is expected that future releases will add incremental updates with various timings, and that a more refined concept of defining what is "fresh" data will be developed. At some point it may even be possible to have queries use a materialized in place of references to underlying tables, but that requires the other above-mentioned features to be working first. Much of the documentation work by Robert Haas. Review by Noah Misch, Thom Brown, Robert Haas, Marko Tiikkaja Security review by KaiGai Kohei, with a decision on how best to implement sepgsql still pending.
* Fix Xmax freeze conditionsAlvaro Herrera2013-02-08
| | | | | | | I broke this in 0ac5ad5134; previously, freezing a tuple marked with an IS_MULTI xmax was not necessary. Per brokenness report from Jeff Janes.
* Fill tuple before HeapSatisfiesHOTAndKeyUpdateAlvaro Herrera2013-02-01
| | | | | | | | | | | | Failing to do this results in almost all updates to system catalogs being non-HOT updates, because the OID column would differ (not having been set for the new tuple), which is an indexed column. While at it, make sure to set the tableoid early in both old and new tuples as well. This isn't of much consequence, since that column is seldom (never?) indexed. Report and patch from Andres Freund.
* Restrict infomask bits to set on multixactsAlvaro Herrera2013-01-31
| | | | | | | | | | | | | | | We must only set the bit(s) for the strongest lock held in the tuple; otherwise, a multixact containing members with exclusive lock and key-share lock will behave as though only a share lock is held. This bug was introduced in commit 0ac5ad5134, somewhere along development, when we allowed a singleton FOR SHARE lock to be implemented without a MultiXact by using a multi-bit pattern. I overlooked that GetMultiXactIdHintBits() needed to be tweaked as well. Previously, we could have the bits for FOR KEY SHARE and FOR UPDATE simultaneously set and it wouldn't cause a problem. Per report from digoal@126.com
* Improve concurrency of foreign key lockingAlvaro Herrera2013-01-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces two additional lock modes for tuples: "SELECT FOR KEY SHARE" and "SELECT FOR NO KEY UPDATE". These don't block each other, in contrast with already existing "SELECT FOR SHARE" and "SELECT FOR UPDATE". UPDATE commands that do not modify the values stored in the columns that are part of the key of the tuple now grab a SELECT FOR NO KEY UPDATE lock on the tuple, allowing them to proceed concurrently with tuple locks of the FOR KEY SHARE variety. Foreign key triggers now use FOR KEY SHARE instead of FOR SHARE; this means the concurrency improvement applies to them, which is the whole point of this patch. The added tuple lock semantics require some rejiggering of the multixact module, so that the locking level that each transaction is holding can be stored alongside its Xid. Also, multixacts now need to persist across server restarts and crashes, because they can now represent not only tuple locks, but also tuple updates. This means we need more careful tracking of lifetime of pg_multixact SLRU files; since they now persist longer, we require more infrastructure to figure out when they can be removed. pg_upgrade also needs to be careful to copy pg_multixact files over from the old server to the new, or at least part of multixact.c state, depending on the versions of the old and new servers. Tuple time qualification rules (HeapTupleSatisfies routines) need to be careful not to consider tuples with the "is multi" infomask bit set as being only locked; they might need to look up MultiXact values (i.e. possibly do pg_multixact I/O) to find out the Xid that updated a tuple, whereas they previously were assured to only use information readily available from the tuple header. This is considered acceptable, because the extra I/O would involve cases that would previously cause some commands to block waiting for concurrent transactions to finish. Another important change is the fact that locking tuples that have previously been updated causes the future versions to be marked as locked, too; this is essential for correctness of foreign key checks. This causes additional WAL-logging, also (there was previously a single WAL record for a locked tuple; now there are as many as updated copies of the tuple there exist.) With all this in place, contention related to tuples being checked by foreign key rules should be much reduced. As a bonus, the old behavior that a subtransaction grabbing a stronger tuple lock than the parent (sub)transaction held on a given tuple and later aborting caused the weaker lock to be lost, has been fixed. Many new spec files were added for isolation tester framework, to ensure overall behavior is sane. There's probably room for several more tests. There were several reviewers of this patch; in particular, Noah Misch and Andres Freund spent considerable time in it. Original idea for the patch came from Simon Riggs, after a problem report by Joel Jacobson. Most code is from me, with contributions from Marti Raudsepp, Alexander Shulgin, Noah Misch and Andres Freund. This patch was discussed in several pgsql-hackers threads; the most important start at the following message-ids: AANLkTimo9XVcEzfiBR-ut3KVNDkjm2Vxh+t8kAmWjPuv@mail.gmail.com 1290721684-sup-3951@alvh.no-ip.org 1294953201-sup-2099@alvh.no-ip.org 1320343602-sup-2290@alvh.no-ip.org 1339690386-sup-8927@alvh.no-ip.org 4FE5FF020200002500048A3D@gw.wicourts.gov 4FEAB90A0200002500048B7D@gw.wicourts.gov
* Update copyrights for 2013Bruce Momjian2013-01-01
| | | | | Fully update git head, and update back branches in ./COPYRIGHT and legal.sgml files.
* Remove obsolete XLogRecPtr macrosAlvaro Herrera2012-12-28
| | | | | | | | | | | | | | | | | This gets rid of XLByteLT, XLByteLE, XLByteEQ and XLByteAdvance. These were useful for brevity when XLogRecPtrs were split in xlogid/xrecoff; but now that they are simple uint64's, they are just clutter. The only downside to making this change would be ease of backporting patches, but that has been negated by other substantive changes to the involved code anyway. The clarity of simpler expressions makes the change worthwhile. Most of the changes are mechanical, but in a couple of places, the patch author chose to invert the operator sense, making the code flow more logical (and more in line with preceding comments). Author: Andres Freund Eyeballed by Dimitri Fontaine and Alvaro Herrera
* Update comment in heapgetpage() regarding PD_ALL_VISIBLE vs. Hot Standby.Robert Haas2012-12-14
| | | | Pavan Deolasee, slightly modified by me
* In multi-insert, don't go into infinite loop on a huge tuple and fillfactor.Heikki Linnakangas2012-12-12
| | | | | | | | | | | | | | | | | | If a tuple is larger than page size minus space reserved for fillfactor, heap_multi_insert would never find a page that it fits in and repeatedly ask for a new page from RelationGetBufferForTuple. If a tuple is too large to fit on any page, taking fillfactor into account, RelationGetBufferForTuple will always expand the relation. In a normal insert, heap_insert will accept that and put the tuple on the new page. heap_multi_insert, however, does a fillfactor check of its own, and doesn't accept the newly-extended page RelationGetBufferForTuple returns, even though there is no other choice to make the tuple fit. Fix that by making the logic in heap_multi_insert more like the heap_insert logic. The first tuple is always put on the page RelationGetBufferForTuple gives us, and the fillfactor check is only applied to the subsequent tuples. Report from David Gould, although I didn't use his patch.
* Reduce scope of changes for COPY FREEZE.Simon Riggs2012-12-02
| | | | | | | | Allow support only for freezing tuples by explicit command. Previous coding mistakenly extended slightly beyond what was agreed as correct on -hackers. So essentially a partial revoke of earlier work, leaving just the COPY FREEZE command.
* COPY FREEZE and mark committed on fresh tables.Simon Riggs2012-12-01
| | | | | | | | | | | | | | | When a relfilenode is created in this subtransaction or a committed child transaction and it cannot otherwise be seen by our own process, mark tuples committed ahead of transaction commit for all COPY commands in same transaction. If FREEZE specified on COPY and pre-conditions met then rows will also be frozen. Both options designed to avoid revisiting rows after commit, increasing performance of subsequent commands after data load and upgrade. pg_restore changes later. Simon Riggs, review comments from Heikki Linnakangas, Noah Misch and design input from Tom Lane, Robert Haas and Kevin Grittner
* Split out rmgr rm_desc functions into their own filesAlvaro Herrera2012-11-28
| | | | | This is necessary (but not sufficient) to have them compilable outside of a backend environment.
* Fix multiple problems in WAL replay.Tom Lane2012-11-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Most of the replay functions for WAL record types that modify more than one page failed to ensure that those pages were locked correctly to ensure that concurrent queries could not see inconsistent page states. This is a hangover from coding decisions made long before Hot Standby was added, when it was hardly necessary to acquire buffer locks during WAL replay at all, let alone hold them for carefully-chosen periods. The key problem was that RestoreBkpBlocks was written to hold lock on each page restored from a full-page image for only as long as it took to update that page. This was guaranteed to break any WAL replay function in which there was any update-ordering constraint between pages, because even if the nominal order of the pages is the right one, any mixture of full-page and non-full-page updates in the same record would result in out-of-order updates. Moreover, it wouldn't work for situations where there's a requirement to maintain lock on one page while updating another. Failure to honor an update ordering constraint in this way is thought to be the cause of bug #7648 from Daniel Farina: what seems to have happened there is that a btree page being split was rewritten from a full-page image before the new right sibling page was written, and because lock on the original page was not maintained it was possible for hot standby queries to try to traverse the page's right-link to the not-yet-existing sibling page. To fix, get rid of RestoreBkpBlocks as such, and instead create a new function RestoreBackupBlock that restores just one full-page image at a time. This function can be invoked by WAL replay functions at the points where they would otherwise perform non-full-page updates; in this way, the physical order of page updates remains the same no matter which pages are replaced by full-page images. We can then further adjust the logic in individual replay functions if it is necessary to hold buffer locks for overlapping periods. A side benefit is that we can simplify the handling of concurrency conflict resolution by moving that code into the record-type-specfic functions; there's no more need to contort the code layout to keep conflict resolution in front of the RestoreBkpBlocks call. In connection with that, standardize on zero-based numbering rather than one-based numbering for referencing the full-page images. In HEAD, I removed the macros XLR_BKP_BLOCK_1 through XLR_BKP_BLOCK_4. They are still there in the header files in previous branches, but are no longer used by the code. In addition, fix some other bugs identified in the course of making these changes: spgRedoAddNode could fail to update the parent downlink at all, if the parent tuple is in the same page as either the old or new split tuple and we're not doing a full-page image: it would get fooled by the LSN having been advanced already. This would result in permanent index corruption, not just transient failure of concurrent queries. Also, ginHeapTupleFastInsert's "merge lists" case failed to mark the old tail page as a candidate for a full-page image; in the worst case this could result in torn-page corruption. heap_xlog_freeze() was inconsistent about using a cleanup lock or plain exclusive lock: it did the former in the normal path but the latter for a full-page image. A plain exclusive lock seems sufficient, so change to that. Also, remove gistRedoPageDeleteRecord(), which has been dead code since VACUUM FULL was rewritten. Back-patch to 9.0, where hot standby was introduced. Note however that 9.0 had a significantly different WAL-logging scheme for GIST index updates, and it doesn't appear possible to make that scheme safe for concurrent hot standby queries, because it can leave inconsistent states in the index even between WAL records. Given the lack of complaints from the field, we won't work too hard on fixing that branch.
* Throw error if expiring tuple is again updated or deleted.Kevin Grittner2012-10-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This prevents surprising behavior when a FOR EACH ROW trigger BEFORE UPDATE or BEFORE DELETE directly or indirectly updates or deletes the the old row. Prior to this patch the requested action on the row could be silently ignored while all triggered actions based on the occurence of the requested action could be committed. One example of how this could happen is if the BEFORE DELETE trigger for a "parent" row deleted "children" which had trigger functions to update summary or status data on the parent. This also prevents similar surprising problems if the query has a volatile function which updates a target row while it is already being updated. There are related issues present in FOR UPDATE cursors and READ COMMITTED queries which are not handled by this patch. These issues need further evalution to determine what change, if any, is needed. Where the new error messages are generated, in most cases the best fix will be to move code from the BEFORE trigger to an AFTER trigger. Where this is not feasible, the trigger can avoid the error by re-issuing the triggering statement and returning NULL. Documentation changes will be submitted in a separate patch. Kevin Grittner and Tom Lane with input from Florian Pflug and Robert Haas, based on problems encountered during conversion of Wisconsin Circuit Court trigger logic to plpgsql triggers.
* Put back AcceptInvalidationMessages calls in heap_openrv(_extended).Tom Lane2012-09-19
| | | | | | | | | | | | | | | | | | These calls were removed in commit 4240e429d0c2d889d0cda23c618f94e12c13ade7 as part of a general refactoring and improvement of DDL locking. However, there's a problem not solved by the rewrite, which is that GRANT/REVOKE update pg_class.relacl without taking any particular lock on the target table as such. If another backend fails to do AcceptInvalidationMessages, it won't notice a recently-committed change in ACLs. Bug #7557 from Piotr Czachur demonstrates that there's at least one code path in 9.2.0 in which a command fails to do any AcceptInvalidationMessages calls at all, if the current transaction already holds all the locks it will need. Since we're hard up against the release deadline for 9.2.1, fix this by putting back the AcceptInvalidationMessages calls in heap_openrv and heap_openrv_extended, thereby restoring the historical behavior in this area. We ought to look for a more elegant and perhaps more bulletproof solution, but there's no time for that right now.
* Split heapam_xlog.h from heapam.hAlvaro Herrera2012-08-28
| | | | | | | | | | | | The heapam XLog functions are used by other modules, not all of which are interested in the rest of the heapam API. With this, we let them get just the XLog stuff in which they are interested and not pollute them with unrelated includes. Also, since heapam.h no longer requires xlog.h, many files that do include heapam.h no longer get xlog.h automatically, including a few headers. This is useful because heapam.h is getting pulled in by execnodes.h, which is in turn included by a lot of files.
* Add new function log_newpage_buffer.Robert Haas2012-06-14
| | | | | | | | When I implemented the ginbuildempty() function as part of implementing unlogged tables, I falsified the note in the header comment for log_newpage. Although we could fix that up by changing the comment, it seems cleaner to add a new function which is specifically intended to handle this case. So do that.
* Run pgindent on 9.2 source tree in preparation for first 9.3Bruce Momjian2012-06-10
| | | | commit-fest.
* Fix more crash-safe visibility map bugs, and improve comments.Robert Haas2012-06-07
| | | | | | | | | | | | | | | | | | | | | | | | | In lazy_scan_heap, we could issue bogus warnings about incorrect information in the visibility map, because we checked the visibility map bit before locking the heap page, creating a race condition. Fix by rechecking the visibility map bit before we complain. Rejigger some related logic so that we rely on the possibly-outdated all_visible_according_to_vm value as little as possible. In heap_multi_insert, it's not safe to clear the visibility map bit before beginning the critical section. The visibility map is not crash-safe unless we treat clearing the bit as a critical operation. Specifically, if the transaction were to error out after we set the bit and before entering the critical section, we could end up writing the heap page to disk (with the bit cleared) and crashing before the visibility map page made it to disk. That would be bad. heap_insert has this correct, but somehow the order of operations got rearranged when heap_multi_insert was added. Also, add some more comments to visibilitymap_test, lazy_scan_heap, and IndexOnlyNext, expounding on concurrency issues. Per extensive code review by Andres Freund, and further review by Tom Lane, who also made the original report about the bogus warnings.
* Only throw recovery conflicts when InHotStandby. Bug fix to recentSimon Riggs2012-05-31
| | | | | | patch to allow Index Only Scans on Hot Standby. Bug report from Jaime Casanova
* Ensure that seqscans check for interrupts at least once per page.Tom Lane2012-05-22
| | | | | | | | | | | | | If a seqscan encounters many consecutive pages containing only dead tuples, it can remain in the loop in heapgettup for a long time, and there was no CHECK_FOR_INTERRUPTS anywhere in that loop. This meant there were real-world situations where a query would be effectively uncancelable for long stretches. Add a check placed to occur once per page, which should be enough to provide reasonable response time without adding any measurable overhead. Report and patch by Merlin Moncure (though I tweaked it a bit). Back-patch to all supported branches.
* Fix bug in freespace calculation in heap_multi_insert().Heikki Linnakangas2012-05-16
| | | | | | | If the amount of freespace on page was less than the amount reserved by fillfactor, the calculation would underflow. This fixes bug #6643 reported by Tomonari Katsumata.
* Avoid repeated CLOG access from heap_hot_search_buffer.Robert Haas2012-05-02
| | | | | | | | | | | At the time we check whether the tuple is dead to all running transactions, we've already verified that it isn't visible to our scan, setting hint bits if appropriate. So there's no need to recheck CLOG for the all-dead test we do just a moment later. So, add HeapTupleIsSurelyDead() to test the appropriate condition under the assumption that all relevant hit bits are already set. Review by Tom Lane.
* Prevent index-only scans from returning wrong answers under Hot Standby.Robert Haas2012-04-26
| | | | | | | | | The alternative of disallowing index-only scans in HS operation was discussed, but the consensus was that it was better to treat marking a page all-visible as a recovery conflict for snapshots that could still fail to see XIDs on that page. We may in the future try to soften this, so that we simply force index scans to do heap fetches in cases where this may be an issue, rather than throwing a hard conflict.
* Lots of doc corrections.Robert Haas2012-04-23
| | | | Josh Kupershmidt
* Code cleanup for heap_freeze_tuple.Robert Haas2012-03-26
| | | | | | | It used to be case that lazy vacuum could call this function with only a shared lock on the buffer, but neither lazy vacuum nor any other code path does that any more. Simplify the code accordingly and clean up some related, obsolete comments.
* Fix heap_multi_insert to set t_self field in the caller's tuples.Heikki Linnakangas2012-02-13
| | | | | | | | If tuples were toasted, heap_multi_insert didn't update the ctid on the original tuples. This caused a failure if there was an after trigger (including a foreign key), on the table, and a tuple got toasted. Per off-list report and test case from Ted Phelps
* fastgetattr is in access/htup.h, not access/heapam.hRobert Haas2012-01-16
| | | | Noted by Peter Geoghegan
* Update copyright notices for year 2012.Bruce Momjian2012-01-01
|
* Improve table locking behavior in the face of current DDL.Robert Haas2011-11-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the previous coding, callers were faced with an awkward choice: look up the name, do permissions checks, and then lock the table; or look up the name, lock the table, and then do permissions checks. The first choice was wrong because the results of the name lookup and permissions checks might be out-of-date by the time the table lock was acquired, while the second allowed a user with no privileges to interfere with access to a table by users who do have privileges (e.g. if a malicious backend queues up for an AccessExclusiveLock on a table on which AccessShareLock is already held, further attempts to access the table will be blocked until the AccessExclusiveLock is obtained and the malicious backend's transaction rolls back). To fix, allow callers of RangeVarGetRelid() to pass a callback which gets executed after performing the name lookup but before acquiring the relation lock. If the name lookup is retried (because invalidation messages are received), the callback will be re-executed as well, so we get the best of both worlds. RangeVarGetRelid() is renamed to RangeVarGetRelidExtended(); callers not wishing to supply a callback can continue to invoke it as RangeVarGetRelid(), which is now a macro. Since the only one caller that uses nowait = true now passes a callback anyway, the RangeVarGetRelid() macro defaults nowait as well. The callback can also be used for supplemental locking - for example, REINDEX INDEX needs to acquire the table lock before the index lock to reduce deadlock possibilities. There's a lot more work to be done here to fix all the cases where this can be a problem, but this commit provides the general infrastructure and fixes the following specific cases: REINDEX INDEX, REINDEX TABLE, LOCK TABLE, and and DROP TABLE/INDEX/SEQUENCE/VIEW/FOREIGN TABLE. Per discussion with Noah Misch and Alvaro Herrera.
* Take fillfactor into account in the new COPY bulk heap insert code.Heikki Linnakangas2011-11-26
| | | | Jeff Janes
* Fix another bug in the redo of COPY batches.Heikki Linnakangas2011-11-10
| | | | | I got alignment wrong in the redo routine. Spotted by redoing the log genereated by copy regression test.
* Fix bugs in the COPY heap-insert batching patch.Heikki Linnakangas2011-11-09
| | | | | | | | | | Forgot to call RestoreBkpBlocks() in the redo-function, as pointed out by Simon Riggs. In redo of a regular heap insert, it's taken care of in heap_redo(), but this new record type uses the heap2 RM, and heap2_redo() does not take care of that for you. Also, failed to reset the vmbuffer and all_visibile_cleared local variables after switching to a new buffer.