path: root/src/backend/access
Commit message (Author, Age)
...
* Change the autovacuum launcher to use WaitLatch instead of a poll loop. (Tom Lane, 2011-08-10)
  In pursuit of this (and with the expectation that WaitLatch will be needed in more places), convert the latch field that was already added to PGPROC for sync rep into a generic latch that is activated for all PGPROC-owning processes, and change many of the standard backend signal handlers to set that latch when a signal happens. This will allow WaitLatch callers to be wakened properly by these signals. In passing, fix a whole bunch of signal handlers that had been hacked to do things that might change errno, without adding the necessary save/restore logic for errno. Also make some minor fixes in unix_latch.c, and clean up the bizarre and unsafe scheme for disowning the process's latch. Much of this has to be back-patched into 9.1.
  Peter Geoghegan, with additional work by Tom
* If backup-end record is not seen, and we reach end of recovery from a streamed backup, throw an error and refuse to start up. (Heikki Linnakangas, 2011-08-10)
  The restore has not finished correctly in that case and the data directory is possibly corrupt. We already errored out in case of archive recovery, but could not during crash recovery because we couldn't distinguish between the case that pg_start_backup() was called and the database then crashed (must not error, data is OK), and the case that we're restoring from a backup and not all the needed WAL was replayed (data can be corrupt). To distinguish those cases, add a line to backup_label to indicate whether the backup was taken with pg_start/stop_backup(), or by streaming (i.e. pg_basebackup). This requires re-initdb, because of a new field added to the control file.
* Measure WaitLatch's timeout parameter in milliseconds, not microseconds. (Tom Lane, 2011-08-09)
  The original definition had the problem that timeouts exceeding about 2100 seconds couldn't be specified on 32-bit machines. Milliseconds seem like sufficient resolution, and finer grain than that would be fantasy anyway on many platforms. Back-patch to 9.1 so that this aspect of the latch API won't change between 9.1 and later releases.
  Peter Geoghegan
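The "about 2100 seconds" figure follows from storing a microsecond count in a signed 32-bit integer. The arithmetic can be checked with a small Python sketch; the helper name `max_timeout_seconds` is made up for this illustration and is not part of the latch API.

```python
# Largest value a signed 32-bit integer can hold.
INT32_MAX = 2**31 - 1


def max_timeout_seconds(units_per_second: int) -> float:
    """Longest timeout, in seconds, expressible in a signed 32-bit count."""
    return INT32_MAX / units_per_second


max_with_microseconds = max_timeout_seconds(1_000_000)  # roughly 2147 seconds
max_with_milliseconds = max_timeout_seconds(1_000)      # roughly 24.8 days
```

With milliseconds the same 32-bit field covers a little over three weeks, which is why the coarser unit removes the practical limit.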
* Change the way string relopts are allocated. (Heikki Linnakangas, 2011-08-09)
  Don't try to allocate the default value for a string relopt in the same palloc chunk as the relopt_string struct. That didn't work too well if you added a built-in string relopt in the stringRelOpts array, as it's not possible to have an initializer for a variable length struct in C. This makes the code slightly simpler too. While we're at it, move the call to the validator function in add_string_reloption to before the allocation, so that if someone does pass a bogus default value, we don't leak memory.
* Allow per-column foreign data wrapper options. (Robert Haas, 2011-08-05)
  Shigeru Hanada, with fairly minor editing by me.
* Remove O(N^2) performance issue with multiple SAVEPOINTs. (Simon Riggs, 2011-07-19)
  Subtransaction locks are now released en masse at main commit, rather than repeatedly re-scanning for locks as we ascend the nested transaction tree. Split transaction state TBLOCK_SUBEND into two states, TBLOCK_SUBCOMMIT and TBLOCK_SUBRELEASE, to allow the commit path to be optimised using the existing code in ResourceOwnerRelease(), which appears to have been intended for this usage, judging from comments therein.
* Cascading replication feature for streaming log-based replication. (Simon Riggs, 2011-07-19)
  Standby servers can now have WALSender processes, which can work with either WALReceiver or archive_commands to pass data. Fully updated docs, including new conceptual terms of sending server, upstream and downstream servers. WALSenders are terminated when the standby is promoted to master.
  Fujii Masao, review, rework and doc rewrite by Simon Riggs
* Change the way the offset of downlink is stored in GISTInsertStack. (Heikki Linnakangas, 2011-07-15)
  GISTInsertStack.childoffnum used to mean "offset of the downlink in this node, pointing to the child node in the stack". It's now replaced with downlinkoffnum, which means "offset of the downlink in the parent of this node". gistFindPath() already used childoffnum with this new meaning, and had an extra step at the end to pull all the childoffnum values down one node in the stack, to adjust the stack for the meaning that childoffnum had elsewhere. That's no longer required. The reason to do this now is that this new representation is more convenient for the GiST fast build patch that Alexander Korotkov is working on. While we're at it, replace the linked list used in gistFindPath with a standard List, and make gistFindPath() static.
  Alexander Korotkov, with some changes by me.
* Fix two ancient bugs in GiST code to re-find a parent after page split. (Heikki Linnakangas, 2011-07-15)
  First, when following a right-link, we incorrectly marked the current page as the parent of the right sibling. In reality, the parent of the right page is the same as the parent of the current page (or some page to the right of it, gistFindCorrectParent() will sort that out). Secondly, when we follow a right-link, we must prepend, not append, the right page to our list of pages to visit. That's because we assume that once we hit a leaf page in the list, all the rest are leaf pages too, and give up.
  To hit these bugs, you need concurrent actions and several unlucky accidents. Another backend must split the root page, while you're in process of splitting a lower-level page. Furthermore, while you scan the internal nodes to re-find the parent, another backend needs to again split some more internal pages. Even then, the bugs don't necessarily manifest as user-visible errors or index corruption.
  While we're at it, make the error reporting a bit better if gistFindPath() fails to re-find the parent. It used to be an assertion, but an elog() seems more appropriate. Backpatch to all supported branches.
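The prepend-versus-append point can be illustrated with a toy model. The sketch below is not the gistFindPath() code: the `Page` class, the `find_parent` function and the `rightlink` field (set here to mean "this page was split and the sibling is not yet known to the parent") are assumptions made for the illustration. It only shows why a same-level right sibling has to be visited before any queued leaf pages when the search gives up at the first leaf.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Page:
    is_leaf: bool
    children: List[str] = field(default_factory=list)
    rightlink: Optional[str] = None  # hypothetical marker for a concurrent split


def find_parent(pages: Dict[str, Page], root: str, target: str,
                prepend_right: bool = True) -> Optional[str]:
    """Breadth-first search for the page holding the downlink to `target`."""
    to_visit = deque([root])
    while to_visit:
        page_id = to_visit.popleft()
        page = pages[page_id]
        if page.is_leaf:
            # Assumption from the commit message: once a leaf is reached,
            # everything still queued is a leaf as well, so give up.
            return None
        if page.rightlink is not None:
            if prepend_right:
                to_visit.appendleft(page.rightlink)  # fixed behaviour
            else:
                to_visit.append(page.rightlink)      # old, buggy behaviour
        for child in page.children:
            if child == target:
                return page_id
            to_visit.append(child)
    return None


# Root has downlinks to A and B; B was split into B and B2 concurrently,
# and the downlink to leaf L4 now lives on B2.
pages = {
    "root": Page(False, ["A", "B"]),
    "A": Page(False, ["L1"]),
    "B": Page(False, ["L3"], rightlink="B2"),
    "B2": Page(False, ["L4"]),
    "L1": Page(True), "L3": Page(True), "L4": Page(True),
}
fixed = find_parent(pages, "root", "L4", prepend_right=True)
buggy = find_parent(pages, "root", "L4", prepend_right=False)
```

In this model the appending variant reaches leaf L1 before it ever looks at B2 and gives up, while the prepending variant finds B2 as the parent.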
* Try to acquire relation locks in RangeVarGetRelid. (Robert Haas, 2011-07-08)
  In the previous coding, we would look up a relation in RangeVarGetRelid, lock the resulting OID, and then AcceptInvalidationMessages(). While this was sufficient to ensure that we noticed any changes to the relation definition before building the relcache entry, it didn't handle the possibility that the name we looked up no longer referenced the same OID. This was particularly problematic in the case where a table had been dropped and recreated: we'd latch on to the entry for the old relation and fail later on. Now, we acquire the relation lock inside RangeVarGetRelid, and retry the name lookup if we notice that invalidation messages have been processed meanwhile. Many operations that would previously have failed with an error in the presence of concurrent DDL will now succeed.
  There is a good deal of work remaining to be done here: many callers of RangeVarGetRelid still pass NoLock for one reason or another. In addition, nothing in this patch guards against the possibility that the meaning of an unqualified name might change due to the creation of a relation in a schema earlier in the user's search path than the one where it was previously found. Furthermore, there's nothing at all here to guard against similar race conditions for non-relations. For all that, it's a start.
  Noah Misch and Robert Haas
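The lookup, lock, and retry pattern described above can be modelled in a few lines. This is a simplified Python sketch, not the RangeVarGetRelid code: the `Catalog` class, its `inval_counter`, and the `lock`/`unlock` callbacks are hypothetical stand-ins for the catalog lookup, the invalidation-message counter, and the lock manager.

```python
from typing import Callable, Dict, Optional


class Catalog:
    """Toy name-to-OID map with a counter bumped on every DDL change."""

    def __init__(self) -> None:
        self.names: Dict[str, int] = {}
        self.inval_counter = 0

    def define(self, name: str, oid: int) -> None:
        self.names[name] = oid
        self.inval_counter += 1


def lookup_and_lock(catalog: Catalog, name: str,
                    lock: Callable[[int], None],
                    unlock: Callable[[int], None]) -> Optional[int]:
    """Return the OID for `name` with its lock held, retrying after DDL."""
    locked: Optional[int] = None
    while True:
        seen = catalog.inval_counter
        oid = catalog.names.get(name)
        if locked is not None:
            if oid == locked:
                return oid          # name still maps to what we locked
            unlock(locked)          # name now points elsewhere: start over
            locked = None
        if oid is None:
            return None
        lock(oid)                   # may wait; DDL can commit in the meantime
        locked = oid
        if catalog.inval_counter == seen:
            return oid              # nothing changed while we were locking


# Simulate a concurrent drop-and-recreate that commits while we wait for
# the lock on the old OID.
catalog = Catalog()
catalog.define("t", 1)
held = set()
lock_calls = []


def lock(oid: int) -> None:
    lock_calls.append(oid)
    held.add(oid)
    if len(lock_calls) == 1:
        catalog.define("t", 2)      # table recreated under a new OID


def unlock(oid: int) -> None:
    held.discard(oid)


result = lookup_and_lock(catalog, "t", lock, unlock)
```

In the simulated race the first lock is taken on the stale OID, the counter change is noticed, and the function ends up holding only the lock on the recreated relation.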
* Introduce a pipe between postmaster and each backend, which can be used to detect postmaster death. (Heikki Linnakangas, 2011-07-08)
  Postmaster keeps the write-end of the pipe open, so when it dies, children get EOF in the read-end. That can conveniently be waited for in select(), which allows eliminating some of the polling loops that check for postmaster death. This patch doesn't yet change all the loops to use the new mechanism; expect a follow-on patch to do that.
  This changes the interface to WaitLatch, so that it takes as argument a bitmask of events that it waits for. Possible events are latch set, timeout, postmaster death, and socket becoming readable or writeable.
  The pipe method behaves slightly differently from the kill() method previously used in PostmasterIsAlive() in the case that postmaster has died, but its parent has not yet read its exit code with waitpid(). The pipe returns EOF as soon as the process dies, but kill() continues to return true until waitpid() has been called (IOW while the process is a zombie). Because of that, change PostmasterIsAlive() to use the pipe too, otherwise WaitLatch() would return immediately with WL_POSTMASTER_DEATH, while PostmasterIsAlive() would claim it's still alive. That could easily lead to busy-waiting while postmaster is in zombie state.
  Peter Geoghegan with further changes by me, reviewed by Fujii Masao and Florian Pflug.
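The pipe trick itself is generic Unix behaviour and can be tried outside PostgreSQL. The following Python sketch assumes a POSIX system with `fork()`; the function names `parent_is_gone` and `demo` are invented for the example, and the two forked processes merely stand in for the postmaster and a backend.

```python
import os
import select


def parent_is_gone(read_fd: int, timeout_s: float) -> bool:
    """Wait on the read end; True means EOF, i.e. every write end is closed."""
    readable, _, _ = select.select([read_fd], [], [], timeout_s)
    if not readable:
        return False  # timed out: the parent is presumably still alive
    return os.read(read_fd, 1) == b""


def demo() -> str:
    report_r, report_w = os.pipe()  # lets the "backend" report back to us
    parent_pid = os.fork()
    if parent_pid == 0:
        # Stand-in for the postmaster: it holds the write end of the
        # life-sign pipe and never writes to it.
        try:
            os.close(report_r)
            alive_r, alive_w = os.pipe()
            if os.fork() == 0:
                # Stand-in for a backend: drop the inherited write end,
                # otherwise EOF would never arrive, then wait.
                os.close(alive_w)
                gone = parent_is_gone(alive_r, 5.0)
                os.write(report_w, b"dead" if gone else b"timeout")
        finally:
            os._exit(0)  # the stand-in postmaster exits right after forking
    os.close(report_w)
    os.waitpid(parent_pid, 0)
    result = os.read(report_r, 16).decode()
    os.close(report_r)
    return result
```

Calling `demo()` should return `"dead"`: the waiting process sees EOF as soon as the process holding the write end exits, without any polling.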
* Move Trigger and TriggerDesc structs out of rel.h into a new reltrigger.h (Alvaro Herrera, 2011-07-04)
  This lets us stop including rel.h into execnodes.h, which is a widely used header.
* Enable CHECK constraints to be declared NOT VALID (Alvaro Herrera, 2011-06-30)
  This means that they can initially be added to a large existing table without checking its initial contents, but new tuples must comply with them; a separate pass invoked by ALTER TABLE / VALIDATE can verify existing data and ensure it complies with the constraint, at which point it is marked validated and becomes a normal part of the table ecosystem.
  A non-validated CHECK constraint is ignored in the planner for constraint_exclusion purposes; when validated, cached plans are recomputed so that partitioning starts working right away.
  This patch also enables domains to have unvalidated CHECK constraints attached to them by way of ALTER DOMAIN / ADD CONSTRAINT / NOT VALID, which can later be validated with ALTER DOMAIN / VALIDATE CONSTRAINT.
  Thanks to Thom Brown, Dean Rasheed and Jaime Casanova for the various reviews, and Robert Haas for documentation wording improvement suggestions. This patch was sponsored by Enova Financial.
* Restore correct btree preprocessing of "indexedcol IS NULL" conditions. (Tom Lane, 2011-06-29)
  Such a condition is unsatisfiable in combination with any other type of btree-indexable condition (since we assume btree operators are always strict). 8.3 and 8.4 had an explicit test for this, which I removed in commit 29c4ad98293e3c5cb3fcdd413a3f4904efff8762, mistakenly thinking that the case would be subsumed by the more general handling of IS (NOT) NULL added in that patch. Put it back, and improve the comments about it, and add a regression test case.
  Per bug #6079 from Renat Nasyrov, and analysis by Dean Rasheed.
* Move the PredicateLockRelation() call from nodeSeqscan.c to heapam.c. (Heikki Linnakangas, 2011-06-29)
  It's more consistent that way, since all the other PredicateLock* calls are made in various heapam.c and index AM functions. The call in nodeSeqscan.c was unnecessarily aggressive anyway: there's no need to try to lock the relation every time a tuple is fetched, it's enough to do it once.
  This has the user-visible effect that if a seq scan is initialized in the executor, but never executed, we now acquire the predicate lock on the heap relation anyway. We could avoid that by taking the lock on the first heap_getnext() call instead, but it doesn't seem worth the trouble given that it feels more natural to do it in heap_beginscan().
  Also, remove the retail PredicateLockTuple() calls from heap_getnext(). In a seqscan, started with heap_beginscan(), we're holding a whole-relation predicate lock on the heap so there's no need to lock the tuples individually.
  Kevin Grittner and me
* Unify spelling of "canceled", "canceling", "cancellation" (Peter Eisentraut, 2011-06-29)
  We had previously (af26857a2775e7ceb0916155e931008c2116632f) established the U.S. spellings as standard.
* Introduce compact WAL record for the common case of commit (non-DDL). (Simon Riggs, 2011-06-28)
  XLOG_XACT_COMMIT_COMPACT leaves out invalidation messages and relfilenodes, saving considerable space for the vast majority of transaction commits. XLOG_XACT_COMMIT keeps the same definition as in XLOG_PAGE_MAGIC 0xD067 and earlier.
  Leonardo Francalanci and Simon Riggs
* Reduce impact of btree page reuse on Hot Standby by fixing off-by-1 error. (Simon Riggs, 2011-06-27)
  WAL records of type XLOG_BTREE_REUSE_PAGE were generated using a latestRemovedXid one higher than actually needed, because the xid used was the page's opaque->btpo.xact rather than an actually removed xid. Noticed on an otherwise quiet system by Noah Misch.
  Noah Misch and Simon Riggs
* Allow callers to pass a missing_ok flag when opening a relation. (Robert Haas, 2011-06-27)
  Since the names try_relation_openrv() and try_heap_openrv() don't seem quite appropriate, rename the functions to relation_openrv_extended() and heap_openrv_extended(). This is also more general, if we have a future need for additional parameters that are of interest to only a few callers. This is infrastructure for a forthcoming patch to allow get_object_address() to take a missing_ok argument as well.
  Patch by me, review by Noah Misch.
* Try again to make the visibility map crash safe. (Robert Haas, 2011-06-27)
  My previous attempt was quite a bit less than half-baked with respect to heap_update().
* Avoid having two copies of the HOT-chain search logic. (Robert Haas, 2011-06-27)
  It's been like this since HOT was originally introduced, but the logic is complex enough that this is a recipe for bugs, as we've already found out with SSI. So refactor heap_hot_search_buffer() so that it can satisfy the needs of index_getnext(), and make index_getnext() use that rather than duplicating the logic. This change was originally proposed by Heikki Linnakangas as part of a larger refactoring oriented towards allowing index-only scans. I extracted and adjusted this part, since it seems to have independent merit.
  Review by Jeff Davis.
* Make the visibility map crash-safe. (Robert Haas, 2011-06-21)
  This involves two main changes from the previous behavior. First, when we set a bit in the visibility map, emit a new WAL record of type XLOG_HEAP2_VISIBLE. Replay sets the page-level PD_ALL_VISIBLE bit and the visibility map bit. Second, when inserting, updating, or deleting a tuple, we can no longer get away with clearing the visibility map bit after releasing the lock on the corresponding heap page, because an intervening crash might leave the visibility map bit set and the page-level bit clear. Making this work requires a bit of interface refactoring.
  In passing, a few minor but related cleanups: change the test in visibilitymap_set and visibilitymap_clear to throw an error if the wrong page (or no page) is pinned, rather than silently doing nothing; this case should never occur. Also, remove duplicate definitions of InvalidXLogRecPtr.
  Patch by me, review by Noah Misch.
* Message style and spelling improvements (Peter Eisentraut, 2011-06-22)
* pgindent run of recent SSI changes. Also, remove an unnecessary #include. (Heikki Linnakangas, 2011-06-16)
  Kevin Grittner
* Respect Hot Standby controls while recycling btree index pages. (Simon Riggs, 2011-06-16)
  Btree pages were recycled after VACUUM deletes all records on a page and then a subsequent VACUUM occurs after the RecentXmin horizon is reached. Using RecentXmin meant that we did not respond correctly to the user controls provided to avoid Hot Standby conflicts, and so spurious conflicts could be generated in some workload combinations. We now reuse pages only when we reach RecentGlobalXmin, which can be much later in the presence of long running queries and is also controlled by vacuum_defer_cleanup_age and hot_standby_feedback.
  Noah Misch and Simon Riggs
* Make non-MVCC snapshots exempt from predicate locking. (Heikki Linnakangas, 2011-06-15)
  Scans with non-MVCC snapshots, like in REINDEX, are basically non-transactional operations. The DDL operation itself might participate in SSI, but there are separate functions for that.
  Kevin Grittner and Dan Ports, with some changes by me.
* Oops, forgot to change the order of entries in 2PC callback arrays when I renumbered the resource managers. (Heikki Linnakangas, 2011-06-14)
  This should fix the buildfarm.
* Work around gcc 4.6.0 bug that breaks WAL replay. (Tom Lane, 2011-06-10)
  ReadRecord's habit of using both direct references to tmpRecPtr and references to *RecPtr (which is pointing at tmpRecPtr) triggers an optimization bug in gcc 4.6.0, which apparently has forgotten about aliasing rules. Avoid the compiler bug, and make the code more readable to boot, by getting rid of the direct references. Improve the comments while at it. Back-patch to all supported versions, in case they get built with 4.6.0.
  Tom Lane, with some cosmetic suggestions from Alex Hunsaker
* Pgindent run before 9.1 beta2. (Bruce Momjian, 2011-06-09)
* Protect GIST logic that assumes penalty values can't be negative. (Tom Lane, 2011-05-31)
  Apparently sane-looking penalty code might return small negative values, for example because of roundoff error. This will confuse places like gistchoose(). Prevent problems by clamping negative penalty values to zero. (Just to be really sure, I also made it force NaNs to zero.) Back-patch to all supported branches.
  Alexander Korotkov
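The clamping rule is simple enough to state as code. This is a Python rendering of the rule as described in the message, not the actual GiST source; the function name `clamp_penalty` is made up for the illustration.

```python
import math


def clamp_penalty(penalty: float) -> float:
    """Force NaN and negative penalty values to zero, pass others through."""
    if math.isnan(penalty) or penalty < 0.0:
        return 0.0
    return penalty


# A roundoff artifact such as a tiny negative difference becomes zero.
roundoff = clamp_penalty((0.1 + 0.2) - 0.3 - 1e-16)
```

The NaN test has to be explicit because every ordered comparison with NaN is false, so `penalty < 0.0` alone would let NaN through.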
* The row-version chaining in Serializable Snapshot Isolation was still wrong. (Heikki Linnakangas, 2011-05-30)
  On further analysis, it turns out that it is not necessary to duplicate predicate locks to the new row version at update; the lock on the version that the transaction saw as visible is enough. However, there was a different bug in the code that checks for dangerous structures when a new rw-conflict happens. Fix that bug, and remove all the row-version chaining related code.
  Kevin Grittner & Dan Ports, with some comment editorialization by me.
* Spell checking and markup refinement (Peter Eisentraut, 2011-05-19)
* Fix assorted typos (Alvaro Herrera, 2011-05-12)
* Shut down WAL receiver if it's still running at end of recovery. (Heikki Linnakangas, 2011-05-11)
  We used to just check that it's not running and PANIC if it was, but that can rightfully happen if recovery stops at recovery target.
* Move RegisterPredicateLockingXid() call to a safer place. (Tom Lane, 2011-05-06)
  The SSI patch inserted a call of RegisterPredicateLockingXid into GetNewTransactionId, which was a bad idea on a couple of grounds. First, it's not necessary to hold XidGenLock while manipulating that shared memory, and doing so is bad because XidGenLock is a high-contention lock that should be held for as short a time as possible. (Not to mention that it adds an entirely unnecessary deadlock hazard, since we must take SerializableXactHashLock as well.) Second, the specific place where it was put was between extending CLOG and advancing nextXid, which could result in unpleasant behavior in case of a failure there. Pull the call out to AssignTransactionId, which is much safer and arguably better from a modularity standpoint too.
  There is more work to do to clean up the failure-before-advancing-nextXid issue, but that is a separate change that will need to be back-patched. So for the moment I just want to make GetNewTransactionId look the same as it did in prior versions.
* Fix SSI-related assertion failure. (Robert Haas, 2011-04-25)
  Bug #5899, reported by Marko Tiikkaja.
  Heikki Linnakangas, reviewed by Kevin Grittner and Dan Ports.
* Hash indexes had better pass the index collation to support functions, too. (Tom Lane, 2011-04-23)
  Per experimentation with contrib/citext, whose hash function assumes that it'll be passed a collation.
* Make GIN and GIST pass the index collation to all their support functions. (Tom Lane, 2011-04-22)
  Experimentation with contrib/btree_gist shows that the majority of the GIST support functions potentially need collation information. Safest policy seems to be to pass it to all of them, instead of making assumptions about which ones could possibly need it.
* Make a code-cleanup pass over the collations patch. (Tom Lane, 2011-04-22)
  This patch is almost entirely cosmetic: mostly cleaning up a lot of neglected comments, and fixing code layout problems in places where the patch made lines too long and then pgindent did weird things with that. I did find a bug-of-omission in equalTupleDescs().
* recoveryStopsHere() must check the resource manager ID. (Robert Haas, 2011-04-18)
  Before commit c016ce728139be95bb0dc7c4e5640507334c2339, this wasn't needed, but now that multiple resource manager IDs can percolate down through here, we have to make sure we know which one we've got. Otherwise, we can confuse (for example) an XLOG_XACT_COMMIT record with an XLOG_CHECKPOINT_SHUTDOWN record.
  Review by Jaime Casanova
* Add an Assert that indexam.c isn't used on an index awaiting reindexing. (Tom Lane, 2011-04-16)
  This might have caught the recent embarrassment over trying to modify pg_index while its indexes were being rebuilt.
  Noah Misch
* Revert the patch to check if we've reached end-of-backup also when doing crash recovery, and throw an error if not. (Heikki Linnakangas, 2011-04-13)
  hubert depesz lubaczewski pointed out that that situation also happens in the crash recovery following a system crash that happens during an online backup.
  We might want to do something smarter in 9.1, like put the check back for backups taken with pg_basebackup, but that's for another patch.
* Pass collations to functions in FunctionCallInfoData, not FmgrInfo. (Tom Lane, 2011-04-12)
  Since collation is effectively an argument, not a property of the function, FmgrInfo is really the wrong place for it; and this becomes critical in cases where a cached FmgrInfo is used for varying purposes that might need different collation settings. Fix by passing it in FunctionCallInfoData instead. In particular this allows a clean fix for bug #5970 (record_cmp not working). This requires touching a bit more code than the original method, but nobody ever thought that collations would not be an invasive patch...
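The design argument, per-call data versus cached per-function data, can be shown with a toy model. The Python classes below are only loosely named after the C structs and share none of their fields; `text_lt` and the `"ci"` (case-insensitive) collation label are hypothetical. The sketch shows one cached lookup result being reused for calls that need different collations.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


def text_lt(a: str, b: str, collation: str) -> bool:
    """Toy comparison: "C" compares code points, "ci" ignores case."""
    if collation == "ci":
        a, b = a.lower(), b.lower()
    return a < b


@dataclass(frozen=True)
class FuncInfo:
    """Cached lookup result: describes the function, holds no per-call state."""
    fn: Callable[[str, str, str], bool]


@dataclass
class CallInfo:
    """Per-call data: the arguments and, alongside them, the collation."""
    flinfo: FuncInfo
    collation: str
    args: Tuple[str, str]


def invoke(call: CallInfo) -> bool:
    return call.flinfo.fn(*call.args, call.collation)


cached = FuncInfo(text_lt)  # looked up once, shared by both calls
under_c = invoke(CallInfo(cached, "C", ("B", "a")))
under_ci = invoke(CallInfo(cached, "ci", ("B", "a")))
```

Had the collation been stored on the cached object, the second call would have needed either a second cache entry or a mutation of shared state.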
* Clean up most -Wunused-but-set-variable warnings from gcc 4.6 (Peter Eisentraut, 2011-04-11)
  This warning is new in gcc 4.6 and part of -Wall. This patch cleans up most of the noise, but there are still some warnings that are trickier to remove.
* pgindent run before PG 9.1 beta 1. (Bruce Momjian, 2011-04-10)
* Tweak collation setup for GIN index comparison functions. (Tom Lane, 2011-04-08)
  Honor index column's collation spec if there is one, don't go to the expense of calling get_typcollation when we can reasonably assume that all GIN storage types will use default collation, and be sure to set a collation for the comparePartialFn too.
* Revise the API for GUC variable assign hooks. (Tom Lane, 2011-04-07)
  The previous functions of assign hooks are now split between check hooks and assign hooks, where the former can fail but the latter shouldn't. Aside from being conceptually clearer, this approach exposes the "canonicalized" form of the variable value to guc.c without having to do an actual assignment. And that lets us fix the problem recently noted by Bernd Helmle that the auto-tune patch for wal_buffers resulted in bogus log messages about "parameter "wal_buffers" cannot be changed without restarting the server". There may be some speed advantage too, because this design lets hook functions avoid re-parsing variable values when restoring a previous state after a rollback (they can store a pre-parsed representation of the value instead). This patch also resolves a longstanding annoyance about custom error messages from variable assign hooks: they should modify, not appear separately from, guc.c's own message about "invalid parameter value".
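The split between a hook that may fail and a hook that must not can be sketched as follows. This is a Python illustration of the idea only; `check_size`, `Option`, `set_option`, and the option name `demo_buffers` are hypothetical and do not mirror guc.c's actual hook signatures.

```python
from typing import Callable, Optional

UNITS = {"kB": 1, "MB": 1024}


def check_size(raw: str) -> Optional[int]:
    """Check hook: validate and canonicalize to kB; None means reject."""
    for suffix, factor in UNITS.items():
        digits = raw[:-len(suffix)]
        if raw.endswith(suffix) and digits.isdigit():
            return int(digits) * factor
    return None


class Option:
    def __init__(self, name: str,
                 check_hook: Callable[[str], Optional[int]]) -> None:
        self.name = name
        self.check_hook = check_hook
        self.value: Optional[int] = None

    def assign(self, canonical: int) -> None:
        """Assign hook: installs an already-validated value; must not fail."""
        self.value = canonical


def set_option(opt: Option, raw: str) -> int:
    canonical = opt.check_hook(raw)  # all failure happens here
    if canonical is None:
        raise ValueError(f'invalid value for parameter "{opt.name}": "{raw}"')
    opt.assign(canonical)            # cannot fail, so state stays consistent
    return canonical


opt = Option("demo_buffers", check_size)
applied = set_option(opt, "4MB")
preview = check_size("64kB")  # canonical form obtained without assigning
```

Because the check hook returns the canonical value, a caller can compare a proposed setting with the current one without performing an assignment, which is the property the message credits with fixing the bogus wal_buffers log message.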
* Avoid assuming there will be only 3 states for synchronous_commit. (Simon Riggs, 2011-04-04)
  Also avoid hardcoding the current default state under the name "on"; replace it with a meaningful name that reflects its behaviour. Coding only, no change in behaviour.
* Merge synchronous_replication setting into synchronous_commit. (Robert Haas, 2011-04-04)
  This means one less thing to configure when setting up synchronous replication, and also avoids some ambiguity around what the behavior should be when the settings of these variables conflict.
  Fujii Masao, with additional hacking by me.
* Improve error message when WAL ends before reaching end of online backup. (Heikki Linnakangas, 2011-03-31)