aboutsummaryrefslogtreecommitdiff
path: root/src/backend/access
Commit message (Collapse)AuthorAge
* Fix dynahash.c to suppress hash bucket splits while a hash_seq_search() scanTom Lane2007-04-26
| | | | | | | | | | | | | | | | | | | | | | | is in progress on the same hashtable. This seems the least invasive way to fix the recently-recognized problem that a split could cause the scan to visit entries twice or (with much lower probability) miss them entirely. The only field-reported problem caused by this is the "failed to re-find shared lock object" PANIC in COMMIT PREPARED reported by Michel Dorochevsky, which was caused by multiply visited entries. However, it seems certain that mdsync() is vulnerable to missing required fsync's due to missed entries, and I am fearful that RelationCacheInitializePhase2() might be at risk as well. Because of that and the generalized hazard presented by this bug, back-patch all the supported branches. Along the way, fix pg_prepared_statement() and pg_cursor() to not assume that the hashtables they are examining will stay static between calls. This is risky regardless of the newly noted dynahash problem, because hash_seq_search() has never promised to cope with deletion of table entries other than the just-returned one. There may be no bug here because the only supported way to call these functions is via ExecMakeTableFunctionResult() which will cycle them to completion before doing anything very interesting, but it seems best to get rid of the assumption. This affects 8.2 and HEAD only, since those functions weren't there earlier.
* Repair PANIC condition in hash indexes when a previous index extension attemptTom Lane2007-04-19
| | | | | | | | | | | failed (due to lock conflicts or out-of-space). We might have already extended the index's filesystem EOF before failing, causing the EOF to be beyond what the metapage says is the last used page. Hence the invariant maintained by the code needs to be "EOF is at or beyond last used page", not "EOF is exactly the last used page". Problem was created by my patch of 2006-11-19 that attempted to repair bug #2737. Since that was back-patched to 7.4, this needs to be as well. Per report and test case from Vlastimil Krejcir.
* Fix oversight in coding of _bt_start_vacuum: we can't assume that the LWLockTom Lane2007-03-30
| | | | | | | will be released by transaction abort before _bt_end_vacuum gets called. If either of these "can't happen" errors actually happened, we'd freeze up trying to acquire an already-held lock. Latest word is that this does not explain Martin Pitt's trouble report, but it still looks like a bug.
* Disallow committing a prepared transaction unless we are in the same databaseTom Lane2007-02-13
| | | | | it was executed in. Someday it might be nice to allow cross-DB commits, but work would be needed in NOTIFY and perhaps other places. Per Heikki.
* Don't MAXALIGN in the checks to decide whether a tuple is over TOAST'sTom Lane2007-02-04
| | | | | | | | | | | | | | | | | | | | | | | | | | threshold for tuple length. On 4-byte-MAXALIGN machines, the toast code creates tuples that have t_len exactly TOAST_TUPLE_THRESHOLD ... but this number is not itself maxaligned, so if heap_insert maxaligns t_len before comparing to TOAST_TUPLE_THRESHOLD, it'll uselessly recurse back to tuptoaster.c, wasting cycles. (It turns out that this does not happen on 8-byte-MAXALIGN machines, because for them the outer MAXALIGN in the TOAST_MAX_CHUNK_SIZE macro reduces TOAST_MAX_CHUNK_SIZE so that toast tuples will be less than TOAST_TUPLE_THRESHOLD in size. That MAXALIGN is really incorrect, but we can't remove it now, see below.) There isn't any particular value in maxaligning before comparing to the thresholds, so just don't do that, which saves a small number of cycles in itself. These numbers should be rejiggered to minimize wasted space on toast-relation pages, but we can't do that in the back branches because changing TOAST_MAX_CHUNK_SIZE would force an initdb (by changing the contents of toast tables). We can move the toast decision thresholds a bit, though, which is what this patch effectively does. Thanks to Pavan Deolasee for discovering the unintended recursion. Back-patch into 8.2, but not further, pending more testing. (HEAD is about to get a further patch modifying the thresholds, so it won't help much for testing this form of the patch.)
* Correct an old logic error in btree page splitting: when considering a splitTom Lane2007-01-27
| | | | | | | | | | | | exactly at the point where we need to insert a new item, the calculation used the wrong size for the "high key" of the new left page. This could lead to choosing an unworkable split, resulting in "PANIC: failed to add item to the left sibling" (or "right sibling") failure. Although this bug has been there a long time, it's very difficult to trigger a failure before 8.2, since there was generally a lot of free space on both sides of a chosen split. In 8.2, where the user-selected fill factor determines how much free space the code tries to leave, an unworkable split is much more likely. Report by Joe Conway, diagnosis and fix by Heikki Linnakangas.
* Fix oversight in handling of row-comparison index keys: if the row comparisonTom Lane2007-01-07
| | | | | | | | | | | doesn't exactly match the index, we may have to change our initial positioning strategy. For example, given an index on (f1,f2,f3) and a WHERE condition "ROW(f1,f3) > ROW(2,3)", the code extracted the initial-positioning condition "f1 > 2", which is wrong ... it has to be "f1 >= 2", else some rows matching the WHERE condition may fail to be returned. Applying patch to 8.2 only --- I'll fix it in HEAD later as part of the planned index improvements (reverse-sort and NULLS FIRST/LAST work).
* Minor adjustments to make failures in startup/shutdown behave more cleanly.Tom Lane2006-11-30
| | | | | | | | | | | | | | | | StartupXLOG and ShutdownXLOG no longer need to be critical sections, because in all contexts where they are invoked, elog(ERROR) would be translated to elog(FATAL) anyway. (One change in bgwriter.c is needed to make this true: set ExitOnAnyError before trying to exit. This is a good fix anyway since the existing code would have gone into an infinite loop on elog(ERROR) during shutdown.) That avoids a misleading report of PANIC during semi-orderly failures. Modify the postmaster to include the startup process in the set of processes that get SIGTERM when a fast shutdown is requested, and also fix it to not try to restart the bgwriter if the bgwriter fails while trying to write the shutdown checkpoint. Net result is that "pg_ctl stop -m fast" does something reasonable for a system in warm standby mode, and so should Unix system shutdown (ie, universal SIGTERM). Per gripe from Stephen Harris and some corner-case testing of my own.
* Fix bug with page deletion. If inner page is removed and it tries toTeodor Sigaev2006-11-30
| | | | | | | | | | remove page on next level linked from next inner page, ginScanToDelete() wrongly sets parent page. Bug reveals when many item pointers from index was deleted ( several hundred thousands). Bug is discovered by hubert depesz lubaczewski <depesz@gmail.com> Suppose, we need rc2 before release...
* Add a comment noting that heap_copytuple_with_tuple() results in aNeil Conway2006-11-23
| | | | | | HeapTuple that is no longer allocated as a single palloc() block; if used carelessly, this might result in a subsequent memory leak after heap_freetuple().
* Several changes to reduce the probability of running out of memory duringTom Lane2006-11-23
| | | | | | | | | | | | | | | | | AbortTransaction, which would lead to recursion and eventual PANIC exit as illustrated in recent report from Jeff Davis. First, in xact.c create a special dedicated memory context for AbortTransaction to run in. This solves the problem as long as AbortTransaction doesn't need more than 32K (or whatever other size we create the context with). But in corner cases it might. Second, in trigger.c arrange to keep pending after-trigger event records in separate contexts that can be freed near the beginning of AbortTransaction, rather than having them persist until CleanupTransaction as before. Third, in portalmem.c arrange to free executor state data earlier as well. These two changes should result in backing off the out-of-memory condition before AbortTransaction needs any significant amount of memory, at least in typical cases such as memory overrun due to too many trigger events or too big an executor hash table. And all the same for subtransaction abort too, of course.
* On systems that have setsid(2) (which should be just about everything exceptTom Lane2006-11-21
| | | | | | | | | | | | | Windows), arrange for each postmaster child process to be its own process group leader, and deliver signals SIGINT, SIGTERM, SIGQUIT to the whole process group not only the direct child process. This provides saner behavior for archive and recovery scripts; in particular, it's possible to shut down a warm-standby recovery server using "pg_ctl stop -m immediate", since delivery of SIGQUIT to the startup subprocess will result in killing the waiting recovery_command. Also, this makes Query Cancel and statement_timeout apply to scripts being run from backends via system(). (There is no support in the core backend for that, but it's widely done using untrusted PLs.) Per gripe from Stephen Harris and subsequent discussion.
* Repair problems with hash indexes that span multiple segments: the hash code'sTom Lane2006-11-19
| | | | | | | | | | | | | | | | | | preference for filling pages out-of-order tends to confuse the sanity checks in md.c, as per report from Balazs Nagy in bug #2737. The fix is to ensure that the smgr-level code always has the same idea of the logical EOF as the hash index code does, by using ReadBuffer(P_NEW) where we are adding a single page to the end of the index, and using smgrextend() to reserve a large batch of pages when creating a new splitpoint. The patch is a bit ugly because it avoids making any changes in md.c, which seems the most prudent approach for a backpatchable beta-period fix. After 8.3 development opens, I'll take a look at a cleaner but more invasive patch, in particular getting rid of the now unnecessary hack to allow reading beyond EOF in mdread(). Backpatch as far as 7.4. The bug likely exists in 7.3 as well, but because of the magnitude of the 7.3-to-7.4 changes in hash, the later-version patch doesn't even begin to apply. Given the other known bugs in the 7.3-era hash code, it does not seem worth trying to develop a separate patch for 7.3.
* Repair two related errors in heap_lock_tuple: it was failing to recognizeTom Lane2006-11-17
| | | | | | | | | cases where we already hold the desired lock "indirectly", either via membership in a MultiXact or because the lock was originally taken by a different subtransaction of the current transaction. These cases must be accounted for to avoid needless deadlocks and/or inappropriate replacement of an exclusive lock with a shared lock. Per report from Clarence Gardner and subsequent investigation.
* String fixPeter Eisentraut2006-11-16
|
* Fix some typos in comments.Neil Conway2006-11-12
|
* Suppress a few 'uninitialized variable' warnings that gcc emits only atTom Lane2006-11-11
| | | | | -O3 or higher (presumably because it inlines more things). Per gripe from Mark Mielke.
* Clean up some misleading references to %p being a full path, per Simon.Tom Lane2006-11-10
|
* Change Windows rename and unlink substitutes so that they time out afterTom Lane2006-11-08
| | | | | | | | 30 seconds instead of retrying forever. Also modify xlog.c so that if it fails to rename an old xlog segment up to a future slot, it will unlink the segment instead. Per discussion of bug #2712, in which it became apparent that Windows can handle unlinking a file that's being held open, but not renaming it.
* Fix recently-understood problems with handling of XID freezing, particularlyTom Lane2006-11-05
| | | | | | | | | | | | | | | in PITR scenarios. We now WAL-log the replacement of old XIDs with FrozenTransactionId, so that such replacement is guaranteed to propagate to PITR slave databases. Also, rather than relying on hint-bit updates to be preserved, pg_clog is not truncated until all instances of an XID are known to have been replaced by FrozenTransactionId. Add new GUC variables and pg_autovacuum columns to allow management of the freezing policy, so that users can trade off the size of pg_clog against the amount of freezing work done. Revise the already-existing code that forces autovacuum of tables approaching the wraparound point to make it more bulletproof; also, revise the autovacuum logic so that anti-wraparound vacuuming is done per-table rather than per-database. initdb forced because of changes in pg_class, pg_database, and pg_autovacuum catalogs. Heikki Linnakangas, Simon Riggs, and Tom Lane.
* Fix "failed to re-find parent key" btree VACUUM failure by revising pageTom Lane2006-11-01
| | | | | | | | | | | deletion code to avoid the case where an upper-level btree page remains "half dead" for a significant period of time, and to block insertions into a key range that is in process of being re-assigned to the right sibling of the deleted page's parent. This prevents the scenario reported by Ed L. wherein index keys could become out-of-order in the grandparent index level. Since this is a moderately invasive fix, I'm applying it only to HEAD. The bug exists back to 7.4, but the back branches will get a different patch.
* Add some code to CREATE DATABASE to check for pre-existing subdirectoriesTom Lane2006-10-18
| | | | | | that conflict with the OID that we want to use for the new database. This avoids the risk of trying to remove files that maybe we shouldn't remove. Per gripe from Jon Lapham and subsequent discussion of 27-Sep.
* Message style improvementsPeter Eisentraut2006-10-06
|
* Cleanup for pglz_compress code: remove dead code, const-ify API ofTom Lane2006-10-05
| | | | | | remaining functions, simplify pglz_compress's API to not require a useless data copy when compression fails. Also add a check in pglz_decompress that the expected amount of data was decompressed.
* Make use of qsort_arg in several places that were formerly using klugyTom Lane2006-10-05
| | | | | | static variables. This avoids any risk of potential non-reentrancy, and in particular offers a much cleaner workaround for the Intel compiler bug that was affecting ginutil.c.
* pgindent run for 8.2.Bruce Momjian2006-10-04
|
* Make some sentences consistent with similar ones.Bruce Momjian2006-10-03
| | | | Euler Taveira de Oliveira
* Degrade the transaction-id wraparound point message from LOG to DEBUG1, perAlvaro Herrera2006-09-26
| | | | | | discussion. Patch from Simon Riggs.
* Fix free space map to correctly track the total amount of FSM space neededTom Lane2006-09-21
| | | | | | | even when a single relation requires more than max_fsm_pages pages. Also, make VACUUM emit a warning in this case, since it likely means that VACUUM FULL or other drastic corrective measure is needed. Per reports from Jeff Frost and others of unexpected changes in the claimed max_fsm_pages need.
* Improve error message. Per discussionTeodor Sigaev2006-09-14
| | | | http://archives.postgresql.org/pgsql-general/2006-09/msg00186.php
* Remove unnecessary brace pair.Bruce Momjian2006-09-10
|
* If we're going to advertise the array overlap/containment operators,Tom Lane2006-09-10
| | | | | | we probably should make them work reliably for all arrays. Fix code to handle NULLs and multidimensional arrays, move it into arrayfuncs.c. GIN is still restricted to indexing arrays with no null elements, however.
* Rename contains/contained-by operators to @> and <@, per discussion thatTom Lane2006-09-10
| | | | | | | | agreed these symbols are less easily confused. I made new pg_operator entries (with new OIDs) for the old names, so as to provide backward compatibility while making it pretty easy to remove the old names in some future release cycle. This commit only touches the core datatypes, contrib will be fixed separately.
* Fix Intel compiler bug. Per discussionTeodor Sigaev2006-09-05
| | | | | 'GIN FailedAssertions on Itanium2 with Intel compiler' in pgsql-hackers, http://archives.postgresql.org/pgsql-hackers/2006-08/msg01914.php
* Arrange for GetSnapshotData to copy live-subtransaction XIDs from theTom Lane2006-09-03
| | | | | | | | | | PGPROC array into snapshots, and use this information to avoid visits to pg_subtrans in HeapTupleSatisfiesSnapshot. This appears to solve the pg_subtrans-related context swap storm problem that's been reported by several people for 8.1. While at it, modify GetSnapshotData to not take an exclusive lock on ProcArrayLock, as closer analysis shows that shared lock is always sufficient. Itagaki Takahiro and Tom Lane
* Fix BUG #2594: Gin Indexes cause server to crash when it builds on empty tableTeodor Sigaev2006-08-29
|
* Move xact.c's partial support for Lists of TransactionIds into pg_list.h.Tom Lane2006-08-27
| | | | Needed because lock.c is now going to use the same type of list.
* Add the ability to create indexes 'concurrently', that is, withoutTom Lane2006-08-25
| | | | | blocking concurrent writes to the table. Greg Stark, with a little help from Tom Lane.
* Optimize the case where a btree indexscan has current and mark positionsTom Lane2006-08-24
| | | | | | | | on the same index page; we can avoid data copying as well as buffer refcount manipulations in this common case. Makes for a small but noticeable improvement in mergejoin speed. Heikki Linnakangas
* Make the server track an 'XID epoch', that is, maintain higher-order bitsTom Lane2006-08-21
| | | | | | | | | of the transaction ID counter. Nothing is done with the epoch except to store it in checkpoint records, but this provides a foundation with which add-on code can pretend that XIDs never wrap around. This is a severely trimmed and rewritten version of the xxid patch submitted by Marko Kreen. Per discussion, the epoch counter seems the only part of xxid that really needs to be in the core server.
* Now that we've rearranged relation open to get a lock before touchingTom Lane2006-08-18
| | | | | | the rel, it's easy to get rid of the narrow race-condition window that used to exist in VACUUM and CLUSTER. Did some minor code-beautification work in the same area, too.
* Implement archive_timeout feature to force xlog file switches to occur no moreTom Lane2006-08-17
| | | | | | | | | | | than N seconds apart. This allows a simple, if not very high performance, means of guaranteeing that a PITR archive is no more than N seconds behind real time. Also make pg_current_xlog_location return the WAL Write pointer, add pg_current_xlog_insert_location to return the Insert pointer, and fix pg_xlogfile_name_offset to return its results as a two-element record instead of a smashed-together string, as per recent discussion. Simon Riggs
* Add INSERT/UPDATE/DELETE RETURNING, with basic docs and regression tests.Tom Lane2006-08-12
| | | | | | | | plpgsql support to come later. Along the way, convert execMain's SELECT INTO support into a DestReceiver, in order to eliminate some ugly special cases. Jonah Harris and Tom Lane
* Make recovery from WAL be restartable, by executing a checkpoint-likeTom Lane2006-08-07
| | | | | | | | | | operation every so often. This improves the usefulness of PITR log shipping for hot standby: formerly, if the standby server crashed, it was necessary to restart it from the last base backup and replay all the WAL since then. Now it will only need to reread about the same amount of WAL as the master server would. The behavior might also come in handy during a long PITR replay sequence. Simon Riggs, with some editorialization by Tom Lane.
* Add support for forcing a switch to a new xlog file; cause such a switchTom Lane2006-08-06
| | | | | | | to happen automatically during pg_stop_backup(). Add some functions for interrogating the current xlog insertion point and for easily extracting WAL filenames from the hex WAL locations displayed by pg_stop_backup and friends. Simon Riggs with some editorialization by Tom Lane.
* Add missing pgstat_count_index_scan(), per Andreas Seltenreich.Tom Lane2006-08-03
|
* Change the relation_open protocol so that we obtain lock on a relationTom Lane2006-07-31
| | | | | | | | | | | | (table or index) before trying to open its relcache entry. This fixes race conditions in which someone else commits a change to the relation's catalog entries while we are in process of doing relcache load. Problems of that ilk have been reported sporadically for years, but it was not really practical to fix until recently --- for instance, the recent addition of WAL-log support for in-place updates helped. Along the way, remove pg_am.amconcurrent: all AMs are now expected to support concurrent update.
* Modify snapshot definition so that lazy vacuums are ignored by otherAlvaro Herrera2006-07-30
| | | | | | | | | vacuums. This allows a OLTP-like system with big tables to continue regular vacuuming on small-but-frequently-updated tables while the big tables are being vacuumed. Original patch from Hannu Krossing, rewritten by Tom Lane and updated by me.
* Modify btree to delete known-dead index entries without an actual VACUUM.Tom Lane2006-07-25
| | | | | | | | | | When we are about to split an index page to do an insertion, first look to see if any entries marked LP_DELETE exist on the page, and if so remove them to try to make enough space for the desired insert. This should reduce index bloat in heavily-updated tables, although of course you still need VACUUM eventually to clean up the heap. Junji Teramoto
* DTrace support, with a small initial set of probesPeter Eisentraut2006-07-24
| | | | by Robert Lor