aboutsummaryrefslogtreecommitdiff
path: root/src/backend/access
Commit message (Collapse)AuthorAge
* Fix dynahash.c to suppress hash bucket splits while a hash_seq_search() scanTom Lane2007-04-26
| | | | | | | | | | | | | | | | | | | | | | | is in progress on the same hashtable. This seems the least invasive way to fix the recently-recognized problem that a split could cause the scan to visit entries twice or (with much lower probability) miss them entirely. The only field-reported problem caused by this is the "failed to re-find shared lock object" PANIC in COMMIT PREPARED reported by Michel Dorochevsky, which was caused by multiply visited entries. However, it seems certain that mdsync() is vulnerable to missing required fsync's due to missed entries, and I am fearful that RelationCacheInitializePhase2() might be at risk as well. Because of that and the generalized hazard presented by this bug, back-patch all the supported branches. Along the way, fix pg_prepared_statement() and pg_cursor() to not assume that the hashtables they are examining will stay static between calls. This is risky regardless of the newly noted dynahash problem, because hash_seq_search() has never promised to cope with deletion of table entries other than the just-returned one. There may be no bug here because the only supported way to call these functions is via ExecMakeTableFunctionResult() which will cycle them to completion before doing anything very interesting, but it seems best to get rid of the assumption. This affects 8.2 and HEAD only, since those functions weren't there earlier.
* Repair PANIC condition in hash indexes when a previous index extension attemptTom Lane2007-04-19
| | | | | | | | | | | failed (due to lock conflicts or out-of-space). We might have already extended the index's filesystem EOF before failing, causing the EOF to be beyond what the metapage says is the last used page. Hence the invariant maintained by the code needs to be "EOF is at or beyond last used page", not "EOF is exactly the last used page". Problem was created by my patch of 2006-11-19 that attempted to repair bug #2737. Since that was back-patched to 7.4, this needs to be as well. Per report and test case from Vlastimil Krejcir.
* Correct an old logic error in btree page splitting: when considering a splitTom Lane2007-01-27
| | | | | | | | | | | | exactly at the point where we need to insert a new item, the calculation used the wrong size for the "high key" of the new left page. This could lead to choosing an unworkable split, resulting in "PANIC: failed to add item to the left sibling" (or "right sibling") failure. Although this bug has been there a long time, it's very difficult to trigger a failure before 8.2, since there was generally a lot of free space on both sides of a chosen split. In 8.2, where the user-selected fill factor determines how much free space the code tries to leave, an unworkable split is much more likely. Report by Joe Conway, diagnosis and fix by Heikki Linnakangas.
* Repair problems with hash indexes that span multiple segments: the hash code'sTom Lane2006-11-19
| | | | | | | | | | | | | | | | | | preference for filling pages out-of-order tends to confuse the sanity checks in md.c, as per report from Balazs Nagy in bug #2737. The fix is to ensure that the smgr-level code always has the same idea of the logical EOF as the hash index code does, by using ReadBuffer(P_NEW) where we are adding a single page to the end of the index, and using smgrextend() to reserve a large batch of pages when creating a new splitpoint. The patch is a bit ugly because it avoids making any changes in md.c, which seems the most prudent approach for a backpatchable beta-period fix. After 8.3 development opens, I'll take a look at a cleaner but more invasive patch, in particular getting rid of the now unnecessary hack to allow reading beyond EOF in mdread(). Backpatch as far as 7.4. The bug likely exists in 7.3 as well, but because of the magnitude of the 7.3-to-7.4 changes in hash, the later-version patch doesn't even begin to apply. Given the other known bugs in the 7.3-era hash code, it does not seem worth trying to develop a separate patch for 7.3.
* Fix "failed to re-find parent key" btree VACUUM failure by tweakingTom Lane2006-11-01
| | | | | | | | | _bt_pagedel to recover from the failure: just search the whole parent level if searching to the right fails. This does nothing for the underlying problem that index keys became out-of-order in the grandparent level. However, we believe that there is no other consequence worse than slightly inefficient searching, so this narrow patch seems like the safest solution for the back branches.
* Change the backend to reject strings containing invalidly-encoded multibyteTom Lane2006-05-21
| | | | | | | | | | | | | | | | | | | | characters in all cases. Formerly we mostly just threw warnings for invalid input, and failed to detect it at all if no encoding conversion was required. The tighter check is needed to defend against SQL-injection attacks as per CVE-2006-2313 (further details will be published after release). Embedded zero (null) bytes will be rejected as well. The checks are applied during input to the backend (receipt from client or COPY IN), so it no longer seems necessary to check in textin() and related routines; any string arriving at those functions will already have been validated. Conversion failure reporting (for characters with no equivalent in the destination encoding) has been cleaned up and made consistent while at it. Also, fix a few longstanding errors in little-used encoding conversion routines: win1251_to_iso, win866_to_iso, euc_tw_to_big5, euc_tw_to_mic, mic_to_euc_tw were all broken to varying extents. Patches by Tatsuo Ishii and Tom Lane. Thanks to Akio Ishida and Yasuo Ohgaki for identifying the security issues.
* Repair longstanding error in btree xlog replay: XLogReadBuffer should beTom Lane2006-03-28
| | | | | | | | | | passed extend = true whenever we are reading a page we intend to reinitialize completely, even if we think the page "should exist". This is because it might indeed not exist, if the relation got truncated sometime after the current xlog record was made and before the crash we're trying to recover from. These two thinkos appear to explain both of the old bug reports discussed here: http://archives.postgresql.org/pgsql-hackers/2005-05/msg01369.php
* Repair longstanding bug in slru/clog logic: it is possible for two backendsTom Lane2006-01-21
| | | | | | | | | | to try to create a log segment file concurrently, but the code erroneously specified O_EXCL to open(), resulting in a needless failure. Before 7.4, it was even a PANIC condition :-(. Correct code is actually simpler than what we had, because we can just say O_CREAT to start with and not need a second open() call. I believe this accounts for several recent reports of hard-to-reproduce "could not create file ...: File exists" errors in both pg_clog and pg_subtrans.
* Arrange to set the LC_XXX environment variables to match our locale setup.Tom Lane2006-01-05
| | | | Back-patch of previous fix in HEAD for plperl-vs-locale issue.
* Fix longstanding race condition in transaction log management: there was aTom Lane2005-11-03
| | | | | | | | | | | very narrow window in which SimpleLruReadPage or SimpleLruWritePage could think that I/O was needed when it wasn't (and indeed the buffer had already been assigned to another page). This would result in an Assert failure if Asserts were enabled, and probably in silent data corruption if not. Reported independently by Jim Nasby and Robert Creager. I intend a more extensive fix when 8.2 development starts, but this is a reasonably low-impact patch for the existing branches.
* Fix longstanding bug found by Atsushi Ogawa: _bt_check_unique would markTom Lane2005-10-12
| | | | | | the wrong buffer dirty when trying to kill a dead index entry that's on a page after the one it started on. No risk of data corruption, just inefficiency, but still a bug.
* Fix missing rows in queryTeodor Sigaev2005-08-30
| | | | | update a=.. where a... with GiST index on column 'a' Backpatch from 8.0 branch
* Back-patch fixes for problems with VACUUM destroying t_ctid chains too soon,Tom Lane2005-08-25
| | | | | and with insufficient paranoia in code that follows t_ctid links. This patch covers the 7.4 branch.
* Add test to WAL replay to verify that xl_prev points back to the previousTom Lane2005-05-31
| | | | | WAL record; this is necessary to be sure we recognize stale WAL records when a WAL page was only partially written during a system crash.
* Repair very-low-probability race condition between relation extensionTom Lane2005-05-07
| | | | | | | | and VACUUM: in the interval between adding a new page to the relation and formatting it, it was possible for VACUUM to come along and decide it should format the page too. Though not harmful in itself, this would cause data loss if a third transaction were able to insert tuples into the vacuumed page before the original extender got control back.
* Back-port heap_deformtuple() into 7.4 branch; needed for planned fix forTom Lane2005-02-06
| | | | CLUSTER failure after ALTER TABLE SET WITHOUT OIDS.
* Fix memory leak in rtdosplit, per report from Clive Page.Tom Lane2005-01-24
|
* Rearrange order of pre-commit operations: must close cursors before doingTom Lane2004-10-29
| | | | ON COMMIT actions. Per bug report from Michael Guerin.
* Repair possible failure to update hint bits back to disk, perTom Lane2004-10-13
| | | | | | http://archives.postgresql.org/pgsql-hackers/2004-10/msg00464.php. I plan a more permanent fix in HEAD, but for the back branches it seems best to just touch the places that actually have a problem.
* Make gistindex_keytest safe against NULL values. Same fix was alreadyTom Lane2004-08-27
| | | | | made in passing for 8.0, but now that we have a bug report showing it's needed, we should put it into 7.4 branch.
* Fix bug introduced into _bt_getstackbuf() on 2003-Feb-21: the initialTom Lane2004-08-17
| | | | | | | | | | value of 'start' could be past the end of the page, if the page was split by some concurrent inserting process since we visited it. In this situation the code could look at bogus entries and possibly find a match (since after all those entries still contain what they had before the split). This would lead to 'specified item offset is too large' followed by 'PANIC: failed to add item to the page', as reported by Joe Conway for scenarios involving heavy concurrent insertion activity.
* Fix failure to guarantee that a checkpoint will write out pg_clog updatesTom Lane2004-08-11
| | | | | | for transaction commits that occurred just before the checkpoint. This is an EXTREMELY serious bug --- kudos to Satoshi Okada for creating a reproducible test case to prove its existence.
* Replace opendir/closedir calls throughout the backend with AllocateDirTom Lane2004-02-23
| | | | | | | | | | and FreeDir routines modeled on the existing AllocateFile/FreeFile. Like the latter, these routines will avoid failing on EMFILE/ENFILE conditions whenever possible, and will prevent leakage of directory descriptors if an elog() occurs while one is open. Also, reduce PANIC to ERROR in MoveOfflineLogs() --- this is not critical code and there is no reason to force a DB restart on failure. All per recent trouble report from Olivier Hubaut.
* Repair incorrect order of operations in GetNewTransactionId(). We mustTom Lane2004-01-26
| | | | | | complete ExtendCLOG() before advancing nextXid, so that if that routine fails, the next incoming transaction will try it again. Per trouble report from Christopher Kings-Lynne.
* Fix bad interaction between NOTIFY processing and V3 extended queryTom Lane2003-10-16
| | | | | | | | | | protocol, per report from Igor Shevchenko. NOTIFY thought it could do its thing if transaction blockState is TBLOCK_DEFAULT, but in reality it had better check the low-level transaction state is TRANS_DEFAULT as well. Formerly it was not possible to wait for the client in a state where the first is true and the second is not ... but now we can have such a state. Minor cleanup in StartTransaction() as well.
* Repair RI trigger visibility problems (this time for sure ;-)) per recentTom Lane2003-10-01
| | | | | | | discussion on pgsql-hackers: in READ COMMITTED mode we just have to force a QuerySnapshot update in the trigger, but in SERIALIZABLE mode we have to run the scan under a current snapshot and then complain if any rows would be updated/deleted that are not visible in the transaction snapshot.
* Adjust btree index build procedure so that the btree metapage looksTom Lane2003-09-29
| | | | | | | | | | invalid (has the wrong magic number) until the build is entirely complete. This turns out to cost no additional writes in the normal case, since we were rewriting the metapage at the end of the process anyway. In normal scenarios there's no real gain in security, because a failed index build would roll back the transaction leaving an unused index file, but for rebuilding shared system indexes this seems to add some useful protection.
* Add a mechanism to let dynamically loaded modules register post-commit/Tom Lane2003-09-28
| | | | | | post-abort cleanup hooks. I'm surprised that we have not needed this already, but I need it now to fix a plpgsql problem, and the usefulness for other dynamically loaded modules seems obvious.
* Fix typo in message.Tom Lane2003-09-27
|
* Various message fixes, among those fixes for the previous round of fixesPeter Eisentraut2003-09-26
|
* Message editing: remove gratuitous variations in message wording, standardizePeter Eisentraut2003-09-25
| | | | | terms, add some clarifications, fix some untranslatable attempts at dynamic message building.
* Repair some REINDEX problems per recent discussions. The relcache isTom Lane2003-09-24
| | | | | | | | | | | | | now able to cope with assigning new relfilenode values to nailed-in-cache indexes, so they can be reindexed using the fully crash-safe method. This leaves only shared system indexes as special cases. Remove the 'index deactivation' code, since it provides no useful protection in the shared- index case. Require reindexing of shared indexes to be done in standalone mode, but remove other restrictions on REINDEX. -P (IgnoreSystemIndexes) now prevents using indexes for lookups, but does not disable index updates. It is therefore safe to allow from PGOPTIONS. Upshot: reindexing system catalogs can be done without a standalone backend for all cases except shared catalogs.
* Fix LISTEN/NOTIFY race condition reported by Gavin Sherry. While aTom Lane2003-09-15
| | | | | | | | | | | | | | | really general fix might be difficult, I believe the only case where AtCommit_Notify could see an uncommitted tuple is where the other guy has just unlistened and not yet committed. The best solution seems to be to just skip updating that tuple, on the assumption that the other guy does not want to hear about the notification anyway. This is not perfect --- if the other guy rolls back his unlisten instead of committing, then he really should have gotten this notify. But to do that, we'd have to wait to see if he commits or not, or make UNLISTEN hold exclusive lock on pg_listener until commit. Either of these answers is deadlock-prone, not to mention horrible for interactive performance. Do it this way for now. (What happened to that project to do LISTEN/NOTIFY in memory with no table, anyway?)
* Reimplement hash index locking algorithms, per my recent proposal toTom Lane2003-09-04
| | | | | | | | pghackers. This fixes the problem recently reported by Markus KrÌutner (hash bucket split corrupts the state of scans being done concurrently), and I believe it also fixes all the known problems with deadlocks in hash index operations. Hash indexes are still not really ready for prime time (since they aren't WAL-logged), but this is a step forward.
* In _bt_check_unique() loop, don't bother applying _bt_isequal() toTom Lane2003-09-02
| | | | | | | | killed items; just skip to the next item immediately. Only check for key equality when we reach a non-killed item or the end of the index page. This saves key comparisons when there are lots of killed items, as for example in a heavily-updated table that's not been vacuumed lately. Seems to be a win for pgbench anyway.
* Several fixes for hash indexes that involve changing the on-disk indexTom Lane2003-09-02
| | | | | | | | | layout; therefore, this change forces REINDEX of hash indexes (though not a full initdb). Widen hashm_ntuples to double so that hash space management doesn't get confused by more than 4G entries; enlarge the allowed number of free-space-bitmap pages; replace the useless bshift field with a useful bmshift field; eliminate 4 bytes of wasted space in the per-page special area.
* Fix a couple typos, add some more comments.Tom Lane2003-09-02
|
* Rewrite hashbulkdelete() to make it amenable to new bucket lockingTom Lane2003-09-02
| | | | | | | scheme. A pleasant side effect is that it is *much* faster when deleting a large fraction of the indexed tuples, because of elimination of redundant hash_step activity induced by hash_adjscans. Various other continuing code cleanup.
* Preliminary cleanup for hash index code (doesn't attack the locking problemTom Lane2003-09-01
| | | | | | | | | yet). Fix a couple of bugs that would only appear if multiple bitmap pages are used, including a buffer reference leak and incorrect computation of bit indexes. Get rid of 'overflow address' concept, which accomplished nothing except obfuscating the code and creating a risk of failure due to limited range of offset field. Rename some misleadingly-named fields and routines, and improve documentation.
* Add some internals documentation for hash indexes, including anTom Lane2003-09-01
| | | | | | explanation of the remarkably confusing page addressing scheme. The file also includes my planned-but-not-yet-implemented revision of the hash index locking scheme.
* Rewriter and planner should use only resno, not resname, to identifyTom Lane2003-08-11
| | | | | | | target columns in INSERT and UPDATE targetlists. Don't rely on resname to be accurate in ruleutils, either. This fixes bug reported by Donald Fraser, in which renaming a column referenced in a rule did not work very well.
* Repair potential deadlock created by recent changes to recycle btreeTom Lane2003-08-10
| | | | | | | | | | index pages: when _bt_getbuf asks the FSM for a free index page, it is possible (and, in some cases, even moderately likely) that the answer will be the same page that _bt_split is trying to split. _bt_getbuf already knew that the returned page might not be free, but it wasn't prepared for the possibility that even trying to lock the page could be problematic. Fix by doing a conditional rather than unconditional grab of the page lock.
* Another pgindent run with updated typedefs.Bruce Momjian2003-08-08
|
* Suppress unused-variable warnings when building without Asserts.Tom Lane2003-08-08
|
* Rename fields of DestReceiver to avoid collisions with (ill-considered)Tom Lane2003-08-06
| | | | macros in some platforms' sys/socket.h.
* Fix some copyright notices that weren't updated. Improve copyright toolTom Lane2003-08-04
| | | | so it won't miss 'em again.
* Update copyrights to 2003.Bruce Momjian2003-08-04
|
* pgindent run.Bruce Momjian2003-08-04
|
* Fix longstanding error in _bt_search(): should moveright at top of loop notTom Lane2003-07-29
| | | | | | | | | bottom. Otherwise we fail to moveright when the root page was split while we were "in flight" to it. This is not a significant problem when the root is above the leaf level, but if the root was also a leaf (ie, a single-page index just got split) we may return the wrong leaf page to the caller, resulting in failure to find a key that is in fact present. Bug has existed at least since 7.1, probably forever.
* A visit from the message-style police ...Tom Lane2003-07-28
|