aboutsummaryrefslogtreecommitdiff
path: root/src/backend/access
Commit message (Collapse)AuthorAge
* Fix possible duplicate tuples while GiST scan. Now page is processedTeodor Sigaev2008-08-23
| | | | | | | | | at once and ItemPointers are collected in memory. Remove tuple's killing by killtuple() if tuple was moved to another page - it could produce unaceptable overhead. Backpatch up to 8.1 because the bug was introduced by GiST's concurrency support.
* Remove arbitrary 10MB limit on two-phase state file size. It's not that hardHeikki Linnakangas2008-05-19
| | | | | | | | | | | | | | | | to go beoynd 10MB, as demonstrated by Gavin Sharry's example of dropping a schema with ~25000 objects. The really bogus thing about the limit was that it was enforced when a state file file was read in, not when it was written, so you would end up with a prepared transaction that you can't commit or abort, and the only recourse was to shut down the server and remove the file by hand. Raise the limit to MaxAllocSize, and enforce it also when a state file is written. We could've removed the limit altogether, but reading in a file larger than MaxAllocSize would fail anyway because we read it into a palloc'd buffer. Backpatch down to 8.1, where 2PC and this issue was introduced.
* Don't try to close negative file descriptors, since this can causeMagnus Hagander2008-05-13
| | | | | | | crashes on certain platforms. In particular, the MSVC runtime is known to do this. Fixes bug #4162, reported and diagnosed by Javier Pimas
* Change hashscan.c to keep its list of active hash index scans inTom Lane2008-03-07
| | | | | | | | | | | | | TopMemoryContext, rather than scattered through executor per-query contexts. This poses no danger of memory leak since the ResourceOwner mechanism guarantees release of no-longer-needed items. It is needed because the per-query context might already be released by the time we try to clean up the hash scan list. Report by ykhuang, diagnosis by Heikki. Back-patch to 8.0, where the ResourceOwner-based cleanup was introduced. The given test case does not fail before 8.2, probably because we rearranged transaction abort processing somehow; but this coding is undoubtedly risky so I'll patch 8.0 and 8.1 anyway.
* Make standard maintenance operations (including VACUUM, ANALYZE, REINDEX,Tom Lane2008-01-03
| | | | | | | | | | | | | | | | | | | and CLUSTER) execute as the table owner rather than the calling user, using the same privilege-switching mechanism already used for SECURITY DEFINER functions. The purpose of this change is to ensure that user-defined functions used in index definitions cannot acquire the privileges of a superuser account that is performing routine maintenance. While a function used in an index is supposed to be IMMUTABLE and thus not able to do anything very interesting, there are several easy ways around that restriction; and even if we could plug them all, there would remain a risk of reading sensitive information and broadcasting it through a covert channel such as CPU usage. To prevent bypassing this security measure, execution of SET SESSION AUTHORIZATION and SET ROLE is now forbidden within a SECURITY DEFINER context. Thanks to Itagaki Takahiro for reporting this vulnerability. Security: CVE-2007-6600
* Make archive recovery always start a new timeline, rather than only when aTom Lane2007-09-29
| | | | | | | recovery stop time was used. This avoids a corner-case risk of trying to overwrite an existing archived copy of the last WAL segment, and seems simpler and cleaner all around than the original definition. Per example from Jon Colverson and subsequent analysis by Simon.
* Improve page split in rtree emulation. Now if splitted result hasTeodor Sigaev2007-09-07
| | | | | | | big misalignement, then it tries to split page basing on distribution of boxe's centers. Per report from Dolafi, Tom <dolafit@janelia.hhmi.org>
* Suppress time zone name (%Z) when logging timestamps in xlog.c startupTom Lane2007-08-04
| | | | | | | on Windows. This is yet another manifestation of the problem that Windows returns time zone names that may be in a different encoding than we are using. I've put a better solution in HEAD, but the back branches need a simple patch. Per report from Hiroshi Saito.
* Fix performance problems in multi-batch hash joins by ensuring that we selectTom Lane2007-06-01
| | | | | | | | | | a well-randomized batch number even when given a poorly-randomized hash value. This is a bit inefficient but seems the only practical solution given the constraint that we can't change the hash functions in released branches. Per report from Joseph Shraibman. Applied to 8.1 and 8.2 only --- HEAD is getting a cleaner fix, and 8.0 and before use different coding that seems less vulnerable.
* Fix overly-strict sanity check in BeginInternalSubTransaction that made itTom Lane2007-05-30
| | | | | | fail when used in a deferred trigger. Bug goes back to 8.0; no doubt the reason it hadn't been noticed is that we've been discouraging use of user-defined constraint triggers. Per report from Frank van Vugt.
* Fix dynahash.c to suppress hash bucket splits while a hash_seq_search() scanTom Lane2007-04-26
| | | | | | | | | | | | | | | | | | | | | | | is in progress on the same hashtable. This seems the least invasive way to fix the recently-recognized problem that a split could cause the scan to visit entries twice or (with much lower probability) miss them entirely. The only field-reported problem caused by this is the "failed to re-find shared lock object" PANIC in COMMIT PREPARED reported by Michel Dorochevsky, which was caused by multiply visited entries. However, it seems certain that mdsync() is vulnerable to missing required fsync's due to missed entries, and I am fearful that RelationCacheInitializePhase2() might be at risk as well. Because of that and the generalized hazard presented by this bug, back-patch all the supported branches. Along the way, fix pg_prepared_statement() and pg_cursor() to not assume that the hashtables they are examining will stay static between calls. This is risky regardless of the newly noted dynahash problem, because hash_seq_search() has never promised to cope with deletion of table entries other than the just-returned one. There may be no bug here because the only supported way to call these functions is via ExecMakeTableFunctionResult() which will cycle them to completion before doing anything very interesting, but it seems best to get rid of the assumption. This affects 8.2 and HEAD only, since those functions weren't there earlier.
* Repair PANIC condition in hash indexes when a previous index extension attemptTom Lane2007-04-19
| | | | | | | | | | | failed (due to lock conflicts or out-of-space). We might have already extended the index's filesystem EOF before failing, causing the EOF to be beyond what the metapage says is the last used page. Hence the invariant maintained by the code needs to be "EOF is at or beyond last used page", not "EOF is exactly the last used page". Problem was created by my patch of 2006-11-19 that attempted to repair bug #2737. Since that was back-patched to 7.4, this needs to be as well. Per report and test case from Vlastimil Krejcir.
* Disallow committing a prepared transaction unless we are in the same databaseTom Lane2007-02-13
| | | | | it was executed in. Someday it might be nice to allow cross-DB commits, but work would be needed in NOTIFY and perhaps other places. Per Heikki.
* Correct an old logic error in btree page splitting: when considering a splitTom Lane2007-01-27
| | | | | | | | | | | | exactly at the point where we need to insert a new item, the calculation used the wrong size for the "high key" of the new left page. This could lead to choosing an unworkable split, resulting in "PANIC: failed to add item to the left sibling" (or "right sibling") failure. Although this bug has been there a long time, it's very difficult to trigger a failure before 8.2, since there was generally a lot of free space on both sides of a chosen split. In 8.2, where the user-selected fill factor determines how much free space the code tries to leave, an unworkable split is much more likely. Report by Joe Conway, diagnosis and fix by Heikki Linnakangas.
* Repair problems with hash indexes that span multiple segments: the hash code'sTom Lane2006-11-19
| | | | | | | | | | | | | | | | | | preference for filling pages out-of-order tends to confuse the sanity checks in md.c, as per report from Balazs Nagy in bug #2737. The fix is to ensure that the smgr-level code always has the same idea of the logical EOF as the hash index code does, by using ReadBuffer(P_NEW) where we are adding a single page to the end of the index, and using smgrextend() to reserve a large batch of pages when creating a new splitpoint. The patch is a bit ugly because it avoids making any changes in md.c, which seems the most prudent approach for a backpatchable beta-period fix. After 8.3 development opens, I'll take a look at a cleaner but more invasive patch, in particular getting rid of the now unnecessary hack to allow reading beyond EOF in mdread(). Backpatch as far as 7.4. The bug likely exists in 7.3 as well, but because of the magnitude of the 7.3-to-7.4 changes in hash, the later-version patch doesn't even begin to apply. Given the other known bugs in the 7.3-era hash code, it does not seem worth trying to develop a separate patch for 7.3.
* Repair two related errors in heap_lock_tuple: it was failing to recognizeTom Lane2006-11-17
| | | | | | | | | cases where we already hold the desired lock "indirectly", either via membership in a MultiXact or because the lock was originally taken by a different subtransaction of the current transaction. These cases must be accounted for to avoid needless deadlocks and/or inappropriate replacement of an exclusive lock with a shared lock. Per report from Clarence Gardner and subsequent investigation.
* Fix "failed to re-find parent key" btree VACUUM failure by tweakingTom Lane2006-11-01
| | | | | | | | | _bt_pagedel to recover from the failure: just search the whole parent level if searching to the right fails. This does nothing for the underlying problem that index keys became out-of-order in the grandparent level. However, we believe that there is no other consequence worse than slightly inefficient searching, so this narrow patch seems like the safest solution for the back branches.
* Don't try to truncate multixact SLRU files in checkpoints done during xlogTom Lane2006-07-20
| | | | | | | | | | | recovery. In the first place, it doesn't work because slru's latest_page_number isn't set up yet (this is why we've been hearing reports of strange "apparent wraparound" log messages during crash recovery, but only from people who'd managed to advance their next-mxact counters some considerable distance from 0). In the second place, it seems a bit unwise to be throwing away data during crash recovery anwyway. This latter consideration convinces me to just disable truncation during recovery, rather than computing latest_page_number and pushing ahead.
* pg_stop_backup was calling XLogArchiveNotify() twice for the newly createdTom Lane2006-06-22
| | | | | backup history file. Bug introduced by the 8.1 change to make pg_stop_backup delete older history files. Per report from Masao Fujii.
* Disable full_page_writes, because turning it off risks causing crash-recoveryTom Lane2006-03-28
| | | | | | | | | | failures even when the hardware and OS did nothing wrong. Per recent analysis of a problem report from Alex Bahdushka. For the moment I've just diked out the test of the parameter, rather than removing the GUC infrastructure and documentation, in case we conclude that there's something salvageable there. There seems no chance of it being resurrected in the 8.1 branch though.
* Repair longstanding error in btree xlog replay: XLogReadBuffer should beTom Lane2006-03-28
| | | | | | | | | | passed extend = true whenever we are reading a page we intend to reinitialize completely, even if we think the page "should exist". This is because it might indeed not exist, if the relation got truncated sometime after the current xlog record was made and before the crash we're trying to recover from. These two thinkos appear to explain both of the old bug reports discussed here: http://archives.postgresql.org/pgsql-hackers/2005-05/msg01369.php
* Add a CHECK_FOR_INTERRUPTS() in _bt_buildadd(). This fixes problemTom Lane2006-03-10
| | | | | with not responding to query cancel during the last stage of btree index creation.
* Move btbulkdelete's vacuum_delay_point() call to a place in the loop whereTom Lane2006-02-14
| | | | | | | | we are not holding a buffer content lock; where it was, InterruptHoldoffCount is positive and so we'd not respond to cancel signals as intended. Also add missing vacuum_delay_point() call in btvacuumcleanup. This should fix complaint from Evgeny Gridasov about failure to respond to SIGINT/SIGTERM in a timely fashion (bug #2257).
* Add some missing vacuum_delay_point calls in GIST vacuuming.Tom Lane2006-02-14
|
* Repair longstanding bug in slru/clog logic: it is possible for two backendsTom Lane2006-01-21
| | | | | | | | | | to try to create a log segment file concurrently, but the code erroneously specified O_EXCL to open(), resulting in a needless failure. Before 7.4, it was even a PANIC condition :-(. Correct code is actually simpler than what we had, because we can just say O_CREAT to start with and not need a second open() call. I believe this accounts for several recent reports of hard-to-reproduce "could not create file ...: File exists" errors in both pg_clog and pg_subtrans.
* Repair problems with the result of lookup_rowtype_tupdesc() possibly beingTom Lane2006-01-17
| | | | | | | discarded by cache flush while still in use. This is a minimal patch that just copies the tupdesc anywhere it could be needed across a flush. Applied to back branches only; Neil Conway is working on a better long-term solution for HEAD.
* Add RelationOpenSmgr() calls to ensure rd_smgr is valid when we try toTom Lane2006-01-07
| | | | | | use it. While it normally has been opened earlier during btree index build, testing shows that it's possible for the link to be closed again if an sinval reset occurs while the index is being built.
* Convert Assert checking for empty page into a regular test and elog.Tom Lane2006-01-06
| | | | | The consequences of overwriting a non-empty page are bad enough that we should not omit this test in production builds.
* Arrange to set the LC_XXX environment variables to match our locale setup.Tom Lane2006-01-05
| | | | Back-patch of previous fix in HEAD for plperl-vs-locale issue.
* Adjust string comparison so that only bitwise-equal strings are consideredTom Lane2005-12-22
| | | | | | | | | | | | equal: if strcoll claims two strings are equal, check it with strcmp, and sort according to strcmp if not identical. This fixes inconsistent behavior under glibc's hu_HU locale, and probably under some other locales as well. Also, take advantage of the now-well-defined behavior to speed up texteq, textne, bpchareq, bpcharne: they may as well just do a bitwise comparison and not bother with strcoll at all. NOTE: affected databases may need to REINDEX indexes on text columns to be sure they are self-consistent.
* Re-run pgindent, fixing a problem where comment lines after a blankBruce Momjian2005-11-22
| | | | | | | | | comment line where output as too long, and update typedefs for /lib directory. Also fix case where identifiers were used as variable names in the backend, but as typedefs in ecpg (favor the backend for indenting). Backpatch to 8.1.X.
* Modify tuptoaster's API so that it does not try to modify the passedTom Lane2005-11-20
| | | | | | | | | tuple in-place, but instead passes back an all-new tuple structure if any changes are needed. This is a much cleaner and more robust solution for the bug discovered by Alexey Beschiokov; accordingly, revert the quick hack I installed yesterday. With this change, HeapTupleData.t_datamcxt is no longer needed; will remove it in a separate commit in HEAD only.
* Rename the members of CommandDest enum so they don't collide with other uses ofAlvaro Herrera2005-11-03
| | | | | those names. (Debug and None were pretty bad names anyway.) I hope I catched all uses of the names in comments too.
* Fix longstanding race condition in transaction log management: there was aTom Lane2005-11-03
| | | | | | | | | | | very narrow window in which SimpleLruReadPage or SimpleLruWritePage could think that I/O was needed when it wasn't (and indeed the buffer had already been assigned to another page). This would result in an Assert failure if Asserts were enabled, and probably in silent data corruption if not. Reported independently by Jim Nasby and Robert Creager. I intend a more extensive fix when 8.2 development starts, but this is a reasonably low-impact patch for the existing branches.
* Message correctionsPeter Eisentraut2005-10-29
|
* Reorder code so that we don't have to hold a critical section whileTom Lane2005-10-28
| | | | | reserving SLRU space for a new MultiXact. The original coding would have treated out-of-disk-space as a PANIC condition, which is unnecessary.
* Fix race condition in multixact code: it's possible to try to read aTom Lane2005-10-28
| | | | | | | | | | multixact's starting offset before the offset has been stored into the SLRU file. A simple fix would be to hold the MultiXactGenLock until the offset has been stored, but that looks like a big concurrency hit. Instead rely on knowledge that unset offsets will be zero, and loop when we see a zero. This requires a little extra hacking to ensure that zero is never a valid value for the offset. Problem reported by Matteo Beccati, fix ideas from Martijn van Oosterhout, Alvaro Herrera, and Tom Lane.
* Make code for selecting default WAL sync method less confusing.Tom Lane2005-10-22
|
* Better solution to the problem of labeling whole-row Datums that areTom Lane2005-10-19
| | | | | | generated from subquery outputs: use the type info stored in the Var itself. To avoid making ExecEvalVar and slot_getattr more complex and slower, I split out the whole-row case into a separate ExecEval routine.
* Ensure that the Datum generated from a whole-row Var contains validTom Lane2005-10-19
| | | | | | type ID information even when it's a record type. This is needed to handle whole-row Vars referencing subquery outputs. Per example from Richard Huxton.
* A few trivial code cleanups motivated by reading warnings generatedTom Lane2005-10-18
| | | | | by a recent HP C compiler. Mostly, get rid of useless local variables that are assigned to but never used.
* Standard pgindent run for 8.1.Bruce Momjian2005-10-15
|
* This makes the error messages for PREPARE TRANSACTION, COMMIT PREPAREDBruce Momjian2005-10-13
| | | | | | | etc. match the docs, which talk about "transaction identifier" not "gid" or "global transaction identifier". Steve Woodcock
* Back out this because of fear of changing error strings:Bruce Momjian2005-10-13
| | | | | | | | This makes the error messages for PREPARE TRANSACTION, COMMIT PREPARED etc. match the docs, which talk about "transaction identifier" not "gid" or "global transaction identifier". Steve Woodcock
* This makes the error messages for PREPARE TRANSACTION, COMMIT PREPAREDBruce Momjian2005-10-13
| | | | | | | etc. match the docs, which talk about "transaction identifier" not "gid" or "global transaction identifier". Steve Woodcock
* Fix longstanding bug found by Atsushi Ogawa: _bt_check_unique would markTom Lane2005-10-12
| | | | | | the wrong buffer dirty when trying to kill a dead index entry that's on a page after the one it started on. No risk of data corruption, just inefficiency, but still a bug.
* Revise pgstats stuff to fix the problems with not counting accessesTom Lane2005-10-06
| | | | | | | generated by bitmap index scans. Along the way, simplify and speed up the code for counting sequential and index scans; it was both confusing and inefficient to be taking care of that in the per-tuple loops, IMHO. initdb forced because of internal changes in pg_stat view definitions.
* Expand pg_control information so that we can verify that the databaseTom Lane2005-10-03
| | | | | | was created on a machine with alignment rules and floating-point format similar to the current machine. Per recent discussion, this seems like a good idea with the increasing prevalence of 32/64 bit environments.
* Clean up possibly-uninitialized-variable warnings reported by gcc 4.x.Tom Lane2005-09-24
|
* pgindent new GIST index code, per request from Tom.Bruce Momjian2005-09-22
|