aboutsummaryrefslogtreecommitdiff
path: root/src/backend/access
Commit message (Collapse)AuthorAge
* Move the HTSU_Result enum definition into snapshot.h, to avoid includingAlvaro Herrera2008-03-26
| | | | | | tqual.h into heapam.h. This makes all inclusion of tqual.h explicit. I also sorted alphabetically the includes on some source files.
* Rename snapmgmt.c/h to snapmgr.c/h, for consistency with other files.Alvaro Herrera2008-03-26
| | | | Per complaint from Tom Lane.
* Separate snapshot management code from tuple visibility code, create aAlvaro Herrera2008-03-26
| | | | | | | | | | | | | snapmgmt.c file for the former. The header files have also been reorganized in three parts: the most basic snapshot definitions are now in a new file snapshot.h, and the also new snapmgmt.h keeps the definitions for snapmgmt.c. tqual.h has been reduced to the bare minimum. This patch is just a first step towards managing live snapshots within a transaction; there is no functionality change. Per my proposal to pgsql-patches on 20080318191940.GB27458@alvh.no-ip.org and subsequent discussion.
* Simplify and standardize conversions between TEXT datums and ordinary CTom Lane2008-03-25
| | | | | | | | | | | | | | | | | | | | strings. This patch introduces four support functions cstring_to_text, cstring_to_text_with_len, text_to_cstring, and text_to_cstring_buffer, and two macros CStringGetTextDatum and TextDatumGetCString. A number of existing macros that provided variants on these themes were removed. Most of the places that need to make such conversions now require just one function or macro call, in place of the multiple notational layers that used to be needed. There are no longer any direct calls of textout or textin, and we got most of the places that were using handmade conversions via memcpy (there may be a few still lurking, though). This commit doesn't make any serious effort to eliminate transient memory leaks caused by detoasting toasted text objects before they reach text_to_cstring. We changed PG_GETARG_TEXT_P to PG_GETARG_TEXT_PP in a few places where it was easy, but much more could be done. Brendan Jurd and Tom Lane
* More README src cleanups.Bruce Momjian2008-03-21
|
* Make source code READMEs more consistent. Add CVS tags to all README files.Bruce Momjian2008-03-20
|
* Enable probes to work with Mac OS X Leopard and other OSes that willPeter Eisentraut2008-03-17
| | | | | | | | | | | support DTrace in the future. Switch from using DTRACE_PROBEn macros to the dynamically generated macros. Use "dtrace -h" to create a header file that contains the dynamically generated macros to be used in the source code instead of the DTRACE_PROBEn macros. A dummy header file is generated for builds without DTrace support. Author: Robert Lor <Robert.Lor@sun.com>
* Fix TransactionIdIsCurrentTransactionId() to use binary search instead ofTom Lane2008-03-17
| | | | | | | | | linear search when checking child-transaction XIDs. This makes for an important speedup in transactions that have large numbers of children, as in a recent example from Craig Ringer. We can also get rid of an ugly kluge that represented lists of TransactionIds as lists of OIDs. Heikki Linnakangas
* When creating a large hash index, pre-sort the index entries by estimatedTom Lane2008-03-16
| | | | | | | | | | bucket number, so as to ensure locality of access to the index during the insertion step. Without this, building an index significantly larger than available RAM takes a very long time because of thrashing. On the other hand, sorting is just useless overhead when the index does fit in RAM. We choose to sort when the initial index size exceeds effective_cache_size. This is a revised version of work by Tom Raney and Shreya Bhargava.
* Change hash index creation so that rather than always establishing exactlyTom Lane2008-03-15
| | | | | | | | | | | two buckets at the start, we create a number of buckets appropriate for the estimated size of the table. This avoids a lot of expensive bucket-split actions during initial index build on an already-populated table. This is one of the two core ideas of Tom Raney and Shreya Bhargava's patch to reduce hash index build time. I'm committing it separately to make it easier for people to test the effects of this separately from the effects of their other core idea (pre-sorting the index entries by bucket number).
* Fix heap_page_prune's problem with failing to send cache invalidationTom Lane2008-03-13
| | | | | | | | | | | messages if the calling transaction aborts later on. Collapsing out line pointer redirects is a done deal as soon as we complete the page update, so syscache *must* be notified even if the VACUUM FULL as a whole doesn't complete. To fix, add some functionality to inval.c to allow the pending inval messages to be sent immediately while heap_page_prune is still running. The implementation is a bit chintzy: it will only work in the context of VACUUM FULL. But that's all we need now, and it can always be extended later if needed. Per my trouble report of a week ago.
* Make TransactionIdIsInProgress check transam.c's single-item XID status cacheTom Lane2008-03-11
| | | | | | | | before it goes groveling through the ProcArray. In situations where the same recently-committed transaction ID is checked repeatedly by tqual.c, this saves a lot of shared-memory searches. And it's cheap enough that it shouldn't hurt noticeably when it doesn't help. Concept and patch by Simon, some minor tweaking and comment-cleanup by Tom.
* Remove no-longer-used XLogCacheByte field of XLogCtl.Tom Lane2008-03-10
| | | | Itagaki Takahiro
* Refactor heap_page_prune so that instead of changing item states on-the-fly,Tom Lane2008-03-08
| | | | | | | | | | | | | | | | | it accumulates the set of changes to be made and then applies them. It had to accumulate the set of changes anyway to prepare a WAL record for the pruning action, so this isn't an enormous change; the only new complexity is to not doubly mark tuples that are visited twice in the scan. The main advantage is that we can substantially reduce the scope of the critical section in which the changes are applied, thus avoiding PANIC in foreseeable cases like running out of memory in inval.c. A nice secondary advantage is that it is now far clearer that WAL replay will actually do the same thing that the original pruning did. This commit doesn't do anything about the open problem that CacheInvalidateHeapTuple doesn't have the right semantics for a CTID change caused by collapsing out a redirect pointer. But whatever we do about that, it'll be a good idea to not do it inside a critical section.
* This patch addresses some issues in TOAST compression strategy thatTom Lane2008-03-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | were discussed last year, but we felt it was too late in the 8.3 cycle to change the code immediately. Specifically, the patch: * Reduces the minimum datum size to be considered for compression from 256 to 32 bytes, as suggested by Greg Stark. * Increases the required compression rate for compressed storage from 20% to 25%, again per Greg's suggestion. * Replaces force_input_size (size above which compression is forced) with a maximum size to be considered for compression. It was agreed that allowing large inputs to escape the minimum-compression-rate requirement was not bright, and that indeed we'd rather have a knob that acted in the other direction. I set this value to 1MB for the moment, but it could use some performance studies to tune it. * Adds an early-failure path to the compressor as suggested by Jan: if it's been unable to find even one compressible substring in the first 1KB (parameterizable), assume we're looking at incompressible input and give up. (Possibly this logic can be improved, but I'll commit it as-is for now.) * Improves the toasting heuristics so that when we have very large fields with attstorage 'x' or 'e', we will push those out to toast storage before considering inline compression of shorter fields. This also responds to a suggestion of Greg's, though my original proposal for a solution was a bit off base because it didn't fix the problem for large 'e' fields. There was some discussion in the earlier threads of exposing some of the compression knobs to users, perhaps even on a per-column basis. I have not done anything about that here. It seems to me that if we are changing around the parameters, we'd better get some experience and be sure we are happy with the design before we set things in stone by providing user-visible knobs.
* Change hashscan.c to keep its list of active hash index scans inTom Lane2008-03-07
| | | | | | | | | | | | | TopMemoryContext, rather than scattered through executor per-query contexts. This poses no danger of memory leak since the ResourceOwner mechanism guarantees release of no-longer-needed items. It is needed because the per-query context might already be released by the time we try to clean up the hash scan list. Report by ykhuang, diagnosis by Heikki. Back-patch to 8.0, where the ResourceOwner-based cleanup was introduced. The given test case does not fail before 8.2, probably because we rearranged transaction abort processing somehow; but this coding is undoubtedly risky so I'll patch 8.0 and 8.1 anyway.
* Fix PREPARE TRANSACTION to reject the case where the transaction has dropped aTom Lane2008-03-04
| | | | | | | temporary table; we can't support that because there's no way to clean up the source backend's internal state if the eventual COMMIT PREPARED is done by another backend. This was checked correctly in 8.1 but I broke it in 8.2 :-(. Patch by Heikki Linnakangas, original trouble report by John Smith.
* Reducing the assumed alignment of struct varlena means that the compilerTom Lane2008-02-29
| | | | | | | | | | is also licensed to put a local variable declared that way at an unaligned address. Which will not work if the variable is then manipulated with SET_VARSIZE or other macros that assume alignment. So the previous patch is not an unalloyed good, but on balance I think it's still a win, since we have very few places that do that sort of thing. Fix the one place in tuptoaster.c that does it. Per buildfarm results from gypsy_moth (I'm a bit surprised that only one machine showed a failure).
* Change the declaration of struct varlena so that the length word isTom Lane2008-02-23
| | | | | | | | | | | | | | | represented as "char ...[4]" not "int32". Since the length word is never supposed to be accessed via this struct member anyway, this won't break any existing code that is following the rules. The advantage is that C compilers will no longer assume that a pointer to struct varlena is word-aligned, which prevents incorrect optimizations in TOAST-pointer access and perhaps other places. gcc doesn't seem to do this (at least not at -O2), but the problem is demonstrable on some other compilers. I changed struct inet as well, but didn't bother to touch a lot of other struct definitions in which it wouldn't make any difference because there were other fields forcing int alignment anyway. Hopefully none of those struct definitions are used for accessing unaligned Datums.
* Remove another target I forgot during the refactoringPeter Eisentraut2008-02-19
|
* Refactor backend makefiles to remove lots of duplicate codePeter Eisentraut2008-02-19
|
* Replace time_t with pg_time_t (same values, but always int64) in on-diskTom Lane2008-02-17
| | | | | | | | | | | | | | data structures and backend internal APIs. This solves problems we've seen recently with inconsistent layout of pg_control between machines that have 32-bit time_t and those that have already migrated to 64-bit time_t. Also, we can get out from under the problem that Windows' Unix-API emulation is not consistent about the width of time_t. There are a few remaining places where local time_t variables are used to hold the current or recent result of time(NULL). I didn't bother changing these since they do not affect any cross-module APIs and surely all platforms will have 64-bit time_t before overflow becomes an actual risk. time_t should be avoided for anything visible to extension modules, however.
* Add a GUC variable "synchronize_seqscans" to allow clients to disable the newTom Lane2008-01-30
| | | | | synchronized-scanning behavior, and make pg_dump disable sync scans so that it will reliably preserve row ordering. Per recent discussions.
* Provide a clearer error message if the pg_control version number looksPeter Eisentraut2008-01-21
| | | | wrong because of mismatched byte ordering.
* Revise memory management for libxml calls. Instead of keeping libxml's dataTom Lane2008-01-15
| | | | | | | | | | | in whichever context happens to be current during a call of an xml.c function, use a dedicated context that will not go away until we explicitly delete it (which we do at transaction end or subtransaction abort). This makes recovery after an error much simpler --- we don't have to individually delete the data structures created by libxml. Also, we need to initialize and cleanup libxml only once per transaction (if there's no error) instead of once per function call, so it should be a bit faster. We'll need to keep an eye out for intra-transaction memory leaks, though. Alvaro and Tom.
* Fix CREATE INDEX CONCURRENTLY so that it won't use synchronized scan forTom Lane2008-01-14
| | | | | | | | its second pass over the table. It has to start at block zero, else the "merge join" logic for detecting which TIDs are already in the index doesn't work. Hence, extend heapam.c's API so that callers can enable or disable syncscan. (I put in an option to disable buffer access strategy, too, just in case somebody needs it.) Per report from Hannes Dorbath.
* Make standard maintenance operations (including VACUUM, ANALYZE, REINDEX,Tom Lane2008-01-03
| | | | | | | | | | | | | | | | | | | and CLUSTER) execute as the table owner rather than the calling user, using the same privilege-switching mechanism already used for SECURITY DEFINER functions. The purpose of this change is to ensure that user-defined functions used in index definitions cannot acquire the privileges of a superuser account that is performing routine maintenance. While a function used in an index is supposed to be IMMUTABLE and thus not able to do anything very interesting, there are several easy ways around that restriction; and even if we could plug them all, there would remain a risk of reading sensitive information and broadcasting it through a covert channel such as CPU usage. To prevent bypassing this security measure, execution of SET SESSION AUTHORIZATION and SET ROLE is now forbidden within a SECURITY DEFINER context. Thanks to Itagaki Takahiro for reporting this vulnerability. Security: CVE-2007-6600
* Update copyrights in source tree to 2008.Bruce Momjian2008-01-01
|
* Improve a number of elog messages for not-supposed-to-happen cases in btrees,Tom Lane2007-12-31
| | | | | | | | | since these seem to happen after all in corrupted indexes. Make sure we supply the index name in all cases, and provide relevant block numbers where available. Also consistently identify the index name as such. Back-patch to 8.2, in hopes that this might help Mason Hale figure out his problem.
* Code review for LIKE ... INCLUDING INDEXES patch. Fix failure to propagateTom Lane2007-12-01
| | | | | | | | | | constraint status of copied indexes (bug #3774), as well as various other small bugs such as failure to pstrdup when needed. Allow INCLUDING INDEXES indexes to be merged with identical declared indexes (perhaps not real useful, but the code is there and having it not apply to LIKE indexes seems pretty unorthogonal). Avoid useless work in generateClonedIndexStmt(). Undo some poorly chosen API changes, and put a couple of routines in modules that seem to be better places for them.
* Avoid incrementing the CommandCounter when CommandCounterIncrement is calledTom Lane2007-11-30
| | | | | | | | | | | | | | | | | | | | but no database changes have been made since the last CommandCounterIncrement. This should result in a significant improvement in the number of "commands" that can typically be performed within a transaction before hitting the 2^32 CommandId size limit. In particular this buys back (and more) the possible adverse consequences of my previous patch to fix plan caching behavior. The implementation requires tracking whether the current CommandCounter value has been "used" to mark any tuples. CommandCounter values stored into snapshots are presumed not to be used for this purpose. This requires some small executor changes, since the executor used to conflate the curcid of the snapshot it was using with the command ID to mark output tuples with. Separating these concepts allows some small simplifications in executor APIs. Something for the TODO list: look into having CommandCounterIncrement not do AcceptInvalidationMessages. It seems fairly bogus to be doing it there, but exactly where to do it instead isn't clear, and I'm disinclined to mess with asynchronous behavior during late beta.
* Improve GIN index build's tracking of memory usage by usingTom Lane2007-11-16
| | | | | | | | | GetMemoryChunkSpace, not just the palloc request size. This brings the allocatedMemory counter close enough to reality (as measured by MemoryContextStats printouts) that I think we can get rid of the arbitrary factor-of-2 adjustment that was put into the code initially. Given the sensitivity of GIN build to work memory size, not using as much of work memory as we're allowed to seems a pretty bad idea.
* Repair still another bug in the btree page split WAL reduction patch:Tom Lane2007-11-16
| | | | | | | it failed for splits of non-leaf pages because in such pages the first data key on a page is suppressed, and so we can't just copy the first key from the right page to reconstitute the left page's high key. Problem found by Koichi Suzuki, patch by Heikki.
* Small comment spacing improvement.Bruce Momjian2007-11-16
|
* Fix pgindent to properly handle 'else' and single-line comments on theBruce Momjian2007-11-15
| | | | | same line; previous fix was only partial. Re-run pgindent on files that need it.
* Re-run pgindent with updated list of typedefs. (Updated README shouldBruce Momjian2007-11-15
| | | | avoid this problem in the future.)
* When logging the recovery.conf parameters, show them quoted as they wouldPeter Eisentraut2007-11-15
| | | | appear in the configuration file.
* pgindent run for 8.3.Bruce Momjian2007-11-15
|
* Prevent re-use of a deleted relation's relfilenode until after the nextTom Lane2007-11-15
| | | | | | | | | | checkpoint. This guards against an unlikely data-loss scenario in which we re-use the relfilenode, then crash, then replay the deletion and recreation of the file. Even then we'd be OK if all insertions into the new relation had been WAL-logged ... but that's not guaranteed given all the no-WAL-logging optimizations that have recently been added. Patch by Heikki Linnakangas, per a discussion last month.
* Clean up some stray references to tsearch2.Tom Lane2007-11-13
|
* Ensure that typmod decoration on a datatype name is validated in all cases,Tom Lane2007-11-11
| | | | | | | | | | | | | | even in code paths where we don't pay any subsequent attention to the typmod value. This seems needed in view of the fact that 8.3's generalized typmod support will accept a lot of bogus syntax, such as "timestamp(foo)" or "record(int, 42)" --- if we allow such things to pass without comment, users will get confused. Per a recent example from Greg Stark. To implement this in a way that's not very vulnerable to future bugs-of-omission, refactor the API of parse_type.c's TypeName lookup routines so that typmod validation is folded into the base lookup operation. Callers can still choose not to receive the encoded typmod, but we'll check the decoration anyway if it's present.
* Reduce error level of ROLLBACK outside a transaction from WARNING toBruce Momjian2007-11-10
| | | | NOTICE.
* Use "alternative" instead of "alternate" where it is clearer.Peter Eisentraut2007-11-07
|
* - Add check of already changed page while replay WAL. This touches onlyTeodor Sigaev2007-10-29
| | | | | | | | | | | | | ginRedoInsert(), because other ginRedo* functions rewrite whole page or make changes which could be applied several times without consistent's loss - Remove check of identifying of corresponding split record: it's possible that replaying of WAL starts after actual page split, but before removing of that split from incomplete splits list. In this case, that check cause FATAL error. Per stress test which reproduces bug reported by Craig McElroy <craig.mcelroy@contegix.com>
* Fix coredump during replay WAL after crash. Change entrySplitPage() to preventTeodor Sigaev2007-10-29
| | | | | | | | usage of any information from system catalog, because it could be called during replay of WAL. Per bug report from Craig McElroy <craig.mcelroy@contegix.com>. Patch doesn't change on-disk storage.
* Rearrange vacuum-related bits in PGPROC as a bitmask, to better supportAlvaro Herrera2007-10-24
| | | | | | | | | having several of them. Add two more flags: whether the process is executing an ANALYZE, and whether a vacuum is for Xid wraparound (which is obviously only set by autovacuum). Sneakily move the worker's recently-acquired PostAuthDelay to a more useful place.
* Keep heap_page_prune from marking the buffer dirty when it didn'tTom Lane2007-10-24
| | | | | really change anything. Per report from Itagaki Takahiro. Fix by Pavan Deolasee.
* Tweak toast-related logic in heapam.c so that the toaster is only invokedTom Lane2007-10-16
| | | | | | | when relkind = RELKIND_RELATION. This syncs these tests with the Asserts in tuptoaster.c, and ensures that we won't ever try to, for example, compress a sequence's tuple. Problem found by Greg Stark while stress-testing with much-smaller-than-normal page sizes.
* When telling the bgwriter that we need a checkpoint because too much xlogTom Lane2007-10-12
| | | | | | | | | has been consumed, recheck against the latest value of RedoRecPtr before really sending the signal. This avoids useless checkpoint activity if XLogWrite is executed when we have a very stale local copy of RedoRecPtr. The potential for useless checkpoint is very much worse in 8.3 because of the walwriter process (which never does XLogInsert), so while this behavior was intentional, it needs to be changed. Per report from Itagaki Takahiro.
* Remove incorrect use of VARSIZE() on a toasted datum. We can just remove itTom Lane2007-10-11
| | | | | instead of fix it, since once we've set toast_action[i] to 'p' it no longer matters what toast_sizes[i] is. Greg Stark