aboutsummaryrefslogtreecommitdiff
path: root/src/backend/access
Commit message (Collapse)AuthorAge
* Remove arbitrary 10MB limit on two-phase state file size. It's not that hardHeikki Linnakangas2008-05-19
| | | | | | | | | | | | | | | | to go beoynd 10MB, as demonstrated by Gavin Sharry's example of dropping a schema with ~25000 objects. The really bogus thing about the limit was that it was enforced when a state file file was read in, not when it was written, so you would end up with a prepared transaction that you can't commit or abort, and the only recourse was to shut down the server and remove the file by hand. Raise the limit to MaxAllocSize, and enforce it also when a state file is written. We could've removed the limit altogether, but reading in a file larger than MaxAllocSize would fail anyway because we read it into a palloc'd buffer. Backpatch down to 8.1, where 2PC and this issue was introduced.
* Don't try to close negative file descriptors, since this can causeMagnus Hagander2008-05-13
| | | | | | | crashes on certain platforms. In particular, the MSVC runtime is known to do this. Fixes bug #4162, reported and diagnosed by Javier Pimas
* Fix Assert introduced in previous patch.Heikki Linnakangas2008-05-09
|
* Fix incorrect archive truncation point calculation in the %r recovery_commandHeikki Linnakangas2008-05-09
| | | | | | | | | | | parameter. This fixes bug 4137 reported by Wojciech Strzalka, where a WAL file is deleted too early when starting the recovery of a warm standby server. Also add a sanity check in pg_standby so that it will refuse to delete anything earlier than the file being restored, and improve the debug message in case nothing is deleted. Simon Riggs. Backpatch to 8.3, which is where %r was introduced.
* Back-patch Heikki's fix to make TransactionIdIsCurrentTransactionId() useTom Lane2008-04-26
| | | | | | | binary search instead of linear search when checking child-transaction XIDs. Per example from Robert Treat, the speed of TransactionIdIsCurrentTransactionId is significantly more important in 8.3 than it was in prior releases, so this seems worth taking back-patching risk for.
* Fix using too many LWLocks bug, reported by Craig RingerTeodor Sigaev2008-04-22
| | | | | | | <craig@postnewspapers.com.au>. It was my mistake, I missed limitation of number of held locks, now GIN doesn't use continiuous locks, but still hold buffers pinned to prevent interference with vacuum's deletion algorithm.
* Repair two places where SIGTERM exit could leave shared memory stateTom Lane2008-04-16
| | | | | | | | | | | | | | corrupted. (Neither is very important if SIGTERM is used to shut down the whole database cluster together, but there's a problem if someone tries to SIGTERM individual backends.) To do this, introduce new infrastructure macros PG_ENSURE_ERROR_CLEANUP/PG_END_ENSURE_ERROR_CLEANUP that take care of transiently pushing an on_shmem_exit cleanup hook. Also use this method for createdb cleanup --- that wasn't a shared-memory-corruption problem, but SIGTERM abort of createdb could leave orphaned files lying around. Backpatch as far as 8.2. The shmem corruption cases don't exist in 8.1, and the createdb usage doesn't seem important enough to risk backpatching further.
* Fix heap_page_prune's problem with failing to send cache invalidationTom Lane2008-03-13
| | | | | | | | | | | messages if the calling transaction aborts later on. Collapsing out line pointer redirects is a done deal as soon as we complete the page update, so syscache *must* be notified even if the VACUUM FULL as a whole doesn't complete. To fix, add some functionality to inval.c to allow the pending inval messages to be sent immediately while heap_page_prune is still running. The implementation is a bit chintzy: it will only work in the context of VACUUM FULL. But that's all we need now, and it can always be extended later if needed. Per my trouble report of a week ago.
* Refactor heap_page_prune so that instead of changing item states on-the-fly,Tom Lane2008-03-08
| | | | | | | | | | | | | | | | | it accumulates the set of changes to be made and then applies them. It had to accumulate the set of changes anyway to prepare a WAL record for the pruning action, so this isn't an enormous change; the only new complexity is to not doubly mark tuples that are visited twice in the scan. The main advantage is that we can substantially reduce the scope of the critical section in which the changes are applied, thus avoiding PANIC in foreseeable cases like running out of memory in inval.c. A nice secondary advantage is that it is now far clearer that WAL replay will actually do the same thing that the original pruning did. This commit doesn't do anything about the open problem that CacheInvalidateHeapTuple doesn't have the right semantics for a CTID change caused by collapsing out a redirect pointer. But whatever we do about that, it'll be a good idea to not do it inside a critical section.
* Change hashscan.c to keep its list of active hash index scans inTom Lane2008-03-07
| | | | | | | | | | | | | TopMemoryContext, rather than scattered through executor per-query contexts. This poses no danger of memory leak since the ResourceOwner mechanism guarantees release of no-longer-needed items. It is needed because the per-query context might already be released by the time we try to clean up the hash scan list. Report by ykhuang, diagnosis by Heikki. Back-patch to 8.0, where the ResourceOwner-based cleanup was introduced. The given test case does not fail before 8.2, probably because we rearranged transaction abort processing somehow; but this coding is undoubtedly risky so I'll patch 8.0 and 8.1 anyway.
* Fix PREPARE TRANSACTION to reject the case where the transaction has dropped aTom Lane2008-03-04
| | | | | | | temporary table; we can't support that because there's no way to clean up the source backend's internal state if the eventual COMMIT PREPARED is done by another backend. This was checked correctly in 8.1 but I broke it in 8.2 :-(. Patch by Heikki Linnakangas, original trouble report by John Smith.
* Reducing the assumed alignment of struct varlena means that the compilerTom Lane2008-02-29
| | | | | | | | | | is also licensed to put a local variable declared that way at an unaligned address. Which will not work if the variable is then manipulated with SET_VARSIZE or other macros that assume alignment. So the previous patch is not an unalloyed good, but on balance I think it's still a win, since we have very few places that do that sort of thing. Fix the one place in tuptoaster.c that does it. Per buildfarm results from gypsy_moth (I'm a bit surprised that only one machine showed a failure).
* Change the declaration of struct varlena so that the length word isTom Lane2008-02-23
| | | | | | | | | | | | | | | represented as "char ...[4]" not "int32". Since the length word is never supposed to be accessed via this struct member anyway, this won't break any existing code that is following the rules. The advantage is that C compilers will no longer assume that a pointer to struct varlena is word-aligned, which prevents incorrect optimizations in TOAST-pointer access and perhaps other places. gcc doesn't seem to do this (at least not at -O2), but the problem is demonstrable on some other compilers. I changed struct inet as well, but didn't bother to touch a lot of other struct definitions in which it wouldn't make any difference because there were other fields forcing int alignment anyway. Hopefully none of those struct definitions are used for accessing unaligned Datums.
* Add a GUC variable "synchronize_seqscans" to allow clients to disable the newTom Lane2008-01-30
| | | | | synchronized-scanning behavior, and make pg_dump disable sync scans so that it will reliably preserve row ordering. Per recent discussions.
* Provide a clearer error message if the pg_control version number looksPeter Eisentraut2008-01-21
| | | | wrong because of mismatched byte ordering.
* Revise memory management for libxml calls. Instead of keeping libxml's dataTom Lane2008-01-15
| | | | | | | | | | | in whichever context happens to be current during a call of an xml.c function, use a dedicated context that will not go away until we explicitly delete it (which we do at transaction end or subtransaction abort). This makes recovery after an error much simpler --- we don't have to individually delete the data structures created by libxml. Also, we need to initialize and cleanup libxml only once per transaction (if there's no error) instead of once per function call, so it should be a bit faster. We'll need to keep an eye out for intra-transaction memory leaks, though. Alvaro and Tom.
* Fix CREATE INDEX CONCURRENTLY so that it won't use synchronized scan forTom Lane2008-01-14
| | | | | | | | its second pass over the table. It has to start at block zero, else the "merge join" logic for detecting which TIDs are already in the index doesn't work. Hence, extend heapam.c's API so that callers can enable or disable syncscan. (I put in an option to disable buffer access strategy, too, just in case somebody needs it.) Per report from Hannes Dorbath.
* Make standard maintenance operations (including VACUUM, ANALYZE, REINDEX,Tom Lane2008-01-03
| | | | | | | | | | | | | | | | | | | and CLUSTER) execute as the table owner rather than the calling user, using the same privilege-switching mechanism already used for SECURITY DEFINER functions. The purpose of this change is to ensure that user-defined functions used in index definitions cannot acquire the privileges of a superuser account that is performing routine maintenance. While a function used in an index is supposed to be IMMUTABLE and thus not able to do anything very interesting, there are several easy ways around that restriction; and even if we could plug them all, there would remain a risk of reading sensitive information and broadcasting it through a covert channel such as CPU usage. To prevent bypassing this security measure, execution of SET SESSION AUTHORIZATION and SET ROLE is now forbidden within a SECURITY DEFINER context. Thanks to Itagaki Takahiro for reporting this vulnerability. Security: CVE-2007-6600
* Update copyrights in source tree to 2008.Bruce Momjian2008-01-01
|
* Improve a number of elog messages for not-supposed-to-happen cases in btrees,Tom Lane2007-12-31
| | | | | | | | | since these seem to happen after all in corrupted indexes. Make sure we supply the index name in all cases, and provide relevant block numbers where available. Also consistently identify the index name as such. Back-patch to 8.2, in hopes that this might help Mason Hale figure out his problem.
* Code review for LIKE ... INCLUDING INDEXES patch. Fix failure to propagateTom Lane2007-12-01
| | | | | | | | | | constraint status of copied indexes (bug #3774), as well as various other small bugs such as failure to pstrdup when needed. Allow INCLUDING INDEXES indexes to be merged with identical declared indexes (perhaps not real useful, but the code is there and having it not apply to LIKE indexes seems pretty unorthogonal). Avoid useless work in generateClonedIndexStmt(). Undo some poorly chosen API changes, and put a couple of routines in modules that seem to be better places for them.
* Avoid incrementing the CommandCounter when CommandCounterIncrement is calledTom Lane2007-11-30
| | | | | | | | | | | | | | | | | | | | but no database changes have been made since the last CommandCounterIncrement. This should result in a significant improvement in the number of "commands" that can typically be performed within a transaction before hitting the 2^32 CommandId size limit. In particular this buys back (and more) the possible adverse consequences of my previous patch to fix plan caching behavior. The implementation requires tracking whether the current CommandCounter value has been "used" to mark any tuples. CommandCounter values stored into snapshots are presumed not to be used for this purpose. This requires some small executor changes, since the executor used to conflate the curcid of the snapshot it was using with the command ID to mark output tuples with. Separating these concepts allows some small simplifications in executor APIs. Something for the TODO list: look into having CommandCounterIncrement not do AcceptInvalidationMessages. It seems fairly bogus to be doing it there, but exactly where to do it instead isn't clear, and I'm disinclined to mess with asynchronous behavior during late beta.
* Improve GIN index build's tracking of memory usage by usingTom Lane2007-11-16
| | | | | | | | | GetMemoryChunkSpace, not just the palloc request size. This brings the allocatedMemory counter close enough to reality (as measured by MemoryContextStats printouts) that I think we can get rid of the arbitrary factor-of-2 adjustment that was put into the code initially. Given the sensitivity of GIN build to work memory size, not using as much of work memory as we're allowed to seems a pretty bad idea.
* Repair still another bug in the btree page split WAL reduction patch:Tom Lane2007-11-16
| | | | | | | it failed for splits of non-leaf pages because in such pages the first data key on a page is suppressed, and so we can't just copy the first key from the right page to reconstitute the left page's high key. Problem found by Koichi Suzuki, patch by Heikki.
* Small comment spacing improvement.Bruce Momjian2007-11-16
|
* Fix pgindent to properly handle 'else' and single-line comments on theBruce Momjian2007-11-15
| | | | | same line; previous fix was only partial. Re-run pgindent on files that need it.
* Re-run pgindent with updated list of typedefs. (Updated README shouldBruce Momjian2007-11-15
| | | | avoid this problem in the future.)
* When logging the recovery.conf parameters, show them quoted as they wouldPeter Eisentraut2007-11-15
| | | | appear in the configuration file.
* pgindent run for 8.3.Bruce Momjian2007-11-15
|
* Prevent re-use of a deleted relation's relfilenode until after the nextTom Lane2007-11-15
| | | | | | | | | | checkpoint. This guards against an unlikely data-loss scenario in which we re-use the relfilenode, then crash, then replay the deletion and recreation of the file. Even then we'd be OK if all insertions into the new relation had been WAL-logged ... but that's not guaranteed given all the no-WAL-logging optimizations that have recently been added. Patch by Heikki Linnakangas, per a discussion last month.
* Clean up some stray references to tsearch2.Tom Lane2007-11-13
|
* Ensure that typmod decoration on a datatype name is validated in all cases,Tom Lane2007-11-11
| | | | | | | | | | | | | | even in code paths where we don't pay any subsequent attention to the typmod value. This seems needed in view of the fact that 8.3's generalized typmod support will accept a lot of bogus syntax, such as "timestamp(foo)" or "record(int, 42)" --- if we allow such things to pass without comment, users will get confused. Per a recent example from Greg Stark. To implement this in a way that's not very vulnerable to future bugs-of-omission, refactor the API of parse_type.c's TypeName lookup routines so that typmod validation is folded into the base lookup operation. Callers can still choose not to receive the encoded typmod, but we'll check the decoration anyway if it's present.
* Reduce error level of ROLLBACK outside a transaction from WARNING toBruce Momjian2007-11-10
| | | | NOTICE.
* Use "alternative" instead of "alternate" where it is clearer.Peter Eisentraut2007-11-07
|
* - Add check of already changed page while replay WAL. This touches onlyTeodor Sigaev2007-10-29
| | | | | | | | | | | | | ginRedoInsert(), because other ginRedo* functions rewrite whole page or make changes which could be applied several times without consistent's loss - Remove check of identifying of corresponding split record: it's possible that replaying of WAL starts after actual page split, but before removing of that split from incomplete splits list. In this case, that check cause FATAL error. Per stress test which reproduces bug reported by Craig McElroy <craig.mcelroy@contegix.com>
* Fix coredump during replay WAL after crash. Change entrySplitPage() to preventTeodor Sigaev2007-10-29
| | | | | | | | usage of any information from system catalog, because it could be called during replay of WAL. Per bug report from Craig McElroy <craig.mcelroy@contegix.com>. Patch doesn't change on-disk storage.
* Rearrange vacuum-related bits in PGPROC as a bitmask, to better supportAlvaro Herrera2007-10-24
| | | | | | | | | having several of them. Add two more flags: whether the process is executing an ANALYZE, and whether a vacuum is for Xid wraparound (which is obviously only set by autovacuum). Sneakily move the worker's recently-acquired PostAuthDelay to a more useful place.
* Keep heap_page_prune from marking the buffer dirty when it didn'tTom Lane2007-10-24
| | | | | really change anything. Per report from Itagaki Takahiro. Fix by Pavan Deolasee.
* Tweak toast-related logic in heapam.c so that the toaster is only invokedTom Lane2007-10-16
| | | | | | | when relkind = RELKIND_RELATION. This syncs these tests with the Asserts in tuptoaster.c, and ensures that we won't ever try to, for example, compress a sequence's tuple. Problem found by Greg Stark while stress-testing with much-smaller-than-normal page sizes.
* When telling the bgwriter that we need a checkpoint because too much xlogTom Lane2007-10-12
| | | | | | | | | has been consumed, recheck against the latest value of RedoRecPtr before really sending the signal. This avoids useless checkpoint activity if XLogWrite is executed when we have a very stale local copy of RedoRecPtr. The potential for useless checkpoint is very much worse in 8.3 because of the walwriter process (which never does XLogInsert), so while this behavior was intentional, it needs to be changed. Per report from Itagaki Takahiro.
* Remove incorrect use of VARSIZE() on a toasted datum. We can just remove itTom Lane2007-10-11
| | | | | instead of fix it, since once we've set toast_action[i] to 'p' it no longer matters what toast_sizes[i] is. Greg Stark
* Avoid assuming that struct varattrib_pointer doesn't get padded by theTom Lane2007-10-01
| | | | | | | | | | | compiler --- at least on ARM, it does. I suspect that the varvarlena patch has been creating larger-than-intended toast pointers all along on ARM, but it wasn't exposed until the latest tweak added some Asserts that calculated the expected size in a different way. We could probably have fixed this by adding __attribute__((packed)) as is done for ItemPointerData, but struct varattrib_pointer isn't really all that useful anyway, so it seems cleanest to just get rid of it and have only struct varattrib_1b_e. Per results from buildfarm member quagga.
* Add an extra header byte to TOAST-pointer datums to represent their sizeTom Lane2007-09-30
| | | | | | | explicitly. This means a TOAST pointer takes 18 bytes instead of 17 --- still smaller than in 8.2 --- which seems a good tradeoff to ensure we won't have painted ourselves into a corner if we want to support multiple types of TOAST pointer later on. Per discussion with Greg Stark.
* Adjust recovery PS display as agreed with Simon: 'waiting for XXX'Tom Lane2007-09-30
| | | | | | while the restore_command does its thing, then 'recovering XXX' while processing the segment file. These operations are heavyweight enough that an extra PS display set shouldn't bother anyone.
* Make recovery show the current input WAL segment name in the startupTom Lane2007-09-29
| | | | | process' PS display. After a suggestion by Simon (not exactly his patch though).
* Make archive recovery always start a new timeline, rather than only when aTom Lane2007-09-29
| | | | | | | recovery stop time was used. This avoids a corner-case risk of trying to overwrite an existing archived copy of the last WAL segment, and seems simpler and cleaner all around than the original definition. Per example from Jon Colverson and subsequent analysis by Simon.
* Some small tuptoaster improvements from Greg Stark. Avoid unnecessaryTom Lane2007-09-26
| | | | | | decompression of an already-compressed external value when we have to copy it; save a few cycles when a value is too short for compression; and annotate various lines that are currently unreachable.
* Minor improvements in backup and recovery:Tom Lane2007-09-26
| | | | | | | | | | | | | | | | | | | | | - create a separate archive_mode GUC, on which archive_command is dependent - %r option in recovery.conf sends last restartpoint to recovery command - %r used in pg_standby, updated README - minor other code cleanup in pg_standby - doc on Warm Standby now mentions pg_standby and %r - log_restartpoints recovery option emits LOG message at each restartpoint - end of recovery now displays last transaction end time, as requested by Warren Little; also shown at each restartpoint - restart archiver if needed to carry away WAL files at shutdown Simon Riggs
* Fix regex, LIKE, and some other second-rank text-manipulation functionsTom Lane2007-09-21
| | | | | | to not cause needless copying of text datums that have 1-byte headers. Greg Stark, in response to performance gripe from Guillaume Smet and ITAGAKI Takahiro.
* Improve handling of prune/no-prune decisions by storing a page's oldestTom Lane2007-09-21
| | | | | | unpruned XMAX in its header. At the cost of 4 bytes per page, this keeps us from performing heap_page_prune when there's no chance of pruning anything. Seems to be necessary per Heikki's preliminary performance testing.