aboutsummaryrefslogtreecommitdiff
path: root/src/backend/access
Commit message (Collapse)AuthorAge
* Use "alternative" instead of "alternate" where it is clearer.Peter Eisentraut2007-11-07
|
* - Add check of already changed page while replay WAL. This touches onlyTeodor Sigaev2007-10-29
| | | | | | | | | | | | | ginRedoInsert(), because other ginRedo* functions rewrite whole page or make changes which could be applied several times without consistent's loss - Remove check of identifying of corresponding split record: it's possible that replaying of WAL starts after actual page split, but before removing of that split from incomplete splits list. In this case, that check cause FATAL error. Per stress test which reproduces bug reported by Craig McElroy <craig.mcelroy@contegix.com>
* Fix coredump during replay WAL after crash. Change entrySplitPage() to preventTeodor Sigaev2007-10-29
| | | | | | | | usage of any information from system catalog, because it could be called during replay of WAL. Per bug report from Craig McElroy <craig.mcelroy@contegix.com>. Patch doesn't change on-disk storage.
* Rearrange vacuum-related bits in PGPROC as a bitmask, to better supportAlvaro Herrera2007-10-24
| | | | | | | | | having several of them. Add two more flags: whether the process is executing an ANALYZE, and whether a vacuum is for Xid wraparound (which is obviously only set by autovacuum). Sneakily move the worker's recently-acquired PostAuthDelay to a more useful place.
* Keep heap_page_prune from marking the buffer dirty when it didn'tTom Lane2007-10-24
| | | | | really change anything. Per report from Itagaki Takahiro. Fix by Pavan Deolasee.
* Tweak toast-related logic in heapam.c so that the toaster is only invokedTom Lane2007-10-16
| | | | | | | when relkind = RELKIND_RELATION. This syncs these tests with the Asserts in tuptoaster.c, and ensures that we won't ever try to, for example, compress a sequence's tuple. Problem found by Greg Stark while stress-testing with much-smaller-than-normal page sizes.
* When telling the bgwriter that we need a checkpoint because too much xlogTom Lane2007-10-12
| | | | | | | | | has been consumed, recheck against the latest value of RedoRecPtr before really sending the signal. This avoids useless checkpoint activity if XLogWrite is executed when we have a very stale local copy of RedoRecPtr. The potential for useless checkpoint is very much worse in 8.3 because of the walwriter process (which never does XLogInsert), so while this behavior was intentional, it needs to be changed. Per report from Itagaki Takahiro.
* Remove incorrect use of VARSIZE() on a toasted datum. We can just remove itTom Lane2007-10-11
| | | | | instead of fix it, since once we've set toast_action[i] to 'p' it no longer matters what toast_sizes[i] is. Greg Stark
* Avoid assuming that struct varattrib_pointer doesn't get padded by theTom Lane2007-10-01
| | | | | | | | | | | compiler --- at least on ARM, it does. I suspect that the varvarlena patch has been creating larger-than-intended toast pointers all along on ARM, but it wasn't exposed until the latest tweak added some Asserts that calculated the expected size in a different way. We could probably have fixed this by adding __attribute__((packed)) as is done for ItemPointerData, but struct varattrib_pointer isn't really all that useful anyway, so it seems cleanest to just get rid of it and have only struct varattrib_1b_e. Per results from buildfarm member quagga.
* Add an extra header byte to TOAST-pointer datums to represent their sizeTom Lane2007-09-30
| | | | | | | explicitly. This means a TOAST pointer takes 18 bytes instead of 17 --- still smaller than in 8.2 --- which seems a good tradeoff to ensure we won't have painted ourselves into a corner if we want to support multiple types of TOAST pointer later on. Per discussion with Greg Stark.
* Adjust recovery PS display as agreed with Simon: 'waiting for XXX'Tom Lane2007-09-30
| | | | | | while the restore_command does its thing, then 'recovering XXX' while processing the segment file. These operations are heavyweight enough that an extra PS display set shouldn't bother anyone.
* Make recovery show the current input WAL segment name in the startupTom Lane2007-09-29
| | | | | process' PS display. After a suggestion by Simon (not exactly his patch though).
* Make archive recovery always start a new timeline, rather than only when aTom Lane2007-09-29
| | | | | | | recovery stop time was used. This avoids a corner-case risk of trying to overwrite an existing archived copy of the last WAL segment, and seems simpler and cleaner all around than the original definition. Per example from Jon Colverson and subsequent analysis by Simon.
* Some small tuptoaster improvements from Greg Stark. Avoid unnecessaryTom Lane2007-09-26
| | | | | | decompression of an already-compressed external value when we have to copy it; save a few cycles when a value is too short for compression; and annotate various lines that are currently unreachable.
* Minor improvements in backup and recovery:Tom Lane2007-09-26
| | | | | | | | | | | | | | | | | | | | | - create a separate archive_mode GUC, on which archive_command is dependent - %r option in recovery.conf sends last restartpoint to recovery command - %r used in pg_standby, updated README - minor other code cleanup in pg_standby - doc on Warm Standby now mentions pg_standby and %r - log_restartpoints recovery option emits LOG message at each restartpoint - end of recovery now displays last transaction end time, as requested by Warren Little; also shown at each restartpoint - restart archiver if needed to carry away WAL files at shutdown Simon Riggs
* Fix regex, LIKE, and some other second-rank text-manipulation functionsTom Lane2007-09-21
| | | | | | to not cause needless copying of text datums that have 1-byte headers. Greg Stark, in response to performance gripe from Guillaume Smet and ITAGAKI Takahiro.
* Improve handling of prune/no-prune decisions by storing a page's oldestTom Lane2007-09-21
| | | | | | unpruned XMAX in its header. At the cost of 4 bytes per page, this keeps us from performing heap_page_prune when there's no chance of pruning anything. Seems to be necessary per Heikki's preliminary performance testing.
* Fix comments that misspelled TransactionIdIsInProgress, per Heikki.Tom Lane2007-09-21
|
* HOT updates. When we update a tuple without changing any of its indexedTom Lane2007-09-20
| | | | | | | | | | | | columns, and the new version can be stored on the same heap page, we no longer generate extra index entries for the new version. Instead, index searches follow the HOT-chain links to ensure they find the correct tuple version. In addition, this patch introduces the ability to "prune" dead tuples on a per-page basis, without having to do a complete VACUUM pass to recover space. VACUUM is still needed to clean up dead index entries, however. Pavan Deolasee, with help from a bunch of other people.
* Remove GIN interface section, which is now documented in SGML.Bruce Momjian2007-09-14
| | | | Heikki Linnakangas
* Redefine the lp_flags field of item pointers as having four states, ratherTom Lane2007-09-12
| | | | | | | | | than two independent bits (one of which was never used in heap pages anyway, or at least hadn't been in a very long time). This gives us flexibility to add the HOT notions of redirected and dead item pointers without requiring anything so klugy as magic values of lp_off and lp_len. The state values are chosen so that for the states currently in use (pre-HOT) there is no change in the physical representation.
* Rename recently-added pg_stat_activity column from txn_start to xact_start,Tom Lane2007-09-11
| | | | for consistency with other column names such as in pg_stat_database.
* Replace the former method of determining snapshot xmax --- to wit, callingTom Lane2007-09-08
| | | | | | | | | | | | | | | ReadNewTransactionId from GetSnapshotData --- with a "latestCompletedXid" variable that is updated during transaction commit or abort. Since latestCompletedXid is written only in places that had to lock ProcArrayLock exclusively anyway, and is read only in places that had to lock ProcArrayLock shared anyway, it adds no new locking requirements to the system despite being cluster-wide. Moreover, removing ReadNewTransactionId from snapshot acquisition eliminates the need to take both XidGenLock and ProcArrayLock at the same time. Since XidGenLock is sometimes held across I/O this can be a significant win. Some preliminary benchmarking suggested that this patch has no effect on average throughput but can significantly improve the worst-case transaction times seen in pgbench. Concept by Florian Pflug, implementation by Tom Lane.
* Don't take ProcArrayLock while exiting a transaction that has no XID; there isTom Lane2007-09-07
| | | | | | | | | | no need for serialization against snapshot-taking because the xact doesn't affect anyone else's snapshot anyway. Per discussion. Also, move various info about the interlocking of transactions and snapshots out of code comments and into a hopefully-more-cohesive discussion in access/transam/README. Also, remove a couple of now-obsolete comments about having to force some WAL to be written to persuade RecordTransactionCommit to do its thing.
* Improve page split in rtree emulation. Now if splitted result hasTeodor Sigaev2007-09-07
| | | | | | | | | big misalignement, then it tries to split page basing on distribution of boxe's centers. Per report from Dolafi, Tom <dolafit@janelia.hhmi.org> Backpatch is needed, change doesn't affect on-disk storage.
* Quick hack to make the VXID of a prepared transaction be -1/XID,Tom Lane2007-09-05
| | | | | so that different prepared xacts can be told apart in the pg_locks view. Per suggestion from Florian.
* Implement lazy XID allocation: transactions that do not modify any databaseTom Lane2007-09-05
| | | | | | | | | | | | | | | rows will normally never obtain an XID at all. We already did things this way for subtransactions, but this patch extends the concept to top-level transactions. In applications where there are lots of short read-only transactions, this should improve performance noticeably; not so much from removal of the actual XID-assignments, as from reduction of overhead that's driven by the rate of XID consumption. We add a concept of a "virtual transaction ID" so that active transactions can be uniquely identified even if they don't have a regular XID. This is a much lighter-weight concept: uniqueness of VXIDs is only guaranteed over the short term, and no on-disk record is made about them. Florian Pflug, with some editorialization by Tom.
* Implement function-local GUC parameter settings, as per recent discussion.Tom Lane2007-09-03
| | | | | | | There are still some loose ends: I didn't do anything about the SET FROM CURRENT idea yet, and it's not real clear whether we are happy with the interaction of SET LOCAL with function-local settings. The documentation is a bit spartan, too.
* Add a debug logging message when a resource manager rejects an attemptedTom Lane2007-08-28
| | | | restart point. Per suggestion from Simon Riggs.
* Tsearch2 functionality migrates to core. The bulk of this work is byTom Lane2007-08-21
| | | | | | | | Oleg Bartunov and Teodor Sigaev, but I did a lot of editorializing, so anything that's broken is probably my fault. Documentation is nonexistent as yet, but let's land the patch so we can get some portability testing done.
* Fix oversight in async-commit patch: there were some places in heapam.cTom Lane2007-08-14
| | | | | | that still thought they could set HEAP_XMAX_COMMITTED immediately after seeing the other transaction commit. Make them use the same logic as tqual.c does to determine if the hint bit can be set yet.
* Fix two bugs induced in VACUUM FULL by async-commit patch.Tom Lane2007-08-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | First, we cannot assume that XLogAsyncCommitFlush guarantees hint bits will be settable, because clog.c's inexact LSN bookkeeping results in windows where a previously flushed transaction is considered unhintable because it shares an LSN slot with a later unflushed transaction. But repair_frag requires XMIN_COMMITTED to be correct so that it can distinguish tuples moved by the current vacuum. Since not being able to set the bit is an uncommon corner case, the most practical way of dealing with it seems to be to abandon shrinking (ie, don't invoke repair_frag) when we find a non-dead tuple whose XMIN_COMMITTED bit couldn't be set. Second, it is possible for the same reason that a RECENTLY_DEAD tuple does not get its XMAX_COMMITTED bit set during scan_heap. But by the time repair_frag examines the tuple it might be possible to set the bit. We therefore must take buffer content lock when calling HeapTupleSatisfiesVacuum a second time, else we can get an Assert failure in SetBufferCommitInfoNeedsSave. This latter bug is latent in existing releases, but I think it cannot actually occur without async commit, since the first HeapTupleSatisfiesVacuum call should always have set the bit. So I'm not going to back-patch it. In passing, reduce the existing "cannot shrink relation" messages from NOTICE to LOG level. The new message must be no higher than LOG if we don't want unpredictable regression test failures, and consistency seems like a good idea. Also arrange that only one such message is reported per VACUUM FULL; in typical scenarios you could get spammed with many such messages, which seems a bit useless.
* Switch over to using the src/timezone functions for formatting timestampsTom Lane2007-08-04
| | | | | | | | | | | | | | displayed in the postmaster log. This avoids Windows-specific problems with localized time zone names that are in the wrong encoding, and generally seems like a good idea to forestall other potential platform-dependent issues. To preserve the existing behavior that all backends will log in the same time zone, create a new GUC variable log_timezone that can only be changed on a system-wide basis, and reference log-related calculations to that zone instead of the TimeZone variable. This fixes the issue reported by Hiroshi Saito that timestamps printed by xlog.c startup could be improperly localized on Windows. We still need a simpler patch for that problem in the back branches, however.
* Support an optional asynchronous commit mode, in which we don't flush WALTom Lane2007-08-01
| | | | | | before reporting a transaction committed. Data consistency is still guaranteed (unlike setting fsync = off), but a crash may lose the effects of the last few transactions. Patch by Simon, some editorialization by Tom.
* Create a new dedicated Postgres process, "wal writer", which exists to writeTom Lane2007-07-24
| | | | | | | | | | | | | | and fsync WAL at convenient intervals. For the moment it just tries to offload this work from backends, but soon it will be responsible for guaranteeing a maximum delay before asynchronously-committed transactions will be flushed to disk. This is a portion of Simon Riggs' async-commit patch, committed to CVS separately because a background WAL writer seems like it might be a good idea independently of the async-commit feature. I rebased walwriter.c on bgwriter.c because it seemed like a more appropriate way of handling signals; while the startup/shutdown logic in postmaster.c is more like autovac because we want walwriter to quit before we start the shutdown checkpoint.
* Improve logging of checkpoints. Patch by Greg Smith, worked overTom Lane2007-06-30
| | | | by Heikki and a little bit by me.
* Implement "distributed" checkpoints in which the checkpoint I/O is spreadTom Lane2007-06-28
| | | | | | | | | | | | | over a fairly long period of time, rather than being spat out in a burst. This happens only for background checkpoints carried out by the bgwriter; other cases, such as a shutdown checkpoint, are still done at full speed. Remove the "all buffers" scan in the bgwriter, and associated stats infrastructure, since this seems no longer very useful when the checkpoint itself is properly throttled. Original patch by Itagaki Takahiro, reworked by Heikki Linnakangas, and some minor API editorialization by me.
* Teach heapam code to know the difference between a real seqscan and theTom Lane2007-06-09
| | | | | | | | | pseudo HeapScanDesc created for a bitmap heap scan. This avoids some useless overhead during a bitmap scan startup, in particular invoking the syncscan code. (We might someday want to do that, but right now it's merely useless contention for shared memory, to say nothing of possibly pushing useful entries out of syncscan's small LRU list.) This also allows elimination of ugly pgstat_discount_heap_scan() kluge.
* Arrange for large sequential scans to synchronize with each other, so thatTom Lane2007-06-08
| | | | | | | when multiple backends are scanning the same relation concurrently, each page is (ideally) read only once. Jeff Davis, with review by Heikki and Tom.
* Redefine IsTransactionState() to only return true for TRANS_INPROGRESS state,Tom Lane2007-06-07
| | | | | | | | | which is the only state in which it's safe to initiate database queries. It turns out that all but two of the callers thought that's what it meant; and the other two were using it as a proxy for "will GetTopTransactionId() return a nonzero XID"? Since it was in fact an unreliable guide to that, make those two just invoke GetTopTransactionId() always, then deal with a zero result if they get one.
* Move call of MarkBufferDirty() before XLogInsert() as required.Teodor Sigaev2007-06-05
| | | | | Many thanks to Heikki Linnakangas <heikki@enterprisedb.com> for his sharp eyes.
* Fix bundle bugs of GIN:Teodor Sigaev2007-06-04
| | | | | | | | | | | | | | | | | - Fix possible deadlock between UPDATE and VACUUM queries. Bug never was observed in 8.2, but it still exist there. HEAD is more sensitive to bug after recent "ring" of buffer improvements. - Fix WAL creation: if parent page is stored as is after split then incomplete split isn't removed during replay. This happens rather rare, only on large tables with a lot of updates/inserts. - Fix WAL replay: there was wrong test of XLR_BKP_BLOCK_* for left page after deletion of page. That causes wrong rightlink field: it pointed to deleted page. - add checking of match of clearing incomplete split - cleanup incomplete split list after proceeding All of this chages doesn't change on-disk storage, so backpatch... But second point may be an issue for replaying logs from previous version.
* Clarify some error messages about duplicate things.Peter Eisentraut2007-06-03
|
* Fix several hash functions that were taking chintzy shortcuts instead ofTom Lane2007-06-01
| | | | | | | | | | | | | delivering a well-randomized hash value. I got religion on this after observing that performance of multi-batch hash join degrades terribly if the higher-order bits of hash values aren't random, as indeed was true for say hashes of small integer values. It's now expected and documented that hash functions should use hash_any or some comparable method to ensure that all bits of their output are about equally random. initdb forced because this change invalidates existing hash indexes. For the same reason, this isn't back-patchable; the hash join performance problem will get a band-aid fix in the back branches.
* Make some messages more consistentPeter Eisentraut2007-05-31
|
* Replace ReadBuffer to ReadBufferWithStrategy in all vacuum-involved placesTeodor Sigaev2007-05-31
| | | | to implement limited-size "ring" of buffers for VACUUM for GIN & GIST
* Downgrade some low-level startup messages to DEBUG1.Peter Eisentraut2007-05-31
|
* Fix overly-strict sanity check in BeginInternalSubTransaction that made itTom Lane2007-05-30
| | | | | | fail when used in a deferred trigger. Bug goes back to 8.0; no doubt the reason it hadn't been noticed is that we've been discouraging use of user-defined constraint triggers. Per report from Frank van Vugt.
* Make large sequential scans and VACUUMs work in a limited-size "ring" ofTom Lane2007-05-30
| | | | | | | | | | | | | | | | | | | | | | | buffers, rather than blowing out the whole shared-buffer arena. Aside from avoiding cache spoliation, this fixes the problem that VACUUM formerly tended to cause a WAL flush for every page it modified, because we had it hacked to use only a single buffer. Those flushes will now occur only once per ring-ful. The exact ring size, and the threshold for seqscans to switch into the ring usage pattern, remain under debate; but the infrastructure seems done. The key bit of infrastructure is a new optional BufferAccessStrategy object that can be passed to ReadBuffer operations; this replaces the former StrategyHintVacuum API. This patch also changes the buffer usage-count methodology a bit: we now advance usage_count when first pinning a buffer, rather than when last unpinning it. To preserve the behavior that a buffer's lifetime starts to decrease when it's released, the clock sweep code is modified to not decrement usage_count of pinned buffers. Work not done in this commit: teach GiST and GIN indexes to use the vacuum BufferAccessStrategy for vacuum-driven fetches. Original patch by Simon, reworked by Heikki and again by Tom.
* Fix up pgstats counting of live and dead tuples to recognize that committedTom Lane2007-05-27
| | | | | | | | | | | and aborted transactions have different effects; also teach it not to assume that prepared transactions are always committed. Along the way, simplify the pgstats API by tying counting directly to Relations; I cannot detect any redeeming social value in having stats pointers in HeapScanDesc and IndexScanDesc structures. And fix a few corner cases in which counts might be missed because the relation's pgstat_info pointer hadn't been set.