aboutsummaryrefslogtreecommitdiff
path: root/src/backend/access
Commit message (Collapse)AuthorAge
* Avoid repeated CLOG access from heap_hot_search_buffer.Robert Haas2012-05-02
| | | | | | | | | | | At the time we check whether the tuple is dead to all running transactions, we've already verified that it isn't visible to our scan, setting hint bits if appropriate. So there's no need to recheck CLOG for the all-dead test we do just a moment later. So, add HeapTupleIsSurelyDead() to test the appropriate condition under the assumption that all relevant hit bits are already set. Review by Tom Lane.
* More duplicate word removal.Robert Haas2012-05-02
|
* Remove duplicate words in comments.Heikki Linnakangas2012-05-02
| | | | Found these with grep -r "for for ".
* Converge all SQL-level statistics timing values to float8 milliseconds.Tom Lane2012-04-30
| | | | | | | | | | | | | | | | | | | | | | | | | This patch adjusts the core statistics views to match the decision already taken for pg_stat_statements, that values representing elapsed time should be represented as float8 and measured in milliseconds. By using float8, we are no longer tied to a specific maximum precision of timing data. (Internally, it's still microseconds, but we could now change that without needing changes at the SQL level.) The columns affected are pg_stat_bgwriter.checkpoint_write_time pg_stat_bgwriter.checkpoint_sync_time pg_stat_database.blk_read_time pg_stat_database.blk_write_time pg_stat_user_functions.total_time pg_stat_user_functions.self_time pg_stat_xact_user_functions.total_time pg_stat_xact_user_functions.self_time The first four of these are new in 9.2, so there is no compatibility issue from changing them. The others require a release note comment that they are now double precision (and can show a fractional part) rather than bigint as before; also their underlying statistics functions now match the column definitions, instead of returning bigint microseconds.
* Remove duplicate word in comment.Robert Haas2012-04-30
| | | | Noted by Peter Geoghegan.
* Prevent index-only scans from returning wrong answers under Hot Standby.Robert Haas2012-04-26
| | | | | | | | | The alternative of disallowing index-only scans in HS operation was discussed, but the consensus was that it was better to treat marking a page all-visible as a recovery conflict for snapshots that could still fail to see XIDs on that page. We may in the future try to soften this, so that we simply force index scans to do heap fetches in cases where this may be an issue, rather than throwing a hard conflict.
* Another trivial comment-typo fix.Tom Lane2012-04-25
|
* Lots of doc corrections.Robert Haas2012-04-23
| | | | Josh Kupershmidt
* Fix some typosPeter Eisentraut2012-04-22
| | | | Josh Kupershmidt
* Tighten up error recovery for fast-path locking.Robert Haas2012-04-18
| | | | | | | | | The previous code could cause a backend crash after BEGIN; SAVEPOINT a; LOCK TABLE foo (interrupted by ^C or statement timeout); ROLLBACK TO SAVEPOINT a; LOCK TABLE foo, and might have leaked strong-lock counts in other situations. Report by Zoltán Böszörményi; patch review by Jeff Davis.
* Don't wait for the commit record to be replicated if we wrote no WAL.Heikki Linnakangas2012-04-17
| | | | | | | | When using synchronous replication, we waited for the commit record to be replicated, but if we our transaction didn't write any other WAL records, that's not required because we don't even flush the WAL locally to disk in that case. This lead to long waits when committing a transaction that only modified a temporary table. Bug spotted by Thom Brown.
* Fix typoPeter Eisentraut2012-04-16
| | | | Kyotaro HORIGUCHI
* Teach SLRU code to avoid replacing I/O-busy pages.Robert Haas2012-04-08
| | | | Patch by me; review by Tom Lane and others.
* Fix misleading output from gin_desc().Tom Lane2012-04-06
| | | | | | | | | | | XLOG_GIN_UPDATE_META_PAGE and XLOG_GIN_DELETE_LISTPAGE records were printed with a list link field labeled as "blkno", which was confusing, especially when the link was empty (InvalidBlockNumber). Print the metapage block number instead, since that's what's actually being updated. We could include the link values too as a separate field, but not clear it's worth the trouble. Back-patch to 8.4 where the dubious code was added.
* Publish checkpoint timing information to pg_stat_bgwriter.Robert Haas2012-04-05
| | | | Greg Smith, Peter Geoghegan, and Robert Haas
* Correct epoch of txid_current() when executed on a Hot Standby server.Simon Riggs2012-03-29
| | | | | | | | | Initialise ckptXidEpoch from starting checkpoint and maintain the correct value as we roll forwards. This allows GetNextXidAndEpoch() to return the correct epoch when executed during recovery. Backpatch to 9.0 when the problem is first observable by a user. Bug report from Daniel Farina
* Code cleanup for heap_freeze_tuple.Robert Haas2012-03-26
| | | | | | | It used to be case that lazy vacuum could call this function with only a shared lock on the buffer, but neither lazy vacuum nor any other code path does that any more. Simplify the code accordingly and clean up some related, obsolete comments.
* Clean up compiler warnings from unused variables with asserts disabledPeter Eisentraut2012-03-21
| | | | | | For those variables only used when asserts are enabled, use a new macro PG_USED_FOR_ASSERTS_ONLY, which expands to __attribute__((unused)) when asserts are not enabled.
* Add additional safety check against invalid backup label filePeter Eisentraut2012-03-14
| | | | | | | It was already checking for invalid data after "BACKUP FROM", but would possibly crash if "BACKUP FROM" was missing altogether. found by Coverity
* Fix SPGiST vacuum algorithm to handle concurrent tuple motion properly.Tom Lane2012-03-12
| | | | | | | | | | | | | A leaf tuple that we need to delete could get moved as a consequence of an insertion happening concurrently with the VACUUM scan. If it moves from a page past the current scan point to a page before, we'll miss it, which is not acceptable. Hence, when we see a leaf-page REDIRECT that could have been made since our scan started, chase down the redirection pointer much as if we were doing a normal index search, and be sure to vacuum every page it leads to. This fixes the issue because, if the tuple was on page N at the instant we start our scan, we will surely find it as a consequence of chasing the redirect from page N, no matter how much it moves around in between. Problem noted by Takashi Yamamoto.
* Teach SPGiST to store nulls and do whole-index scans.Tom Lane2012-03-11
| | | | | | | | | | | | | | | | | | | | | This patch fixes the other major compatibility-breaking limitation of SPGiST, that it didn't store anything for null values of the indexed column, and so could not support whole-index scans or "x IS NULL" tests. The approach is to create a wholly separate search tree for the null entries, and use fixed "allTheSame" insertion and search rules when processing this tree, instead of calling the index opclass methods. This way the opclass methods do not need to worry about dealing with nulls. Catversion bump is for pg_am updates as well as the change in on-disk format of SPGiST indexes; there are some tweaks in SPGiST WAL records as well. Heavily rewritten version of a patch by Oleg Bartunov and Teodor Sigaev. (The original also stored nulls separately, but it reused GIN code to do so; which required undesirable compromises in the on-disk format, and would likely lead to bugs due to the GIN code being required to work in two very different contexts.)
* Restructure SPGiST opclass interface API to support whole-index scans.Tom Lane2012-03-10
| | | | | | | | | | | | | | | | | | | | The original API definition was incapable of supporting whole-index scans because there was no way to invoke leaf-value reconstruction without checking any qual conditions. Also, it was inefficient for multiple-qual-condition scans because value reconstruction got done over again for each qual condition, and because other internal work in the consistent functions likewise had to be done for each qual. To fix these issues, pass the whole scankey array to the opclass consistent functions, instead of only letting them see one item at a time. (Essentially, the loop over scankey entries is now inside the consistent functions not outside them. This makes the consistent functions a bit more complicated, but not unreasonably so.) In itself this commit does nothing except save a few cycles in multiple-qual-condition index scans, since we can't support whole-index scans on SPGiST indexes until nulls are included in the index. However, I consider this a must-fix for 9.2 because once we release it will get very much harder to change the opclass API definition.
* Update outdated comment. HeapTupleHeader.t_natts field doesn't exist anymore.Heikki Linnakangas2012-03-09
| | | | Kevin Grittner
* Silence warning about unused variable, when building without assertions.Heikki Linnakangas2012-03-08
|
* Typo fix.Robert Haas2012-03-06
| | | | Fujii Masao
* Make the comments more clear on the fact that UpdateFullPageWrites() is notHeikki Linnakangas2012-03-06
| | | | safe to call concurrently from multiple processes.
* Remove extra copies of LogwrtResult.Heikki Linnakangas2012-03-06
| | | | | | | | | | | | | | | This simplifies the code a little bit. The new rule is that to update XLogCtl->LogwrtResult, you must hold both WALWriteLock and info_lck, whereas before we had two copies, one that was protected by WALWriteLock and another protected by info_lck. The code that updates them was already holding both locks, so merging the two is trivial. The third copy, XLogCtl->Insert.LogwrtResult, was not totally redundant, it was used in AdvanceXLInsertBuffer to update the backend-local copy, before acquiring the info_lck to read the up-to-date value. But the value of that seems dubious; at best it's saving one spinlock acquisition per completed WAL page, which is not significant compared to all the other work involved. And in practice, it's probably not saving even that much.
* Simplify the way changes to full_page_writes are logged.Heikki Linnakangas2012-03-06
| | | | | | | | | | | | | | | It's harmless to do full page writes even when not strictly necessary, so when turning full_page_writes on, we can set the global flag first, and then call XLogInsert. Likewise, when turning it off, we can write the WAL record first, and then clear the flag. This way XLogInsert doesn't need any special handling of the XLOG_FPW_CHANGE record type. XLogInsert is complicated enough already, so anything we can keep away from there is a good thing. Actually I don't think the atomicity of the shared memory flag matters, anyway, because we only write the XLOG_FPW_CHANGE at the end of recovery, when there are no concurrent WAL insertions going on. But might as well make it safe, in case we allow changing full_page_writes on the fly in the future.
* More carefully validate xlog location string inputsMagnus Hagander2012-03-04
| | | | | | | Now that we have validate_xlog_location, call it from the previously existing functions taking xlog locatoins as a string input. Suggested by Fujii Masao
* Add function pg_xlog_location_diff to help comparisonsMagnus Hagander2012-03-04
| | | | | | | | Comparing two xlog locations are useful for example when calculating replication lag. Euler Taveira de Oliveira, reviewed by Fujii Masao, and some cleanups from me
* When a GiST page is split during index build, it might not have a buffer.Heikki Linnakangas2012-03-02
| | | | | | | | | | | | | Previously it was thought that it's impossible as the code stands, because insertions create buffers as tuples are cascaded downwards, and index split also creaters buffers eagerly for all halves. But the example from Jay Levitt demonstrates that it can happen, when the root page is split. It's in fact OK if the buffer doesn't exist, so we just need to remove the sanity check. In fact, we've been discussing the possibility of destroying empty buffers to conserve memory, which would render the sanity check completely useless anyway. Fix by Alexander Korotkov
* Add const qualifiers where they are accidentally cast awayPeter Eisentraut2012-02-28
| | | | | This only produces warnings under -Wcast-qual, but it's more correct and consistent in any case.
* Fix some more bugs in GIN's WAL replay logic.Tom Lane2012-02-26
| | | | | | | | | | | | | | | | | | In commit 4016bdef8aded77b4903c457050622a5a1815c16 I fixed a bunch of ginxlog.c bugs having to do with not handling XLogReadBuffer failures correctly. However, in ginRedoUpdateMetapage and ginRedoDeleteListPages, I unaccountably thought that failure to read the metapage would be impossible and just put in an elog(PANIC) call. This is of course wrong: failure is exactly what will happen if the index got dropped (or rebuilt) between creation of the WAL record and the crash we're trying to recover from. I believe this explains Nicholas Wilson's recent report of these errors getting reached. Also, fix memory leak in forgetIncompleteSplit. This wasn't of much concern when the code was written, but in a long-running standby server page split records could be expected to accumulate indefinitely. Back-patch to 8.4 --- before that, GIN didn't have a metapage.
* Add some enumeration commas, for consistencyPeter Eisentraut2012-02-24
|
* Don't clear btpo_cycleid during _bt_vacuum_one_page.Tom Lane2012-02-21
| | | | | | | | | | | | | | When "vacuuming" a single btree page by removing LP_DEAD tuples, we are not actually within a vacuum operation, but rather in an ordinary insertion process that could well be running concurrently with a vacuum. So clearing the cycleid is incorrect, and could cause the concurrent vacuum to miss removing tuples that it needs to remove. This is a longstanding bug introduced by commit e6284649b9e30372b3990107a082bc7520325676 of 2006-07-25. I believe it explains Maxim Boguk's recent report of index corruption, and probably some other previously unexplained reports. In 9.0 and up this is a one-line fix; before that we need to introduce a flag to tell _bt_delitems what to do.
* Cosmetic cleanup for commit a760893dbda9934e287789d54bbd3c4ca3914ce0.Tom Lane2012-02-21
| | | | Mostly, fixing overlooked comments.
* Fix heap_multi_insert to set t_self field in the caller's tuples.Heikki Linnakangas2012-02-13
| | | | | | | | If tuples were toasted, heap_multi_insert didn't update the ctid on the original tuples. This caused a failure if there was an after trigger (including a foreign key), on the table, and a tuple got toasted. Per off-list report and test case from Ted Phelps
* Throw error sooner for unlogged GiST indexes.Tom Lane2012-02-08
| | | | | | Throwing an error only after we've built the main index fork is pretty unfriendly when the table already contains data. Per gripe from Jay Levitt.
* Rename LWLockWaitUntilFree to LWLockAcquireOrWait.Heikki Linnakangas2012-02-08
| | | | | LWLockAcquireOrWait makes it more clear that the lock is acquired if it's free.
* Add locking around WAL-replay modification of shared-memory variables.Tom Lane2012-02-06
| | | | | | | | | | | | | | | | | | | | | | | | | Originally, most of this code assumed that no Postgres backends could be running concurrently with it, and so no locking could be needed. That assumption fails in Hot Standby. While it's still true that Hot Standby backends should never change values like nextXid, they can examine them, and consistency is important in some cases such as when computing a snapshot. Therefore, prudence requires that WAL replay code obtain the relevant locks when modifying such variables, even though it can examine them without taking a lock. We were following that coding rule in some places but not all. This commit applies the coding rule uniformly to all updates of ShmemVariableCache and MultiXactState fields; a search of the replay routines did not find any other cases that seemed to be at risk. In addition, this commit fixes a longstanding thinko in replay of NEXTOID and checkpoint records: we tried to advance nextOid only if it was behind the value in the WAL record, but the comparison would draw the wrong conclusion if OID wraparound had occurred since the previous value. Better to just unconditionally assign the new value, since OID assignment shouldn't be happening during replay anyway. The additional locking seems to be more in the nature of future-proofing than fixing any live bug, so I am not going to back-patch it. The NEXTOID fix will be back-patched separately.
* Fix transient clobbering of shared buffers during WAL replay.Tom Lane2012-02-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | RestoreBkpBlocks was in the habit of zeroing and refilling the target buffer; which was perfectly safe when the code was written, but is unsafe during Hot Standby operation. The reason is that we have coding rules that allow backends to continue accessing a tuple in a heap relation while holding only a pin on its buffer. Such a backend could see transiently zeroed data, if WAL replay had occasion to change other data on the page. This has been shown to be the cause of bug #6425 from Duncan Rance (who deserves kudos for developing a sufficiently-reproducible test case) as well as Bridget Frey's re-report of bug #6200. It most likely explains the original report as well, though we don't yet have confirmation of that. To fix, change the code so that only bytes that are supposed to change will change, even transiently. This actually saves cycles in RestoreBkpBlocks, since it's not writing the same bytes twice. Also fix seq_redo, which has the same disease, though it has to work a bit harder to meet the requirement. So far as I can tell, no other WAL replay routines have this type of bug. In particular, the index-related replay routines, which would certainly be broken if they had to meet the same standard, are not at risk because we do not have coding rules that allow access to an index page when not holding a buffer lock on it. Back-patch to 9.0 where Hot Standby was added.
* Improve comment.Tom Lane2012-02-04
|
* Avoid re-checking for visibility map extension too frequently.Robert Haas2012-02-01
| | | | | | | | | | When testing bits (but not when setting or clearing them), we now won't check whether the map has been extended. This significantly improves performance in the case where the visibility map doesn't exist yet, by avoiding an extra system call per tuple. To make sure backends notice eventually, send an smgr inval on VM extension. Dean Rasheed, with minor modifications by me.
* Make group commit more effective.Heikki Linnakangas2012-01-30
| | | | | | | | | | | | | | | | | | | | When a backend needs to flush the WAL, and someone else is already flushing the WAL, wait until it releases the WALInsertLock and check if we still need to do the flush or if the other backend already did the work for us, before acquiring WALInsertLock. This helps group commit, because when the WAL flush finishes, all the backends that were waiting for it can be woken up in one go, and the can all concurrently observe that they're done, rather than waking them up one by one in a cascading fashion. This is based on a new LWLock function, LWLockWaitUntilFree(), which has peculiar semantics. If the lock is immediately free, it grabs the lock and returns true. If it's not free, it waits until it is released, but then returns false without grabbing the lock. This is used in XLogFlush(), so that when the lock is acquired, the backend flushes the WAL, but if it's not, the backend first checks the current flush location before retrying. Original patch and benchmarking by Peter Geoghegan and Simon Riggs, although this patch as committed ended up being very different from that.
* Assorted comment fixes, mostly just typos, but some obsolete statements.Tom Lane2012-01-29
| | | | YAMAMOTO Takashi
* Allow pg_basebackup from standby node with safety checking.Simon Riggs2012-01-25
| | | | | | | Base backup follows recommended procedure, plus goes to great lengths to ensure that partial page writes are avoided. Jun Ishizuka and Fujii Masao, with minor modifications
* fastgetattr is in access/htup.h, not access/heapam.hRobert Haas2012-01-16
| | | | Noted by Peter Geoghegan
* Correctly initialise shared recoveryLastRecPtr in recovery.Simon Riggs2012-01-13
| | | | | | | | Previously we used ReadRecPtr rather than EndRecPtr, which was not a serious error but caused pg_stat_replication to report incorrect replay_location until at least one WAL record is replayed. Fujii Masao
* Fix CLUSTER/VACUUM FULL for toast values owned by recently-updated rows.Tom Lane2012-01-12
| | | | | | | | | | | | | | | | | | | | | | | | | | In commit 7b0d0e9356963d5c3e4d329a917f5fbb82a2ef05, I made CLUSTER and VACUUM FULL try to preserve toast value OIDs from the original toast table to the new one. However, if we have to copy both live and recently-dead versions of a row that has a toasted column, those versions may well reference the same toast value with the same OID. The patch then led to duplicate-key failures as we tried to insert the toast value twice with the same OID. (The previous behavior was not very desirable either, since it would have silently inserted the same value twice with different OIDs. That wastes space, but what's worse is that the toast values inserted for already-dead heap rows would not be reclaimed by subsequent ordinary VACUUMs, since they go into the new toast table marked live not deleted.) To fix, check if the copied OID already exists in the new toast table, and if so, assume that it stores the desired value. This is reasonably safe since the only case where we will copy an OID from a previous toast pointer is when toast_insert_or_update was given that toast pointer and so we just pulled the data from the old table; if we got two different values that way then we have big problems anyway. We do have to assume that no other backend is inserting items into the new toast table concurrently, but that's surely safe for CLUSTER and VACUUM FULL. Per bug #6393 from Maxim Boguk. Back-patch to 9.0, same as the previous patch.
* Remove useless 'needlock' argument from GetXLogInsertRecPtr. It was alwaysHeikki Linnakangas2012-01-11
| | | | passed as 'true'.