path: root/src/backend/access
Commit message | Author | Age
...
* Refer to max_wal_senders in a more consistent fashion. | Robert Haas | 2010-04-01
  The error message now makes explicit reference to the GUC that must be changed to fix the problem, using wording suggested by Tom Lane. Along the way, rename the GUC from MaxWalSenders to max_wal_senders for consistency and grep-ability.
* Change recovery.conf.sample to match postgresql.conf by showing only | Bruce Momjian | 2010-03-31
  default values, with example comments.
* Change the retry-loop in standby mode to also try restoring files from | Heikki Linnakangas | 2010-03-30
  the pg_xlog directory. This is essential for replaying WAL records that were streamed from the master, after a standby server restart. If a corrupt record is seen in a file restored from the archive or streamed from the master, log it as a WARNING and keep retrying. If the corruption is permanent, and not just a glitch in whatever copies the files to the archive or, for example, a network error not caught by TCP's checksums, we will keep retrying and logging the WARNING indefinitely. But that's better than shutting down completely; the standby is still useful for running read-only queries. In PITR the recovery ends at such a corrupt record, which is a bit questionable, but that's the behavior we had in previous releases and we don't feel like changing it now. That behavior does make sense for tools like pg_standby.
* Properly initialize local variable in | Bruce Momjian | 2010-03-30
  btree_xlog_delete_get_latestRemovedXid(). This variable was only tested in assert builds.
* Edit recovery.conf.sample so it matches docs. Change standby_mode | Simon Riggs | 2010-03-29
  example to 'on' or 'off' rather than 'true' or 'false', as shown in docs. Add restartpoint_command. Add section header for recovery target parameters, matching docs.
* Derive latestRemovedXid for btree deletes by reading heap pages. The | Simon Riggs | 2010-03-28
  WAL record for btree delete contains a list of tids, even when backup blocks are present. We follow the tids to their heap tuples, taking care to follow LP_REDIRECT tuples. We ignore LP_DEAD tuples on the understanding that they will always have xmin/xmax earlier than any LP_NORMAL tuples referred to by killed index tuples. If and only if all tuples are LP_DEAD, we return InvalidTransactionId. The heap relfilenode is added to the WAL record, requiring API changes to pass down the heap Relation. XLOG_PAGE_MAGIC updated.
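A minimal C sketch of the tid-following step described above. It makes simplifying assumptions: a toy page model with 0-based offsets, and each LP_NORMAL tuple contributes only its xmax. All type and function names (`LinePointer`, `HeapPage`, `Tid`, `latest_removed_xid`) are hypothetical. The real code inspects actual heap tuple headers. What the sketch does show is redirect-following, LP_DEAD skipping, and a wraparound-aware maximum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define INVALID_XID ((TransactionId) 0)

/* Simplified stand-ins for heap line pointer states (not PostgreSQL's definitions). */
typedef enum { LP_UNUSED, LP_NORMAL, LP_REDIRECT, LP_DEAD } LpState;

typedef struct {
    LpState state;
    uint16_t redirect_to;   /* LP_REDIRECT only: offset of the next line pointer */
    TransactionId xmax;     /* LP_NORMAL only: the deleting transaction */
} LinePointer;

typedef struct { const LinePointer *lp; size_t nlp; } HeapPage;   /* toy page, 0-based offsets */
typedef struct { uint32_t block; uint16_t offset; } Tid;

/* Wraparound-aware "a is older than b" (modulo-2^32 comparison of normal XIDs). */
static int xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

static TransactionId latest_removed_xid(const HeapPage *pages, size_t npages,
                                        const Tid *tids, size_t ntids)
{
    TransactionId latest = INVALID_XID;

    for (size_t i = 0; i < ntids; i++) {
        const HeapPage *page;
        uint16_t off = tids[i].offset;

        assert(tids[i].block < npages);
        page = &pages[tids[i].block];

        /* Follow LP_REDIRECT; the hop limit guards against a malformed cycle. */
        for (size_t hops = 0; off < page->nlp && page->lp[off].state == LP_REDIRECT
             && hops < page->nlp; hops++)
            off = page->lp[off].redirect_to;

        /* LP_DEAD (and LP_UNUSED or out-of-range offsets) contribute nothing. */
        if (off >= page->nlp || page->lp[off].state != LP_NORMAL)
            continue;

        if (latest == INVALID_XID || xid_precedes(latest, page->lp[off].xmax))
            latest = page->lp[off].xmax;
    }
    return latest;   /* stays INVALID_XID if every referenced tuple was LP_DEAD */
}
```

For example, tids pointing at a normal tuple (xmax 100), a redirect to a normal tuple (xmax 150), and a dead tuple yield 150. A list of only dead tuples yields the invalid XID.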
* Message tuning | Peter Eisentraut | 2010-03-21
* Adjust comment in .history file to match recovery target specified. Comment | Simon Riggs | 2010-03-19
  present since 8.0 was never fully meaningful, since two recovery targets cannot be specified. Refactor recovery target type to make this change and associated code easier to understand. No change in function. Bug report arising from internal support question.
* Reset btpo.xact following recovery of btree delete page. Add btpo_xact | Simon Riggs | 2010-03-19
  field into WAL record and reset it from there, rather than using FrozenTransactionId which can lead to some corner case bugs. Problem report and suggested route to a fix from Heikki, details by me.
* Add restartpoint_command option to recovery.conf. Fix bug in %r handling | Heikki Linnakangas | 2010-03-18
  in recovery_end_command: it always came out as 0 because InRedo was cleared before recovery_end_command was executed. Also, always take ControlFileLock when reading checkpoint location for %r. The recovery_end_command bug and the missing locking were present in 8.4 as well; that part of this patch will be backported separately.
* Remove incorrect comment from GetWriteRecPtr(): the return value is always | Simon Riggs | 2010-03-15
  correct, as described in comments at the start of xlog.c.
* Add missing reset of need_initialization in reloptions code. | Tom Lane | 2010-03-11
  This resulted in useless extra work during every call of parseRelOptions, but no bad effects other than that. Noted by Alvaro.
* pg_start_backup() can use a share lock to lock ControlFileLock | Itagaki Takahiro | 2010-03-10
  instead of an exclusive lock. The change is mostly code cleanup. Since there seem to be no performance benefits from it, backports should not be needed. Fujii Masao
* pgindent run for 9.0 | Bruce Momjian | 2010-02-26
* Make pg_stop_backup's reporting a bit more verbose in hopes of making | Tom Lane | 2010-02-25
  error cases less intimidating for novices. Per discussion. Greg Smith
* Clean up handling of XactReadOnly and RecoveryInProgress checks. | Tom Lane | 2010-02-20
  Add some checks that seem logically necessary; in particular, let's make really sure that HS slave sessions cannot create temp tables. (If they did they would think that temp tables belonging to the master's session with the same BackendId were theirs. We *must* not allow myTempNamespace to become set in a slave session.)
  Change setval() and nextval() so that they are only allowed on temp sequences in a read-only transaction. This seems consistent with what we allow for table modifications in read-only transactions. Since an HS slave can't have a temp sequence, this also provides a nicer cure for the setval PANIC reported by Erik Rijkers.
  Make the error messages more uniform, and have them mention the specific command being complained of. This seems worth the trifling amount of extra code, since people are likely to see such messages a lot more than before.
* Don't use O_DIRECT when writing WAL files if archiving or streaming is | Heikki Linnakangas | 2010-02-19
  enabled. Bypassing the kernel cache is counter-productive in that case, because the archiver/walsender process will read from the WAL file soon after it's written, and if it's not cached the read will cause a physical read, eating I/O bandwidth available on the WAL drive. Also, the walreceiver process does unaligned writes, so disable O_DIRECT in the walreceiver process for that reason too.
* Fix STOP WAL LOCATION in backup history files not to return the next | Itagaki Takahiro | 2010-02-19
  segment after the XLOG_BACKUP_END record when the record is placed at a segment boundary. Furthermore, the previous implementation could return a nonexistent segment file name when the boundary was in a segment with the "FE" suffix, because segments with the "FF" suffix are never used. Backpatch to 8.0, where hot backup was introduced. Reported by Fujii Masao.
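To make the boundary case concrete, here is a C sketch of the segment arithmetic. It is a sketch under assumptions: 16 MB segments, and the layout of that era in which each 4 GB xlogid holds 255 segments numbered 00 through FE. The names `WalPtr`, `seg_of`, `seg_before` and `seg_file_name` are hypothetical and are not PostgreSQL's own macros.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumptions: 16 MB segments; each 4 GB xlogid holds
 * UINT32_MAX / SEG_SIZE = 255 segments, numbered 00..FE (never FF). */
#define SEG_SIZE    ((uint32_t) 16 * 1024 * 1024)
#define SEGS_PER_ID (UINT32_MAX / SEG_SIZE)

typedef struct { uint32_t xlogid; uint32_t xrecoff; } WalPtr;

/* Segment containing the byte AT p. Applied to the end of a record that
 * finishes exactly on a boundary, this names the following segment. */
static void seg_of(WalPtr p, uint32_t *id, uint32_t *seg)
{
    *id = p.xlogid;
    *seg = p.xrecoff / SEG_SIZE;
}

/* Segment containing the last byte BEFORE p, i.e. the segment that holds
 * a record ending exactly at p. */
static void seg_before(WalPtr p, uint32_t *id, uint32_t *seg)
{
    if (p.xrecoff == 0) {
        /* Boundary between xlogids: step back to the last real segment (..FE). */
        assert(p.xlogid > 0);
        *id = p.xlogid - 1;
        *seg = SEGS_PER_ID - 1;
    } else {
        *id = p.xlogid;
        *seg = (p.xrecoff - 1) / SEG_SIZE;
    }
}

/* 24-hex-digit WAL file name: timeline, xlogid, segment. Returns the length. */
static int seg_file_name(char buf[25], uint32_t tli, uint32_t id, uint32_t seg)
{
    return snprintf(buf, 25, "%08" PRIX32 "%08" PRIX32 "%08" PRIX32, tli, id, seg);
}
```

For an end pointer of (0, 0xFF000000), `seg_of` names the nonexistent segment FF. `seg_before` names FE, which is the segment that actually holds the record.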
* Stamp HEAD as 9.0devel, and update various places that were referring to 8.5 | Tom Lane | 2010-02-17
  (hope I got 'em all). Per discussion, this release will be 9.0 not 8.5.
* When updating ShmemVariableCache from a checkpoint record, be sure to set | Tom Lane | 2010-02-17
  all the values derived from oldestXid, not just that field. Brain fade in one of my patches associated with flat file removal, exposed by a report from Fujii Masao. With this change, xidVacLimit should always be valid, so remove a couple of bits of complexity associated with the previous assumption that sometimes it wouldn't get set right away.
* Replace the pg_listener-based LISTEN/NOTIFY mechanism with an in-memory queue. | Tom Lane | 2010-02-16
  In addition, add support for a "payload" string to be passed along with each notify event. This implementation should be significantly more efficient than the old one, and is also more compatible with Hot Standby usage. There is not yet any facility for HS slaves to receive notifications generated on the master, although such a thing is possible in future. Joachim Wieland, reviewed by Jeff Davis; also hacked on by me.
* Wrap calls to SearchSysCache and related functions using macros. | Robert Haas | 2010-02-14
  The purpose of this change is to eliminate the need for every caller of SearchSysCache, SearchSysCacheCopy, SearchSysCacheExists, GetSysCacheOid, and SearchSysCacheList to know the maximum number of allowable keys for a syscache entry (currently 4). This will make it far easier to increase the maximum number of keys in a future release should we choose to do so, and it makes the code shorter, too. Design and review by Tom Lane.
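The wrapper pattern can be sketched in C as follows. Everything here is hypothetical: `cache_lookup`, the `cache_lookupN` macros, and the toy table. The sketch only illustrates the idea. A single function takes the maximum number of keys, and per-arity macros pad the unused ones, so callers never spell out the maximum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CACHE_KEYS 4      /* the only place the maximum is spelled out */

typedef uintptr_t Key;        /* stand-in for an opaque key value */

typedef struct {
    int cache_id;
    Key keys[MAX_CACHE_KEYS]; /* unused trailing keys are 0 */
    const char *value;
} CacheEntry;

/* Toy cache contents, for illustration only. */
static const CacheEntry cache_entries[] = {
    {0, {42, 0, 0, 0}, "entry found with one key"},
    {1, {7, 3, 0, 0}, "entry found with two keys"},
};

/* The underlying lookup always takes the maximum number of keys. */
static const char *cache_lookup(int cache_id, Key k1, Key k2, Key k3, Key k4)
{
    const Key want[MAX_CACHE_KEYS] = {k1, k2, k3, k4};

    assert(cache_id >= 0);
    for (size_t i = 0; i < sizeof cache_entries / sizeof cache_entries[0]; i++) {
        const CacheEntry *e = &cache_entries[i];
        size_t k = 0;

        if (e->cache_id != cache_id)
            continue;
        while (k < MAX_CACHE_KEYS && e->keys[k] == want[k])
            k++;
        if (k == MAX_CACHE_KEYS)
            return e->value;
    }
    return NULL;
}

/* Per-arity wrappers: callers state how many keys they use and never write
 * the padding, so raising MAX_CACHE_KEYS changes only cache_lookup() and
 * these macros, not their callers. */
#define cache_lookup1(id, k1)             cache_lookup((id), (k1), 0, 0, 0)
#define cache_lookup2(id, k1, k2)         cache_lookup((id), (k1), (k2), 0, 0)
#define cache_lookup3(id, k1, k2, k3)     cache_lookup((id), (k1), (k2), (k3), 0)
#define cache_lookup4(id, k1, k2, k3, k4) cache_lookup((id), (k1), (k2), (k3), (k4))
```

A caller writes `cache_lookup2(1, 7, 3)` rather than `cache_lookup(1, 7, 3, 0, 0)`. The number of padding arguments therefore lives in one header instead of at every call site.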
* Fix relcache init file invalidation during Hot Standby for the case | Simon Riggs | 2010-02-13
  where a database has a non-default tablespace. Pass through MyDatabaseId and MyDatabaseTableSpace to allow the file path to be re-created in standby and correct invalidation to take place in all cases. Update and rework xact_commit_desc() debug messages. Bug report from Tom by code inspection. Fix by me.
* Introduce WAL records to log reuse of btree pages, allowing conflict | Simon Riggs | 2010-02-13
  resolution during Hot Standby. Page reuse interlock requested by Tom. Analysis and patch by me.
* Reduce the chatter to the log when starting a standby server. Don't | Heikki Linnakangas | 2010-02-12
  echo all the recovery.conf options. Don't emit the "initializing recovery connections" message, which doesn't mean anything to a user. Remove the "starting archive recovery" message and replace the "automatic recovery in progress" message with a more informative message saying whether the server is doing PITR, normal archive recovery, or standby mode.
* If primary_conninfo is not set, don't try to establish a streaming | Heikki Linnakangas | 2010-02-12
  connection.
* Check for partial WAL files in standby mode. If restore_command restores | Heikki Linnakangas | 2010-02-12
  a partial WAL file, assume it's because the file is just being copied to the archive and treat it the same as "file not found" in standby mode. pg_standby has a similar check, so it seems reasonable to have the same level of protection in the built-in standby mode.
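A minimal POSIX C sketch of the size check. It assumes the caller knows the expected segment size. `restored_segment_usable` is a hypothetical name; in the server this check sits inside the restore logic rather than in a standalone function. Handling of oversized files is out of scope here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>

/* Returns true if the file restored to `path` exists and is at least
 * `expected_size` bytes. A shorter file is assumed to be still in the middle
 * of being copied into the archive, so it is reported like "file not found"
 * and the caller simply retries later. */
static bool restored_segment_usable(const char *path, long long expected_size)
{
    struct stat st;

    assert(path != NULL && expected_size > 0);
    if (stat(path, &st) != 0)
        return false;                       /* genuinely not there (yet) */
    if ((long long) st.st_size < expected_size) {
        fprintf(stderr, "restored file \"%s\" is %lld bytes, expected %lld; "
                "treating as not found\n",
                path, (long long) st.st_size, expected_size);
        return false;
    }
    return true;
}
```

Treating a short file exactly like a missing one keeps the caller's retry loop unchanged. The next attempt simply runs restore_command again once the copy has finished.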
* Generic implementation of red-black binary tree. It's planned to be used in | Teodor Sigaev | 2010-02-11
  several places, but for now only GIN uses it during index creation. Using a self-balancing tree greatly speeds up index creation in corner cases with presorted data.
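The benefit for presorted input can be illustrated with a compact C sketch. For brevity it uses Sedgewick's left-leaning red-black tree, a simpler variant of the red-black tree; it is not necessarily the algorithm in the committed code, and all names are hypothetical. The sketch contrasts the height of a plain binary search tree with that of the balanced tree after keys are inserted in ascending order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct Node {
    int key;
    bool red;                   /* color of the link from the parent */
    struct Node *left, *right;
} Node;

static bool is_red(const Node *n) { return n != NULL && n->red; }

static Node *new_node(int key)
{
    Node *n = malloc(sizeof *n);
    if (n == NULL)
        abort();
    n->key = key;
    n->red = true;              /* new nodes join their parent via a red link */
    n->left = n->right = NULL;
    return n;
}

static Node *rotate_left(Node *h)
{
    Node *x = h->right;
    h->right = x->left;
    x->left = h;
    x->red = h->red;
    h->red = true;
    return x;
}

static Node *rotate_right(Node *h)
{
    Node *x = h->left;
    h->left = x->right;
    x->right = h;
    x->red = h->red;
    h->red = true;
    return x;
}

static void flip_colors(Node *h)
{
    assert(h->left != NULL && h->right != NULL);
    h->red = true;              /* split a temporary 4-node */
    h->left->red = false;
    h->right->red = false;
}

static Node *llrb_insert(Node *h, int key)
{
    if (h == NULL)
        return new_node(key);
    if (key < h->key)
        h->left = llrb_insert(h->left, key);
    else if (key > h->key)
        h->right = llrb_insert(h->right, key);
    /* equal key: already present, nothing to insert */

    /* Restore the left-leaning invariants on the way back up. */
    if (is_red(h->right) && !is_red(h->left))
        h = rotate_left(h);
    if (is_red(h->left) && is_red(h->left->left))
        h = rotate_right(h);
    if (is_red(h->left) && is_red(h->right))
        flip_colors(h);
    return h;
}

static Node *rb_insert(Node *root, int key)
{
    root = llrb_insert(root, key);
    root->red = false;          /* the root is always black */
    return root;
}

/* Unbalanced BST insert, for comparison: sorted input degenerates into a list. */
static Node *bst_insert(Node *root, int key)
{
    Node **link = &root;
    while (*link != NULL && (*link)->key != key)
        link = key < (*link)->key ? &(*link)->left : &(*link)->right;
    if (*link == NULL)
        *link = new_node(key);
    return root;
}

static int tree_height(const Node *n)
{
    if (n == NULL)
        return 0;
    int l = tree_height(n->left), r = tree_height(n->right);
    return 1 + (l > r ? l : r);
}

static void tree_free(Node *n)
{
    if (n != NULL) { tree_free(n->left); tree_free(n->right); free(n); }
}
```

Inserting 1 to 1000 in ascending order gives the plain tree a height of 1000, which is effectively a linked list. The red-black tree stays within 2 log2(n + 1), just under 20 levels.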
* Now that streaming replication switches between streaming mode and | Heikki Linnakangas | 2010-02-10
  restoring from archive, the last WAL segment is not necessarily open at the end of recovery. Fix assertion that assumed that. Fujii Masao, fixing the assertion failure reported by Martin Pihlak.
* Fix up rickety handling of relation-truncation interlocks. | Tom Lane | 2010-02-09
  Move rd_targblock, rd_fsm_nblocks, and rd_vm_nblocks from relcache to the smgr relation entries, so that they will get reset to InvalidBlockNumber whenever an smgr-level flush happens. Because we now send smgr invalidation messages immediately (not at end of transaction) when a relation truncation occurs, this ensures that other backends will reset their values before they next access the relation. We no longer need the unreliable assumption that a VACUUM that's doing a truncation will hold its AccessExclusive lock until commit --- in fact, we can intentionally release that lock as soon as we've completed the truncation. This patch therefore reverts (most of) Alvaro's patch of 2009-11-10, as well as my marginal hacking on it yesterday. We can also get rid of assorted no-longer-needed relcache flushes, which are far more expensive than an smgr flush because they kill a lot more state.
  In passing this patch fixes smgr_redo's failure to perform visibility-map truncation, and cleans up some rather dubious assumptions in freespace.c and visibilitymap.c about when rd_fsm_nblocks and rd_vm_nblocks can be out of date.
* Fix bug in GIN WAL redo cleanup function: don't free fake relcache entry | Heikki Linnakangas | 2010-02-09
  while it's still being used. Backpatch to 8.4, where the fake relcache method was introduced.
* Remove piece of code to zero out minRecoveryPoint when starting crash | Heikki Linnakangas | 2010-02-08
  recovery. It's zeroed out whenever a checkpoint is written, so the only scenario where the removed code did anything is when you kill archive recovery, remove recovery.conf, and start up the server, so that it goes into crash recovery instead. That's a "don't do that" scenario, but it seems better to not clear minRecoveryPoint but instead update it like we do in archive recovery, which is what will now happen.
* Remove some more dead VACUUM-FULL-only code. | Tom Lane | 2010-02-08
* Remove old-style VACUUM FULL (which was known for a little while as | Tom Lane | 2010-02-08
  VACUUM FULL INPLACE), along with a boatload of subsidiary code and complexity. Per discussion, the use case for this method of vacuuming is no longer large enough to justify maintaining it; not to mention that we don't wish to invest the work that would be needed to make it play nicely with Hot Standby.
  Aside from the code directly related to old-style VACUUM FULL, this commit removes support for certain WAL record types that could only be generated within VACUUM FULL, redirect-pointer removal in heap_page_prune, and nontransactional generation of cache invalidation sinval messages (the last being the sticking point for Hot Standby).
  We still have to retain all code that copes with finding HEAP_MOVED_OFF and HEAP_MOVED_IN flag bits on existing tuples. This can't be removed as long as we want to support in-place update from pre-9.0 databases.
* Create a "relation mapping" infrastructure to support changing the relfilenodes | Tom Lane | 2010-02-07
  of shared or nailed system catalogs. This has two key benefits:
    * The new CLUSTER-based VACUUM FULL can be applied safely to all catalogs.
    * We no longer have to use an unsafe reindex-in-place approach for reindexing shared catalogs.
  CLUSTER on nailed catalogs now works too, although I left it disabled on shared catalogs because the resulting pg_index.indisclustered update would only be visible in one database.
  Since reindexing shared system catalogs is now fully transactional and crash-safe, the former special cases in REINDEX behavior have been removed; shared catalogs are treated the same as non-shared.
  This commit does not do anything about the recently-discussed problem of deadlocks between VACUUM FULL/CLUSTER on a system catalog and other concurrent queries; will address that in a separate patch. As a stopgap, parallel_schedule has been tweaked to run vacuum.sql by itself, to avoid such failures during the regression tests.
* Restructure CLUSTER/newstyle VACUUM FULL/ALTER TABLE support so that swapping | Tom Lane | 2010-02-04
  of old and new toast tables can be done either at the logical level (by swapping the heaps' reltoastrelid links) or at the physical level (by swapping the relfilenodes of the toast tables and their indexes). This is necessary infrastructure for upcoming changes to support CLUSTER/VAC FULL on shared system catalogs, where we cannot change reltoastrelid. The physical swap saves a few catalog updates too.
  We unfortunately have to keep the logical-level swap logic because in some cases we will be adding or deleting a toast table, so there's no possibility of a physical swap. However, that only happens as a consequence of schema changes in the table, which we do not need to support for system catalogs, so such cases aren't an obstacle for that.
  In passing, refactor the cluster support functions a little bit to eliminate unnecessarily-duplicated code; and fix the problem that while CLUSTER had been taught to rename the final toast table at need, ALTER TABLE had not.
* Move the responsibility of writing an "unlogged WAL operation" record from | Heikki Linnakangas | 2010-02-03
  heap_sync() to the callers, because heap_sync() is sometimes called even if the operation itself is WAL-logged. This eliminates the bogus unlogged records from CLUSTER that Simon Riggs reported. Patch by Fujii Masao.
* Revoke augmentation of WAL records for btree delete, per discussion. | Simon Riggs | 2010-02-01
* Augment WAL records for btree delete with GetOldestXmin() to reduce | Simon Riggs | 2010-01-29
  false positives during Hot Standby conflict processing. Simple patch to enhance conflict processing, following previous discussions. Controlled by the parameter minimize_standby_conflicts = on | off, default off, which allows the performance impact to be measured to see whether it should be on all the time.
* Filter recovery conflicts based upon dboid from relfilenode of WAL | Simon Riggs | 2010-01-29
  records for heap and btree. Minor change, mostly API changes to pass through the required values. This is a simple change, though it also provides the refactoring required for further enhancements to conflict processing using the relOid. Changes only have effect during Hot Standby.
* Fix crashing bug at the end of recovery in Streaming Replication, when | Heikki Linnakangas | 2010-01-28
  restore_command is not given. Fujii Masao.
* Fix bug in walsender's xlogid boundary handling, reported by Erik Rijkers. | Heikki Linnakangas | 2010-01-27
  LogwrtRqst.Write can be set to the nonexistent FF log segment; we mustn't try to send that in XLogSend(). Also fix similar bug in ReadRecord(), which I just introduced in the ReadRecord() refactoring patch.
* Make standby server continuously retry restoring the next WAL segment with | Heikki Linnakangas | 2010-01-27
  restore_command, if the connection to the primary server is lost. This ensures that the standby can recover automatically if the connection is lost for a long time and the standby falls behind so much that the required WAL segments have been archived and deleted in the master.
  This also makes standby_mode useful without streaming replication; the server will keep retrying restore_command every few seconds until the trigger file is found. That's the same basic functionality pg_standby offers, but without the bells and whistles.
  To implement that, refactor the ReadRecord/FetchRecord functions. The FetchRecord() function introduced in the original streaming replication patch is removed, and all the retry logic is now in a new function called XLogReadPage(). XLogReadPage() is now responsible for executing restore_command, launching walreceiver, and waiting for new WAL to arrive from the primary, as required.
  This also changes the life cycle of walreceiver. When launched, it now only tries to connect to the master once, and exits if the connection fails or is lost during streaming for any reason. The startup process detects the death, and re-launches walreceiver if necessary.
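The control flow described above can be sketched in C as a loop over WAL sources. This is an illustration under stated assumptions, not XLogReadPage() itself, and the ordering shown is one plausible reading of the description. The callbacks in the hypothetical `WalSources` struct stand in for four actions: running restore_command, launching walreceiver (which connects once and reports failure when it exits), checking for the trigger file, and sleeping. A scripted test double (`Script`, also hypothetical) is included so the loop can be exercised.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hooks standing in for the real mechanisms. */
typedef struct WalSources {
    bool (*restore_from_archive)(void *ctx, unsigned seg); /* run restore_command */
    bool (*stream_from_primary)(void *ctx, unsigned seg);  /* walreceiver; false once it exits */
    bool (*trigger_file_found)(void *ctx);
    void (*wait_before_retry)(void *ctx);                  /* e.g. sleep a few seconds */
    void *ctx;
} WalSources;

/* Returns true once segment `seg` has been obtained, or false if the
 * trigger file shows up first and standby mode should end. */
static bool wait_for_segment(const WalSources *s, unsigned seg)
{
    assert(s != NULL && s->restore_from_archive != NULL &&
           s->trigger_file_found != NULL && s->wait_before_retry != NULL);
    for (;;) {
        if (s->restore_from_archive(s->ctx, seg))
            return true;
        /* A NULL hook means no primary is configured: archive-only standby. */
        if (s->stream_from_primary != NULL && s->stream_from_primary(s->ctx, seg))
            return true;
        if (s->trigger_file_found(s->ctx))
            return false;
        s->wait_before_retry(s->ctx);
    }
}

/* Scripted test double: the archive has the segment from attempt
 * `ready_at` onward; streaming never delivers it. */
typedef struct { int archive_calls, stream_calls, waits, ready_at; bool trigger; } Script;

static bool script_archive(void *ctx, unsigned seg)
{
    Script *sc = ctx;
    (void) seg;
    return ++sc->archive_calls >= sc->ready_at;
}

static bool script_stream(void *ctx, unsigned seg)
{
    Script *sc = ctx;
    (void) seg;
    sc->stream_calls++;
    return false;
}

static bool script_trigger(void *ctx) { return ((Script *) ctx)->trigger; }
static void script_wait(void *ctx) { ((Script *) ctx)->waits++; }
```

With the archive ready on the third attempt, the loop makes three archive attempts, two streaming attempts and two waits before succeeding. With the trigger file already present, it gives up after the first round.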
* Fix longstanding gripe that we check for 00000001.history at the start of | Simon Riggs | 2010-01-26
  archive recovery, even when we know it is never present.
* Fix assorted core dumps and Assert failures that could occur during | Tom Lane | 2010-01-24
  AbortTransaction or AbortSubTransaction, when trying to clean up after an error that prevented (sub)transaction start from completing:
    * access to TopTransactionResourceOwner that might not exist
    * assert failure in AtEOXact_GUC, if AtStart_GUC not called yet
    * assert failure or core dump in AfterTriggerEndSubXact, if AfterTriggerBeginSubXact not called yet
  Per testing by injecting elog(ERROR) at successive steps in StartTransaction and StartSubTransaction. It's not clear whether all of these cases could really occur in the field, but at least one of them is easily exposed by simple stress testing, as per my accidental discovery yesterday.
* In HS, the Startup process sets SIGALRM when waiting for a buffer pin. If | Simon Riggs | 2010-01-23
  woken by the alarm, we send SIGUSR1 to all backends requesting that they check to see if they are blocking the Startup process. If so, they throw ERROR/FATAL as for other conflict resolutions. Deadlock stop gap removed. max_standby_delay = -1 option removed to prevent deadlock.
* Replace ALTER TABLE ... SET STATISTICS DISTINCT with a more general mechanism. | Robert Haas | 2010-01-22
  Attributes can now have options, just as relations and tablespaces do, and the reloptions code is used to parse, validate, and store them. For simplicity and because these options are not performance critical, we store them in a separate cache rather than the main relcache. Thanks to Alex Hunsaker for the review.
* Write a WAL record whenever we perform an operation without WAL-logging | Heikki Linnakangas | 2010-01-20
  that would've been WAL-logged if archiving was enabled. If we encounter such records in archive recovery anyway, we know that some data is missing from the log. A WARNING is emitted in that case. Original patch by Fujii Masao, with changes by me.
* Fix incorrect comparison of scan key in GIN. Per report from | Teodor Sigaev | 2010-01-18
  Vyacheslav Kalinin <vka@mgcp.com>
* Teach standby conflict resolution to use SIGUSR1 | Simon Riggs | 2010-01-16
  Conflict reason is passed through directly to the backend, so we can take decisions about the effect of the conflict based upon the local state. No specific changes, as yet, though this prepares for later work.
  CancelVirtualTransaction() sends signals while holding ProcArrayLock.
  Introduce errdetail_abort() to give message detail explaining that the abort was caused by conflict processing.
  Remove CONFLICT_MODE states in favour of using PROCSIG_RECOVERY_CONFLICT states directly, for clarity.
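A C sketch of the multiplexed-signal idea. The sender records a reason flag and then sends the single SIGUSR1. The receiver's handler only notes that something is pending and defers the per-reason decision to a safe point in normal code. The reason names and functions are hypothetical stand-ins for the PROCSIG_RECOVERY_CONFLICT states. The sketch uses raise() within one process, where a server would signal another backend with kill().

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the PROCSIG_RECOVERY_CONFLICT_* reasons. */
typedef enum { CONFLICT_LOCK, CONFLICT_SNAPSHOT, CONFLICT_BUFFERPIN, NUM_CONFLICT_REASONS } ConflictReason;

/* One flag per reason. A server would keep these in shared memory, one set
 * per backend; a single process is enough for the sketch. */
static volatile sig_atomic_t pending_reason[NUM_CONFLICT_REASONS];
static volatile sig_atomic_t interrupt_pending;

static void sigusr1_handler(int signo)
{
    signal(signo, sigusr1_handler);  /* re-arm, for platforms with one-shot signal() */
    interrupt_pending = 1;           /* only note it; decide later, outside the handler */
}

static void install_conflict_handler(void) { signal(SIGUSR1, sigusr1_handler); }

/* Sender: record why, then send the single multiplexed signal. */
static void signal_conflict(ConflictReason reason)
{
    assert((int) reason >= 0 && reason < NUM_CONFLICT_REASONS);
    pending_reason[reason] = 1;
    raise(SIGUSR1);                  /* a server would kill() another backend's pid */
}

/* Receiver, at a safe point: collect and clear the signalled reasons.
 * Returns how many distinct reasons were pending. */
static int collect_conflicts(bool seen[NUM_CONFLICT_REASONS])
{
    int n = 0;

    if (!interrupt_pending)
        return 0;
    interrupt_pending = 0;
    for (int r = 0; r < NUM_CONFLICT_REASONS; r++) {
        seen[r] = pending_reason[r] != 0;
        if (seen[r]) {
            pending_reason[r] = 0;
            n++;
        }
    }
    return n;
}
```

Two conflicts signalled before the receiver checks are both reported at its next safe point, even though only one kind of signal is ever sent. That is the point of passing the reason alongside the signal.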