aboutsummaryrefslogtreecommitdiff
path: root/src/backend/access/transam/xlog.c
Commit message (Collapse)AuthorAge
* Remove unused structure member.Robert Haas2016-07-21
| | | | Michael Paquier
* Fix start WAL filename for concurrent backups from standbyMagnus Hagander2016-07-11
| | | | | | | | | | On a standby, ThisTimelineID is always 0, so we would generate a filename in timeline 0 even for other timelines. Instead, use starttli which we have retreived from the controlfile. Report by: Francesco Canovai in bug #14230 Author: Marco Nenciarini Reviewed by: Michael Paquier and Amit Kapila
* pgindent run for 9.6Robert Haas2016-06-09
|
* Fix poorly-worded log message.Tom Lane2016-05-08
| | | | Euler Taveira
* Implement backup API functions for non-exclusive backupsMagnus Hagander2016-04-05
| | | | | | | | | | | | | | | | | | | | | Previously non-exclusive backups had to be done using the replication protocol and pg_basebackup. With this commit it's now possible to make them using pg_start_backup/pg_stop_backup as well, as long as the backup program can maintain a persistent connection to the database. Doing this, backup_label and tablespace_map are returned as results from pg_stop_backup() instead of being written to the data directory. This makes the server safe from a crash during an ongoing backup, which can be a problem with exclusive backups. The old syntax of the functions remain and work exactly as before, but since the new syntax is safer this should eventually be deprecated and removed. Only reference documentation is included. The main section on backup still needs to be rewritten to cover this, but since that is already scheduled for a separate large rewrite, it's not included in this patch. Reviewed by David Steele and Amit Kapila
* Display WAL pointer in rm_redo error callbackAlvaro Herrera2016-04-04
| | | | | This makes it easier to identify the source of a recovery problem in case of a bug or data corruption.
* Add new replication mode synchronous_commit = 'remote_apply'.Robert Haas2016-03-29
| | | | | | | | | | | | | | | | | | | In this mode, the master waits for the transaction to be applied on the remote side, not just written to disk. That means that you can count on a transaction started on the standby to see all commits previously acknowledged by the master. To make this work, the standby sends a reply after replaying each commit record generated with synchronous_commit >= 'remote_apply'. This introduces a small inefficiency: the extra replies will be sent even by standbys that aren't the current synchronous standby. But previously-existing synchronous_commit levels make no attempt at all to optimize which replies are sent based on what the primary cares about, so this is no worse, and at least avoids any extra replies for people not using the feature at all. Thomas Munro, reviewed by Michael Paquier and by me. Some additional tweaks by me.
* Merge wal_level "archive" and "hot_standby" into new name "replica"Peter Eisentraut2016-03-18
| | | | | | | | | | | | | | | | | The distinction between "archive" and "hot_standby" existed only because at the time "hot_standby" was added, there was some uncertainty about stability. This is now a long time ago. We would like to move forward with simplifying the replication configuration, but this distinction is in the way, because a primary server cannot tell (without asking a standby or predicting the future) which one of these would be the appropriate level. Pick a new name for the combined setting to make it clearer that it covers all (non-logical) backup and replication uses. The old values are still accepted but are converted internally. Reviewed-by: Michael Paquier <michael.paquier@gmail.com> Reviewed-by: David Steele <david@pgmasters.net>
* Fix typos in commentsAlvaro Herrera2016-03-15
|
* Avoid unlikely data-loss scenarios due to rename() without fsync.Andres Freund2016-03-09
| | | | | | | | | | | | | | | | | | | | | Renaming a file using rename(2) is not guaranteed to be durable in face of crashes. Use the previously added durable_rename()/durable_link_or_rename() in various places where we previously just renamed files. Most of the changed call sites are arguably not critical, but it seems better to err on the side of too much durability. The most prominent known case where the previously missing fsyncs could cause data loss is crashes at the end of a checkpoint. After the actual checkpoint has been performed, old WAL files are recycled. When they're filled, their contents are fdatasynced, but we did not fsync the containing directory. An OS/hardware crash in an unfortunate moment could then end up leaving that file with its old name, but new content; WAL replay would thus not replay it. Reported-By: Tomas Vondra Author: Michael Paquier, Tomas Vondra, Andres Freund Discussion: 56583BDD.9060302@2ndquadrant.com Backpatch: All supported branches
* Ignore recovery_min_apply_delay until recovery has reached consistent stateFujii Masao2016-03-06
| | | | | | | | | | | | | | | | | | | Previously recovery_min_apply_delay was applied even before recovery had reached consistency. This could cause us to wait a long time unexpectedly for read-only connections to be allowed. It's problematic because the standby was useless during that wait time. This patch changes recovery_min_apply_delay so that it's applied once the database has reached the consistent state. That is, even if the delay is set, the standby tries to replay WAL records as fast as possible until it has reached consistency. Author: Michael Paquier Reviewed-By: Julien Rouhaud Reported-By: Greg Clough Backpatch: 9.4, where recovery_min_apply_delay was added Bug: #13770 Discussion: http://www.postgresql.org/message-id/20151111155006.2644.84564@wrigleys.postgresql.org
* Allow the WAL writer to flush WAL at a reduced rate.Andres Freund2016-02-16
| | | | | | | | | | | | | | | | | | | | | | | | Commit 4de82f7d7 increased the WAL flush rate, mainly to increase the likelihood that hint bits can be set quickly. More quickly set hint bits can reduce contention around the clog et al. But unfortunately the increased flush rate can have a significant negative performance impact, I have measured up to a factor of ~4. The reason for this slowdown is that if there are independent writes to the underlying devices, for example because shared buffers is a lot smaller than the hot data set, or because a checkpoint is ongoing, the fdatasync() calls force cache flushes to be emitted to the storage. This is achieved by flushing WAL only if the last flush was longer than wal_writer_delay ago, or if more than wal_writer_flush_after (new GUC) unflushed blocks are pending. Based on some tests the default for wal_writer_delay is 1MB, which seems to work well both on SSD and rotational media. To avoid negative performance impact due to 4de82f7d7 an earlier commit (db76b1e) made SetHintBits() more likely to succeed; preventing performance regressions in the pgbench tests I performed. Discussion: 20160118163908.GW10941@awork2.anarazel.de
* Change delimiter used for display of NextXIDJoe Conway2016-02-12
| | | | | | | | | | | | | NextXID has been rendered in the form of a pg_lsn even though it really is not. This can cause confusion, so change the format from %u/%u to %u:%u, per discussion on hackers. Complaint by me, patch by me and Bruce, reviewed by Michael Paquier and Alvaro. Applied to HEAD only. Author: Joe Conway, Bruce Momjian Reviewed-by: Michael Paquier, Alvaro Herrera Backpatch-through: master
* Make builtin lwlock tranche names consistent.Robert Haas2016-02-12
| | | | | | Previously, we had a mix of styles. Amit Kapila
* Shift the responsibility for emitting "database system is shut down".Tom Lane2016-02-11
| | | | | | | | | | | | | | | | | | Historically this message has been emitted at the end of ShutdownXLOG(). That's not an insane place for it in a standalone backend, but in the postmaster environment we've grown a fair amount of stuff that happens later, including archiver/walsender shutdown, stats collector shutdown, etc. Recent buildfarm experimentation showed that on slower machines there could be many seconds' delay between finishing ShutdownXLOG() and actual postmaster exit. That's fairly confusing, both for testing purposes and for DBAs. Hence, move the code that prints this message into UnlinkLockFiles(), so that it comes out just after we remove the postmaster's pidfile. That is a more appropriate definition of "is shut down" from the point of view of "pg_ctl stop", for example. In general, removing the pidfile should be the last externally-visible action of either a postmaster or a standalone backend; compare commit d73d14c271653dff10c349738df79ea03b85236c for instance. So this seems like a reasonably future-proof approach.
* Revert "Temporarily make pg_ctl and server shutdown a whole lot chattier."Tom Lane2016-02-10
| | | | | | This reverts commit 3971f64843b02e4a55d854156bd53e46a0588e45 and a couple of followon debugging commits; I think we've learned what we can from them.
* Add more chattiness in server shutdown.Tom Lane2016-02-09
| | | | | | | Early returns from the buildfarm show that there's a bit of a gap in the logging I added in 3971f64843b02e4a: the portion of CreateCheckPoint() after CheckPointGuts() can take a fair amount of time. Add a few more log messages in that section of code. This too shall be reverted later.
* Temporarily make pg_ctl and server shutdown a whole lot chattier.Tom Lane2016-02-08
| | | | | | | | | | | | | This is a quick hack, due to be reverted when its purpose has been served, to try to gather information about why some of the buildfarm critters regularly fail with "postmaster does not shut down" complaints. Maybe they are just really overloaded, but maybe something else is going on. Hence, instrument pg_ctl to print the current time when it starts waiting for postmaster shutdown and when it gives up, and add a lot of logging of the current time in the server's checkpoint and shutdown code paths. No attempt has been made to make this pretty. I'm not even totally sure if it will build on Windows, but we'll soon find out.
* Speedup 2PC by skipping two phase state files in normal pathSimon Riggs2016-01-20
| | | | | | | | | | | | | | | | | 2PC state info is written only to WAL at PREPARE, then read back from WAL at COMMIT PREPARED/ABORT PREPARED. Prepared transactions that live past one bufmgr checkpoint cycle will be written to disk in the same form as previously. Crash recovery path is not altered. Measured performance gains of 50-100% for short 2PC transactions by completely avoiding writing files and fsyncing. Other optimizations still available, further patches in related areas expected. Stas Kelvich and heavily edited by Simon Riggs Based upon earlier ideas and patches by Michael Paquier and Heikki Linnakangas, a concrete example of how Postgres-XC has fed back ideas into PostgreSQL. Reviewed by Michael Paquier, Jeff Janes and Andres Freund Performance testing by Jesper Pedersen
* Maintain local LogwrtResult consistentlySimon Riggs2016-01-12
| | | | | Teach GetFlushRecPtr() to update LogwrtResult cache as performed by all other functions in xlog.c
* Update copyright for 2016Bruce Momjian2016-01-02
| | | | Backpatch certain files through 9.1
* Rename (new|old)estCommitTs to (new|old)estCommitTsXidJoe Conway2015-12-28
| | | | | | | | | | | | | The variables newestCommitTs and oldestCommitTs sound as if they are timestamps, but in fact they are the transaction Ids that correspond to the newest and oldest timestamps rather than the actual timestamps. Rename these variables to reflect that they are actually xids: to wit newestCommitTsXid and oldestCommitTsXid respectively. Also modify related code in a similar fashion, particularly the user facing output emitted by pg_controldata and pg_resetxlog. Complaint and patch by me, review by Tom Lane and Alvaro Herrera. Backpatch to 9.5 where these variables were first introduced.
* Provide a way to predefine LWLock tranche IDs.Robert Haas2015-12-15
| | | | | | | | | | | It's a bit cumbersome to use LWLockNewTrancheId(), because the returned value needs to be shared between backends so that each backend can call LWLockRegisterTranche() with the correct ID. So, for built-in tranches, use a hard-coded value instead. This is motivated by an upcoming patch adding further built-in tranches. Andres Freund and Robert Haas
* Fix commit timestamp initializationAlvaro Herrera2015-12-11
| | | | | | | | | | | | | | | | | | | | | | This module needs explicit initialization in order to replay WAL records in recovery, but we had broken this recently following changes to make other (stranger) scenarios work correctly. To fix, rework the initialization sequence so that it always takes place before WAL replay commences for both master and standby. I could have gone for a more localized fix that just added a "startup" call for the master server, but it seemed better to restructure the existing callers as well so that the whole thing made more sense. As a drawback, there is more control logic in xlog.c now than previously, but doing otherwise meant passing down the ControlFile flag, which seemed uglier as a whole. This also meant adding a check to not re-execute ActivateCommitTs if it had already been called. Reported by Fujii Masao. Backpatch to 9.5.
* Further tweak commit_timestamp behaviorAlvaro Herrera2015-12-03
| | | | | | | | | | | | | | | | | | | | | | As pointed out by Fujii Masao, we weren't quite there on a standby behaving sanely: first because we were failing to acquire the correct state in the case where no XLOG_PARAMETER_CHANGE message was sent (because a checkpoint had already happened after the setting was changed in the master, and then the standby was restarted); and second because promoting the standby with the feature enabled failed to activate it if the master had the feature disabled. This patch fixes both those misbehaviors hopefully without re-introducing any old problems. Also change the hint emitted in a standby together with the error message about the feature being disabled, to make it point out that the place to chance the setting is the master. Otherwise, if the setting is already enabled in the standby, it is very confusing to have it say that the setting must be enabled ... Authors: Álvaro Herrera, Petr Jelínek. Backpatch to 9.5.
* Message improvementsPeter Eisentraut2015-11-16
|
* Fix commit_ts for standbyAlvaro Herrera2015-10-01
| | | | | | | | | | | | | | | | | | | | | | | Module initialization was still not completely correct after commit 6b61955135e9, per crash report from Takashi Ohnishi. To fix, instead of trying to monkey around with the value of the GUC setting directly, add a separate boolean flag that enables the feature on a standby, but only for the startup (recovery) process, when it sees that its master server has the feature enabled. Discussion: http://www.postgresql.org/message-id/ca44c6c7f9314868bdc521aea4f77cbf@MP-MSGSS-MBX004.msg.nttdata.co.jp Also change the deactivation routine to delete all segment files rather than leaving the last one around. (This doesn't need separate WAL-logging, because on recovery we execute the same deactivation routine anyway.) In passing, clean up the code structure somewhat, particularly so that xlog.c doesn't know so much about when to activate/deactivate the feature. Thanks to Fujii Masao for testing and Petr Jelínek for off-list discussion. Back-patch to 9.5, where commit_ts was introduced.
* Code review for transaction commit timestampsAlvaro Herrera2015-09-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | There are three main changes here: 1. No longer cause a start failure in a standby if the feature is disabled in postgresql.conf but enabled in the master. This reverts one part of commit 4f3924d9cd43; what we keep is the ability of the standby to activate/deactivate the module (which includes creating and removing segments as appropriate) during replay of such actions in the master. 2. Replay WAL records affecting commitTS even if the feature is disabled. This means the standby will always have the same state as the master after replay. 3. Have COMMIT PREPARE record the transaction commit time as well. We were previously only applying it in the normal transaction commit path. Author: Petr Jelínek Discussion: http://www.postgresql.org/message-id/CAHGQGwHereDzzzmfxEBYcVQu3oZv6vZcgu1TPeERWbDc+gQ06g@mail.gmail.com Discussion: http://www.postgresql.org/message-id/CAHGQGwFuzfO4JscM9LCAmCDCxp_MfLvN4QdB+xWsS-FijbjTYQ@mail.gmail.com Additionally, I cleaned up nearby code related to replication origins, which I found a bit hard to follow, and fixed a couple of typos. Backpatch to 9.5, where this code was introduced. Per bug reports from Fujii Masao and subsequent discussion.
* Remove legacy multixact truncation support.Andres Freund2015-09-26
| | | | | | | | | | | | | In 9.5 and master there is no need to support legacy truncation. This is just committed separately to make it easier to backpatch the WAL logged multixact truncation to 9.3 and 9.4 if we later decide to do so. I bumped master's magic from 0xD086 to 0xD088 and 9.5's from 0xD085 to 0xD087 to avoid 9.5 reusing a value that has been in use on master while keeping the numbers increasing between major versions. Discussion: 20150621192409.GA4797@alap3.anarazel.de Backpatch: 9.5
* Rework the way multixact truncations work.Andres Freund2015-09-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The fact that multixact truncations are not WAL logged has caused a fair share of problems. Amongst others it requires to do computations during recovery while the database is not in a consistent state, delaying truncations till checkpoints, and handling members being truncated, but offset not. We tried to put bandaids on lots of these issues over the last years, but it seems time to change course. Thus this patch introduces WAL logging for multixact truncations. This allows: 1) to perform the truncation directly during VACUUM, instead of delaying it to the checkpoint. 2) to avoid looking at the offsets SLRU for truncation during recovery, we can just use the master's values. 3) simplify a fair amount of logic to keep in memory limits straight, this has gotten much easier During the course of fixing this a bunch of additional bugs had to be fixed: 1) Data was not purged from memory the member's SLRU before deleting segments. This happened to be hard or impossible to hit due to the interlock between checkpoints and truncation. 2) find_multixact_start() relied on SimpleLruDoesPhysicalPageExist - but that doesn't work for offsets that haven't yet been flushed to disk. Add code to flush the SLRUs to fix. Not pretty, but it feels slightly safer to only make decisions based on actual on-disk state. 3) find_multixact_start() could be called concurrently with a truncation and thus fail. Via SetOffsetVacuumLimit() that could lead to a round of emergency vacuuming. The problem remains in pg_get_multixact_members(), but that's quite harmless. For now this is going to only get applied to 9.5+, leaving the issues in the older branches in place. It is quite possible that we need to backpatch at a later point though. For the case this gets backpatched we need to handle that an updated standby may be replaying WAL from a not-yet upgraded primary. We have to recognize that situation and use "old style" truncation (i.e. looking at the SLRUs) during WAL replay. In contrast to before, this now happens in the startup process, when replaying a checkpoint record, instead of the checkpointer. Doing truncation in the restartpoint is incorrect, they can happen much later than the original checkpoint, thereby leading to wraparound. To avoid "multixact_redo: unknown op code 48" errors standbys would have to be upgraded before primaries. A later patch will bump the WAL page magic, and remove the legacy truncation codepaths. Legacy truncation support is just included to make a possible future backpatch easier. Discussion: 20150621192409.GA4797@alap3.anarazel.de Reviewed-By: Robert Haas, Alvaro Herrera, Thomas Munro Backpatch: 9.5 for now
* Add missing serial commaPeter Eisentraut2015-09-18
|
* Improve log messages related to tablespace_map fileFujii Masao2015-09-15
| | | | | | | | | | | | | | | | | | This patch changes the log message which is logged when the server successfully renames backup_label file to *.old but fails to rename tablespace_map file during the shutdown. Previously the WARNING message "online backup mode was not canceled" was logged in that case. However this message is confusing because the backup mode is treated as canceled whenever backup_label is successfully renamed. So this commit makes the server log the message "online backup mode canceled" in that case. Also this commit changes errdetail messages so that they follow the error message style guide. Back-patch to 9.5 where tablespace_map file is introduced. Original patch by Amit Kapila, heavily modified by me.
* Remove files signaling a standby promotion request at postmaster startupFujii Masao2015-09-09
| | | | | | | | | | | | | | | | | | | | | | This commit makes postmaster forcibly remove the files signaling a standby promotion request. Otherwise, the existence of those files can trigger a promotion too early, whether a user wants that or not. This removal of files is usually unnecessary because they can exist only during a few moments during a standby promotion. However there is a race condition: if pg_ctl promote is executed and creates the files during a promotion, the files can stay around even after the server is brought up to new master. Then, if new standby starts by using the backup taken from that master, the files can exist at the server startup and should be removed in order to avoid an unexpected promotion. Back-patch to 9.1 where promote signal file was introduced. Problem reported by Feike Steenbergen. Original patch by Michael Paquier, modified by me. Discussion: 20150528100705.4686.91426@wrigleys.postgresql.org
* Document that max_worker_processes must be high enough in standby.Fujii Masao2015-09-03
| | | | | | | | | The setting values of some parameters including max_worker_processes must be equal to or higher than the values on the master. However, previously max_worker_processes was not listed as such parameter in the document. So this commit adds it to that list. Back-patch to 9.4 where max_worker_processes was added.
* Make recovery rename tablespace_map to *.old if backup_label is not present.Fujii Masao2015-08-03
| | | | | | | | | | | | | | | If tablespace_map file is present without backup_label file, there is no use of such file. There is no harm in retaining it, but it is better to get rid of the map file so that we don't have any redundant file in data directory and it will avoid any sort of confusion. It seems prudent though to just rename the file out of the way rather than delete it completely, also we ignore any error that occurs in rename operation as even if map file is present without backup_label file, it is harmless. Back-patch to 9.5 where tablespace_map file was introduced. Amit Kapila, reviewed by Robert Haas, Alvaro Herrera and me.
* Fix race condition that lead to WALInsertLock deadlock with commit_delay.Heikki Linnakangas2015-08-02
| | | | | | | | | | | | | | | | | If a call to WaitForXLogInsertionsToFinish() returned a value in the middle of a page, and another backend then started to insert a record to the same page, and then you called WaitXLogInsertionsToFinish() again, the second call might return a smaller value than the first call. The problem was in GetXLogBuffer(), which always updated the insertingAt value to the beginning of the requested page, not the actual requested location. Because of that, the second call might return a xlog pointer to the beginning of the page, while the first one returned a later position on the same page. XLogFlush() performs two calls to WaitXLogInsertionsToFinish() in succession, and holds WALWriteLock on the second call, which can deadlock if the second call to WaitXLogInsertionsToFinish() blocks. Reported by Spiros Ioannou. Backpatch to 9.4, where the more scalable WALInsertLock mechanism, and this bug, was introduced.
* Fix issues around the "variable" support in the lwlock infrastructure.Andres Freund2015-08-02
| | | | | | | | | | | | | | | | | | | | | | | | | The lwlock scalability work introduced two race conditions into the lwlock variable support provided for xlog.c. First, and harmlessly on most platforms, it set/read the variable without the spinlock in some places. Secondly, due to the removal of the spinlock, it was possible that a backend missed changes to the variable's state if it changed in the wrong moment because checking the lock's state, the variable's state and the queuing are not protected by a single spinlock acquisition anymore. To fix first move resetting the variable's from LWLockAcquireWithVar to WALInsertLockRelease, via a new function LWLockReleaseClearVar. That prevents issues around waiting for a variable's value to change when a new locker has acquired the lock, but not yet set the value. Secondly re-check that the variable hasn't changed after enqueing, that prevents the issue that the lock has been released and already re-acquired by the time the woken up backend checks for the lock's state. Reported-By: Jeff Janes Analyzed-By: Heikki Linnakangas Reviewed-By: Heikki Linnakangas Discussion: 5592DB35.2060401@iki.fi Backpatch: 9.5, where the lwlock scalability went in
* Use appendStringInfoString/Char et al where appropriate.Heikki Linnakangas2015-07-02
| | | | | | Patch by David Rowley. Backpatch to 9.5, as some of the calls were new in 9.5, and keeping the code in sync with master makes future backpatching easier.
* Make XLogFileCopy() look the same as in 9.4.Fujii Masao2015-07-01
| | | | | | | | | | | | | | | | XLogFileCopy() was changed heavily in commit de76884. However it was partially reverted in commit 7abc685 and most of those changes to XLogFileCopy() were no longer needed. Then commit 7cbee7c removed those unnecessary code, but XLogFileCopy() looked different in master and 9.4 though the contents are almost the same. This patch makes XLogFileCopy() look the same in master and back-branches, which makes back-patching easier, per discussion on pgsql-hackers. Back-patch to 9.5. Discussion: 55760844.7090703@iki.fi Michael Paquier
* Also trigger restartpoints based on max_wal_size on standby.Heikki Linnakangas2015-06-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | When archive recovery and restartpoints were initially introduced, checkpoint_segments was ignored on the grounds that the files restored from archive don't consume any space in the recovery server. That was changed in later releases, but even then it was arguably a feature rather than a bug, as performing restartpoints as often as checkpoints during normal operation might be excessive, but you might nevertheless not want to waste a lot of space for pre-allocated WAL by setting checkpoint_segments to a high value. But now that we have separate min_wal_size and max_wal_size settings, you can bound WAL usage with max_wal_size, and still avoid consuming excessive space usage by setting min_wal_size to a lower value, so that argument is moot. There are still some issues with actually limiting the space usage to max_wal_size: restartpoints in recovery can only start after seeing the checkpoint record, while a checkpoint starts flushing buffers as soon as the redo-pointer is set. Restartpoint is paced to happen at the same leisurily speed, determined by checkpoint_completion_target, as checkpoints, but because they are started later, max_wal_size can be exceeded by upto one checkpoint cycle's worth of WAL, depending on checkpoint_completion_target. But that seems better than not trying at all, and max_wal_size is a soft limit anyway. The documentation already claimed that max_wal_size is obeyed in recovery, so this just fixes the behaviour to match the docs. However, add some weasel-words there to mention that max_wal_size may well be exceeded by some amount in recovery.
* Be more conservative about removing tablespace "symlinks".Robert Haas2015-06-26
| | | | | | | | | | Don't apply rmtree(), which will gleefully remove an entire subtree, and don't even apply unlink() unless it's symlink or a directory, the only things that we expect to find. Amit Kapila, with minor tweaks by me, per extensive discussions involving Andrew Dunstan, Fujii Masao, and Heikki Linnakangas, at least some of whom also reviewed the code.
* Add missing check for wal_debug GUC.Andres Freund2015-06-21
| | | | | | | | | | 9a20a9b2 added a new elog(), enabled when WAL_DEBUG is defined. The other WAL_DEBUG dependant messages check for the wal_debug GUC, but this one did not. While at it replace 'upto' with 'up to'. Discussion: 20150610110253.GF3832@alap3.anarazel.de Backpatch to 9.4, the first release containing 9a20a9b2.
* Fix typosAlvaro Herrera2015-06-08
| | | | | | | tablesapce -> tablespace there -> their These were introduced in 72d422a52, so no need to backpatch.
* Refactor WAL segment copying code.Fujii Masao2015-06-09
| | | | | | | | | | | | | | | | | | | | * Remove unused argument "dstfname" and related code from XLogFileCopy(). * Previously XLogFileCopy() returned a pstrdup'd string so that InstallXLogFileSegment() used it later. Since the pstrdup'd string was never free'd, there could be a risk of memory leak. It was almost harmless because the startup process exited just after calling XLogFileCopy(), it existed. This commit changes XLogFileCopy() so that it directly calls InstallXLogFileSegment() and doesn't call pstrdup() at all. Which fixes that memory leak problem. * Extend InstallXLogFileSegment() so that the caller can specify the log level. Which allows us to emit an error when InstallXLogFileSegment() fails a disk file access like link() and rename(). Previously it was always logged with LOG level and additionally needed to be logged with ERROR when we wanted to treat it as an error. Michael Paquier
* Allow HotStandbyActiveInReplay() to be called in single user mode.Andres Freund2015-06-08
| | | | | | | | | | | | HotStandbyActiveInReplay, introduced in 061b079f, only allowed WAL replay to happen in the startup process, missing the single user case. This buglet is fairly harmless as it only causes problems when single user mode in an assertion enabled build is used to replay a btree vacuum record. Backpatch to 9.2. 061b079f was backpatched further, but the assertion was not.
* Fix fsync-at-startup code to not treat errors as fatal.Tom Lane2015-05-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 2ce439f3379aed857517c8ce207485655000fc8e introduced a rather serious regression, namely that if its scan of the data directory came across any un-fsync-able files, it would fail and thereby prevent database startup. Worse yet, symlinks to such files also caused the problem, which meant that crash restart was guaranteed to fail on certain common installations such as older Debian. After discussion, we agreed that (1) failure to start is worse than any consequence of not fsync'ing is likely to be, therefore treat all errors in this code as nonfatal; (2) we should not chase symlinks other than those that are expected to exist, namely pg_xlog/ and tablespace links under pg_tblspc/. The latter restriction avoids possibly fsync'ing a much larger part of the filesystem than intended, if the user has left random symlinks hanging about in the data directory. This commit takes care of that and also does some code beautification, mainly moving the relevant code into fd.c, which seems a much better place for it than xlog.c, and making sure that the conditional compilation for the pre_sync_fname pass has something to do with whether pg_flush_data works. I also relocated the call site in xlog.c down a few lines; it seems a bit silly to be doing this before ValidateXLOGDirectoryStructure(). The similar logic in initdb.c ought to be made to match this, but that change is noncritical and will be dealt with separately. Back-patch to all active branches, like the prior commit. Abhijit Menon-Sen and Tom Lane
* pgindent run for 9.5Bruce Momjian2015-05-23
|
* Fix incorrect snprintf() limit.Tom Lane2015-05-23
| | | | | | | | Typo in commit 7cbee7c0a. No practical effect since the buffer should never actually be overrun, but various compilers and static analyzers will whine about it. Petr Jelinek
* At promotion, don't leave behind a partial segment on the old timeline.Heikki Linnakangas2015-05-22
| | | | | | | | | | | | | | | | | | With commit de768844, a copy of the partial segment was archived with the .partial suffix, but the original file was still left in pg_xlog, so it didn't actually solve the problems with archiving the partial segment that it was supposed to solve. With this patch, the partial segment is renamed rather than copied, so we only archive it with the .partial suffix. Also be more robust in detecting if the last segment is already being archived. Previously I used XLogArchiveIsBusy() for that, but that's not quite right. With archive_mode='always', there might be a .ready file for it, and we don't want to rename it to .partial in that case. The old segment is needed until we're fully committed to the new timeline, i.e. until we've written the end-of-recovery WAL record and updated the min recovery point and timeline in the control file. So move the renaming later in the startup sequence, after all that's been done.
* Make recovery_target_action = pause work.Fujii Masao2015-05-21
| | | | | | | | | | | | | | | Previously even if recovery_target_action was set to pause and the recovery target was reached, the recovery could never be paused. Because the setting of pause was *always* overridden with that of shutdown unexpectedly. This override is valid and intentional if hot_standby is not enabled because there is no way to resume the paused recovery in this case and the setting of pause is completely useless. But not if hot_standby is enabled. This patch changes the code so that the setting of pause is overridden with that of shutdown only when hot_standby is not enabled. Bug reported by Andres Freund