aboutsummaryrefslogtreecommitdiff
path: root/src/backend/replication/walreceiver.c
Commit message (Collapse)AuthorAge
* Add archive_mode='always' option.Heikki Linnakangas2015-05-15
| | | | | | | In 'always' mode, the standby independently archives all files it receives from the primary. Original patch by Fujii Masao, docs and review by me.
* Fix integer overflow in debug message of walreceiverTatsuo Ishii2015-03-14
| | | | | | | | | | | The message tries to tell the replication apply delay which fails if the first WAL record is not applied yet. Fix is, instead of telling overflowed minus numeric, showing "N/A" which indicates that the delay data is not yet available. Problem reported by me and patch by Fabrízio de Royes Mello. Back patched to 9.4, 9.3 and 9.2 stable branches (9.1 and 9.0 do not have the debug message).
* Commonalize process startup code.Andres Freund2015-01-14
| | | | | | | | | Move common code, that was duplicated in every postmaster child/every standalone process, into two functions in miscinit.c. Not only does that already result in a fair amount of net code reduction but it also makes it much easier to remove more duplication in the future. The prime motivation wasn't code deduplication though, but easier addition of new common code.
* Update copyright for 2015Bruce Momjian2015-01-06
| | | | Backpatch certain files through 9.0
* pgindent run for 9.4Bruce Momjian2014-05-06
| | | | | This includes removing tabs after periods in C comments, which was applied to back branches, so this change should not effect backpatching.
* Fix bogus time printout in walreceiver's debug log messages.Tom Lane2014-04-04
| | | | | | | | | | | | | The displayed sendtime and receipttime were always exactly equal, because somebody forgot that timestamptz_to_str returns a static buffer (thereby simplifying life for most callers, at the cost of complicating it for those who need two results concurrently). Apply the same pstrdup solution used by the other call sites with this issue. Back-patch to 9.2 where the faulty code was introduced. Per bug #9849 from Haruka Takatsuka, though this is not exactly his patch. Possibly we should change timestamptz_to_str's API, but I wouldn't want to do so in the back branches.
* Introduce logical decoding.Robert Haas2014-03-03
| | | | | | | | | | | | | | | | | | | | | | This feature, building on previous commits, allows the write-ahead log stream to be decoded into a series of logical changes; that is, inserts, updates, and deletes and the transactions which contain them. It is capable of handling decoding even across changes to the schema of the effected tables. The output format is controlled by a so-called "output plugin"; an example is included. To make use of this in a real replication system, the output plugin will need to be modified to produce output in the format appropriate to that system, and to perform filtering. Currently, information can be extracted from the logical decoding system only via SQL; future commits will add the ability to stream changes via walsender. Andres Freund, with review and other contributions from many other people, including Álvaro Herrera, Abhijit Menon-Sen, Peter Gheogegan, Kevin Grittner, Robert Haas, Heikki Linnakangas, Fujii Masao, Abhijit Menon-Sen, Michael Paquier, Simon Riggs, Craig Ringer, and Steve Singer.
* Fix some more bugs in signal handlers and process shutdown logic.Tom Lane2014-02-01
| | | | | | | | | | | | | WalSndKill was doing things exactly backwards: it should first clear MyWalSnd (to stop signal handlers from touching MyWalSnd->latch), then disown the latch, and only then mark the WalSnd struct unused by clearing its pid field. Also, WalRcvSigUsr1Handler and worker_spi_sighup failed to preserve errno, which is surely a requirement for any signal handler. Per discussion of recent buildfarm failures. Back-patch as far as the relevant code exists.
* Introduce replication slots.Robert Haas2014-01-31
| | | | | | | | | | | | | | | | Replication slots are a crash-safe data structure which can be created on either a master or a standby to prevent premature removal of write-ahead log segments needed by a standby, as well as (with hot_standby_feedback=on) pruning of tuples whose removal would cause replication conflicts. Slots have some advantages over existing techniques, as explained in the documentation. In a few places, we refer to the type of replication slots introduced by this patch as "physical" slots, because forthcoming patches for logical decoding will also have slots, but with somewhat different properties. Andres Freund and Robert Haas
* Fix Hot Standby feedback sending when streaming busily.Heikki Linnakangas2014-01-16
| | | | | | | | | | | | Commit 6f60fdd7015b032bf49273c99f80913d57eac284 accidentally removed a call to XLogWalRcvSendHSFeedback() after flushing received WAL to disk. The consequence is that when walsender is busy streaming WAL, it doesn't send HS feedback messages. One is sent if nothing is received from the master for 100ms, but if there's a steady stream of WAL, it never happens. Backpatch to 9.3. Andres Freund and Amit Kapila
* Update copyright for 2014Bruce Momjian2014-01-07
| | | | | Update all files in head, and files COPYRIGHT and legal.sgml in all back branches.
* Fix more instances of "the the" in comments.Heikki Linnakangas2013-12-13
| | | | Plus one instance of "to to" in the docs.
* Message style improvementsPeter Eisentraut2013-07-28
|
* Minor spell checkingPeter Eisentraut2013-05-30
|
* pgindent run for release 9.3Bruce Momjian2013-05-29
| | | | | This is the first run of the Perl-based pgindent script. Also update pgindent instructions.
* Force archive_status of .done for xlogs created by dearchival/replication.Simon Riggs2013-02-15
| | | | | | | | | This is a forward-patch of commit 6f4b8a4f4f7a2d683ff79ab59d3693714b965e3d, applied to 9.2 back in August. The plan was to do something else in master, but it looks like it's not going to happen, so let's just apply the 9.2 solution to master as well. Fujii Masao
* Reset master xmin when hot_standby_feedback disabled.Simon Riggs2013-02-04
| | | | | | If walsender has xmin of standby then ensure we reset the value to 0 when we change from hot_standby_feedback=on to hot_standby_feedback=off.
* Now that START_REPLICATION returns the next timeline's ID after reaching endHeikki Linnakangas2013-01-18
| | | | | | | | | | | | | | of timeline, take advantage of that in walreceiver. Startup process is still in control of choosign the target timeline, by scanning the timeline history files present in pg_xlog, but walreceiver now uses the next timeline's ID to fetch its history file immediately after it has finished streaming the old timeline. Before, the standby would first try to restart streaming on the old timeline, which fetches the missing timeline history file as a side-effect, and only then restart from the new timeline. This patch eliminates the extra iteration, which speeds up the timeline switch and reduces the noise in the log caused by the extra restart on the old timeline.
* I added a result set to START_STREAMING command, but neglected walreceiver.Heikki Linnakangas2013-01-17
| | | | | | | The patch to allow pg_receivexlog to switch timeline added a result set after copy has ended in START_STREAMING command, to return the next timeline's ID to the client. But walreceived didn't get the memo, and threw an error on the unexpected result set. Fix.
* Delay reading timeline history file until it's fetched from master.Heikki Linnakangas2013-01-03
| | | | | | | | | | | | | | | | | | Streaming replication can fetch any missing timeline history files from the master, but recovery would read the timeline history file for the target timeline before reading the checkpoint record, and before walreceiver has had a chance to fetch it from the master. Delay reading it, and the sanity checks involving timeline history, until after reading the checkpoint record. There is at least one scenario where this makes a difference: if you take a base backup from a standby server right after a timeline switch, the WAL segment containing the initial checkpoint record will begin with an older timeline ID. Without the timeline history file, recovering that file will fail as the older timeline ID is not recognized to be an ancestor of the target timeline. If you try to recover from such a backup, using only streaming replication to fetch the WAL, this patch is required for that to work.
* Update copyrights for 2013Bruce Momjian2013-01-01
| | | | | Fully update git head, and update back branches in ./COPYRIGHT and legal.sgml files.
* Remove obsolete XLogRecPtr macrosAlvaro Herrera2012-12-28
| | | | | | | | | | | | | | | | | This gets rid of XLByteLT, XLByteLE, XLByteEQ and XLByteAdvance. These were useful for brevity when XLogRecPtrs were split in xlogid/xrecoff; but now that they are simple uint64's, they are just clutter. The only downside to making this change would be ease of backporting patches, but that has been negated by other substantive changes to the involved code anyway. The clarity of simpler expressions makes the change worthwhile. Most of the changes are mechanical, but in a couple of places, the patch author chose to invert the operator sense, making the code flow more logical (and more in line with preceding comments). Author: Andres Freund Eyeballed by Dimitri Fontaine and Alvaro Herrera
* Follow TLI of last replayed record, not recovery target TLI, in walsenders.Heikki Linnakangas2012-12-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Most of the time, the last replayed record comes from the recovery target timeline, but there is a corner case where it makes a difference. When the startup process scans for a new timeline, and decides to change recovery target timeline, there is a window where the recovery target TLI has already been bumped, but there are no WAL segments from the new timeline in pg_xlog yet. For example, if we have just replayed up to point 0/30002D8, on timeline 1, there is a WAL file called 000000010000000000000003 in pg_xlog that contains the WAL up to that point. When recovery switches recovery target timeline to 2, a walsender can immediately try to read WAL from 0/30002D8, from timeline 2, so it will try to open WAL file 000000020000000000000003. However, that doesn't exist yet - the startup process hasn't copied that file from the archive yet nor has the walreceiver streamed it yet, so walsender fails with error "requested WAL segment 000000020000000000000003 has already been removed". That's harmless, in that the standby will try to reconnect later and by that time the segment is already created, but error messages that should be ignored are not good. To fix that, have walsender track the TLI of the last replayed record, instead of the recovery target timeline. That way walsender will not try to read anything from timeline 2, until the WAL segment has been created and at least one record has been replayed from it. The recovery target timeline is now xlog.c's internal affair, it doesn't need to be exposed in shared memory anymore. This fixes the error reported by Thom Brown. depesz the same error message, but I'm not sure if this fixes his scenario.
* Allow a streaming replication standby to follow a timeline switch.Heikki Linnakangas2012-12-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before this patch, streaming replication would refuse to start replicating if the timeline in the primary doesn't exactly match the standby. The situation where it doesn't match is when you have a master, and two standbys, and you promote one of the standbys to become new master. Promoting bumps up the timeline ID, and after that bump, the other standby would refuse to continue. There's significantly more timeline related logic in streaming replication now. First of all, when a standby connects to primary, it will ask the primary for any timeline history files that are missing from the standby. The missing files are sent using a new replication command TIMELINE_HISTORY, and stored in standby's pg_xlog directory. Using the timeline history files, the standby can follow the latest timeline present in the primary (recovery_target_timeline='latest'), just as it can follow new timelines appearing in an archive directory. START_REPLICATION now takes a TIMELINE parameter, to specify exactly which timeline to stream WAL from. This allows the standby to request the primary to send over WAL that precedes the promotion. The replication protocol is changed slightly (in a backwards-compatible way although there's little hope of streaming replication working across major versions anyway), to allow replication to stop when the end of timeline reached, putting the walsender back into accepting a replication command. Many thanks to Amit Kapila for testing and reviewing various versions of this patch.
* Make the streaming replication protocol messages architecture-independent.Heikki Linnakangas2012-11-07
| | | | | | | | | | | We used to send structs wrapped in CopyData messages, which works as long as the client and server agree on things like endianess, timestamp format and alignment. That's good enough for running a standby server, which has to run on the same platform anyway, but it's useful for tools like pg_receivexlog to work across platforms. This breaks protocol compatibility of streaming replication, but we never promised that to be compatible across versions, anyway.
* Fix typo in comment.Heikki Linnakangas2012-10-15
| | | | Fujii Masao
* Improve replication connection timeouts.Heikki Linnakangas2012-10-11
| | | | | | | | | | | | | | | | Rename replication_timeout to wal_sender_timeout, and add a new setting called wal_receiver_timeout that does the same at the walreceiver side. There was previously no timeout in walreceiver, so if the network went down, for example, the walreceiver could take a long time to notice that the connection was lost. Now with the two settings, both sides of a replication connection will detect a broken connection similarly. It is no longer necessary to manually set wal_receiver_status_interval to a value smaller than the timeout. Both wal sender and receiver now automatically send a "ping" message if more than 1/2 of the configured timeout has elapsed, and it hasn't received any messages from the other end. Amit Kapila, heavily edited by me.
* Ensure all replication message info is available and correct via WalRcvSimon Riggs2012-08-09
|
* Fix management of pendingOpsTable in auxiliary processes.Tom Lane2012-07-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | mdinit() was misusing IsBootstrapProcessingMode() to decide whether to create an fsync pending-operations table in the current process. This led to creating a table not only in the startup and checkpointer processes as intended, but also in the bgwriter process, not to mention other auxiliary processes such as walwriter and walreceiver. Creation of the table in the bgwriter is fatal, because it absorbs fsync requests that should have gone to the checkpointer; instead they just sit in bgwriter local memory and are never acted on. So writes performed by the bgwriter were not being fsync'd which could result in data loss after an OS crash. I think there is no live bug with respect to walwriter and walreceiver because those never perform any writes of shared buffers; but the potential is there for future breakage in those processes too. To fix, make AuxiliaryProcessMain() export the current process's AuxProcType as a global variable, and then make mdinit() test directly for the types of aux process that should have a pendingOpsTable. Having done that, we might as well also get rid of the random bool flags such as am_walreceiver that some of the aux processes had grown. (Note that we could not have fixed the bug by examining those variables in mdinit(), because it's called from BaseInit() which is run by AuxiliaryProcessMain() before entering any of the process-type-specific code.) Back-patch to 9.2, where the problem was introduced by the split-up of bgwriter and checkpointer processes. The bogus pendingOpsTable exists in walwriter and walreceiver processes in earlier branches, but absent any evidence that it causes actual problems there, I'll leave the older branches alone.
* Don't initialize TLI variable to -1, as TimeLineID is unsigned.Heikki Linnakangas2012-07-14
| | | | | | | | This was causing a compiler warning with Solaris compiler. Use 0 instead. The variable is initialized just for the sake of tidyness and/or debugging, it's not used for anything before setting it to a real value. Per report and suggestion from Peter Eisentraut.
* Always treat a standby returning an an invalid flush location as asyncMagnus Hagander2012-07-04
| | | | | | | | This ensures that a standby such as pg_receivexlog will not be selected as sync standby - which would cause the master to block waiting for a location that could never happen. Fujii Masao
* I neglected many comments in the log+seg -> 64-bit segno patch. Fix.Heikki Linnakangas2012-06-27
| | | | Reported by Amit Kapila.
* Replace XLogRecPtr struct with a 64-bit integer.Heikki Linnakangas2012-06-24
| | | | | | | | | | | | | | This simplifies code that needs to do arithmetic on XLogRecPtrs. To avoid changing on-disk format of data pages, the LSN on data pages is still stored in the old format. That should keep pg_upgrade happy. However, we have XLogRecPtrs embedded in the control file, and in the structs that are sent over the replication protocol, so this changes breaks compatibility of pg_basebackup and server. I didn't do anything about this in this patch, per discussion on -hackers, the right thing to do would to be to change the replication protocol to be architecture-independent, so that you could use a newer version of pg_receivexlog, for example, against an older server version.
* Don't waste the last segment of each 4GB logical log file.Heikki Linnakangas2012-06-24
| | | | | | | | | | | | | | | | The comments claimed that wasting the last segment made it easier to do calculations with XLogRecPtrs, because you don't have problems representing last-byte-position-plus-1 that way. In my experience, however, it only made things more complicated, because the there was two ways to represent the boundary at the beginning of a logical log file: logid = n+1 and xrecoff = 0, or as xlogid = n and xrecoff = 4GB - XLOG_SEG_SIZE. Some functions were picky about which representation was used. Also, use a 64-bit segment number instead of the log/seg combination, to point to a certain WAL segment. We assume that all platforms have a working 64-bit integer type nowadays. This is an incompatible change in WAL format, so bumping WAL version number.
* Run pgindent on 9.2 source tree in preparation for first 9.3Bruce Momjian2012-06-10
| | | | commit-fest.
* Typo fix.Robert Haas2012-01-13
|
* Minor but necessary improvements to WAL keepalivesSimon Riggs2012-01-13
| | | | Fujii Masao
* Update copyright notices for year 2012.Bruce Momjian2012-01-01
|
* Send new protocol keepalive messages to standby servers.Simon Riggs2011-12-31
| | | | | Allows streaming replication users to calculate transfer latency and apply delay via internal functions. No external functions yet.
* Move Timestamp/Interval typedefs and basic macros into datatype/timestamp.h.Tom Lane2011-09-09
| | | | | | | | | | | As per my recent proposal, this refactors things so that these typedefs and macros are available in a header that can be included in frontend-ish code. I also changed various headers that were undesirably including utils/timestamp.h to include datatype/timestamp.h instead. Unsurprisingly, this showed that half the system was getting utils/timestamp.h by way of xlog.h. No actual code changes here, just header refactoring.
* Clean up the #include mess a little.Tom Lane2011-09-04
| | | | | | | | | | | | | | | | | walsender.h should depend on xlog.h, not vice versa. (Actually, the inclusion was circular until a couple hours ago, which was even sillier; but Bruce broke it in the expedient rather than logically correct direction.) Because of that poor decision, plus blind application of pgrminclude, we had a situation where half the system was depending on xlog.h to include such unrelated stuff as array.h and guc.h. Clean up the header inclusion, and manually revert a lot of what pgrminclude had done so things build again. This episode reinforces my feeling that pgrminclude should not be run without adult supervision. Inclusion changes in header files in particular need to be reviewed with great care. More generally, it'd be good if we had a clearer notion of module layering to dictate which headers can sanely include which others ... but that's a big task for another day.
* Remove unnecessary #include references, per pgrminclude script.Bruce Momjian2011-09-01
|
* Change the autovacuum launcher to use WaitLatch instead of a poll loop.Tom Lane2011-08-10
| | | | | | | | | | | | | | | | | In pursuit of this (and with the expectation that WaitLatch will be needed in more places), convert the latch field that was already added to PGPROC for sync rep into a generic latch that is activated for all PGPROC-owning processes, and change many of the standard backend signal handlers to set that latch when a signal happens. This will allow WaitLatch callers to be wakened properly by these signals. In passing, fix a whole bunch of signal handlers that had been hacked to do things that might change errno, without adding the necessary save/restore logic for errno. Also make some minor fixes in unix_latch.c, and clean up bizarre and unsafe scheme for disowning the process's latch. Much of this has to be back-patched into 9.1. Peter Geoghegan, with additional work by Tom
* Cascading replication feature for streaming log-based replication.Simon Riggs2011-07-19
| | | | | | | | | Standby servers can now have WALSender processes, which can work with either WALReceiver or archive_commands to pass data. Fully updated docs, including new conceptual terms of sending server, upstream and downstream servers. WALSenders terminated when promote to master. Fujii Masao, review, rework and doc rewrite by Simon Riggs
* Introduce a pipe between postmaster and each backend, which can be used toHeikki Linnakangas2011-07-08
| | | | | | | | | | | | | | | | | | | | | | | | | detect postmaster death. Postmaster keeps the write-end of the pipe open, so when it dies, children get EOF in the read-end. That can conveniently be waited for in select(), which allows eliminating some of the polling loops that check for postmaster death. This patch doesn't yet change all the loops to use the new mechanism, expect a follow-on patch to do that. This changes the interface to WaitLatch, so that it takes as argument a bitmask of events that it waits for. Possible events are latch set, timeout, postmaster death, and socket becoming readable or writeable. The pipe method behaves slightly differently from the kill() method previously used in PostmasterIsAlive() in the case that postmaster has died, but its parent has not yet read its exit code with waitpid(). The pipe returns EOF as soon as the process dies, but kill() continues to return true until waitpid() has been called (IOW while the process is a zombie). Because of that, change PostmasterIsAlive() to use the pipe too, otherwise WaitLatch() would return immediately with WL_POSTMASTER_DEATH, while PostmasterIsAlive() would claim it's still alive. That could easily lead to busy-waiting while postmaster is in zombie state. Peter Geoghegan with further changes by me, reviewed by Fujii Masao and Florian Pflug.
* pgindent run before PG 9.1 beta 1.Bruce Momjian2011-04-10
|
* Make walreceiver send a reply after receiving data but before flushing it.Robert Haas2011-03-25
| | | | | | | It originally worked this way, but was changed by commit a8a8a3e0965201df88bdfdff08f50e5c06c552b7, since which time it's been impossible for walreceiver to ever send a reply with write_location and flush_location set to different values.
* Efficient transaction-controlled synchronous replication.Simon Riggs2011-03-06
| | | | | | | | | | | | | | | | | | If a standby is broadcasting reply messages and we have named one or more standbys in synchronous_standby_names then allow users who set synchronous_replication to wait for commit, which then provides strict data integrity guarantees. Design avoids sending and receiving transaction state information so minimises bookkeeping overheads. We synchronize with the highest priority standby that is connected and ready to synchronize. Other standbys can be defined to takeover in case of standby failure. This version has very strict behaviour; more relaxed options may be added at a later date. Simon Riggs and Fujii Masao, with reviews by Yeb Havinga, Jaime Casanova, Heikki Linnakangas and Robert Haas, plus the assistance of many other design reviewers.
* Change pg_last_xlog_receive_location() not to move backwards. That makesHeikki Linnakangas2011-03-01
| | | | | | | | | | it a lot more useful for determining which standby is most up-to-date, for example. There was long discussions on whether overwriting existing existing WAL makes sense to begin with, and whether we should do some more extensive variable renaming, but this change nevertheless seems quite uncontroversial. Fujii Masao, reviewed by Jeff Janes, Robert Haas, Stephen Frost.
* Avoid excessive Hot Standby feedback messages.Robert Haas2011-03-01
| | | | | | | | Without this patch, when wal_receiver_status_interval=0, indicating that no status messages should be sent, Hot Standby feedback messages are instead sent extremely frequently. Fujii Masao, with documentation changes by me.