author | Heikki Linnakangas <heikki.linnakangas@iki.fi> | 2012-12-13 19:00:00 +0200 |
---|---|---|
committer | Heikki Linnakangas <heikki.linnakangas@iki.fi> | 2012-12-13 19:17:32 +0200 |
commit | abfd192b1b5ba5216ac4b1f31dcd553106304b19 (patch) | |
tree | 9dc145a8f72c500e06ccc779a2d54784ff1681c1 /src/backend/replication/basebackup.c | |
parent | 527668717a660e67c2a6cfd4e85f7a513f99f6f2 (diff) | |
Allow a streaming replication standby to follow a timeline switch.

Before this patch, streaming replication would refuse to start replicating
if the timeline in the primary didn't exactly match the standby's. A
mismatch arises when you have a master and two standbys, and you promote
one of the standbys to become the new master. Promoting bumps up the
timeline ID, and after that bump, the other standby would refuse to
continue.

There's significantly more timeline-related logic in streaming replication
now. First of all, when a standby connects to the primary, it will ask the
primary for any timeline history files that are missing from the standby.
The missing files are sent using a new replication command, TIMELINE_HISTORY,
and stored in the standby's pg_xlog directory. Using the timeline history
files, the standby can follow the latest timeline present in the primary
(recovery_target_timeline='latest'), just as it can follow new timelines
appearing in an archive directory.
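
The history-file exchange described above can be sketched in isolation. This is a minimal illustration, not the code from the patch: the helper names, the `demo_have_file` callback, and the loop are hypothetical. It assumes that TIMELINE_HISTORY takes the timeline ID as its only argument, that history files are named with eight hexadecimal digits followed by `.history`, and that timeline 1 has no history file.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: name of the history file for a timeline,
 * assuming the "%08X.history" naming pattern. */
int
history_file_name(char *buf, size_t buflen, uint32_t tli)
{
	return snprintf(buf, buflen, "%08X.history", (unsigned int) tli);
}

/* Hypothetical helper: replication command asking for one history file,
 * assuming the timeline ID is the command's only argument. */
int
build_timeline_history_cmd(char *buf, size_t buflen, uint32_t tli)
{
	return snprintf(buf, buflen, "TIMELINE_HISTORY %u", (unsigned int) tli);
}

/* Hypothetical stand-in for "is this history file already in pg_xlog?".
 * Here the standby pretends to have only timeline 2's file. */
bool
demo_have_file(uint32_t tli)
{
	return tli == 2;
}

/* Collect the timelines in [first, last] whose history file is missing
 * locally. Timeline 1 is skipped because it has no history file.
 * Returns how many IDs were written to out (at most outcap). */
size_t
missing_history_timelines(uint32_t first, uint32_t last,
						  bool (*have_file) (uint32_t),
						  uint32_t *out, size_t outcap)
{
	size_t		n = 0;

	/* 64-bit counter so the loop ends even when last is UINT32_MAX */
	for (uint64_t t = first; t <= last; t++)
	{
		uint32_t	tli = (uint32_t) t;

		if (tli == 1)
			continue;
		if (!have_file(tli) && n < outcap)
			out[n++] = tli;
	}
	return n;
}
```

A caller would iterate over the returned IDs, send the command built by `build_timeline_history_cmd` for each one, and store the reply under the name produced by `history_file_name`.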

START_REPLICATION now takes a TIMELINE parameter, to specify exactly which
timeline to stream WAL from. This allows the standby to request the primary
to send over WAL that precedes the promotion. The replication protocol is
changed slightly (in a backwards-compatible way, although there's little hope
of streaming replication working across major versions anyway), to allow
replication to stop when the end of the timeline is reached, putting the
walsender back into a state where it accepts replication commands.
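
As an illustration of the new parameter, the sketch below formats a START_REPLICATION command that names the timeline to stream from. The helper name is hypothetical, and the sketch assumes the command takes the start position as two hexadecimal halves separated by a slash, followed by `TIMELINE` and the timeline ID.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: build "START_REPLICATION XXX/XXX TIMELINE tli".
 * startpoint is a 64-bit WAL position, printed as high/low 32-bit halves.
 * Returns the snprintf() result (length of the formatted command). */
int
build_start_replication(char *buf, size_t buflen,
						uint64_t startpoint, uint32_t tli)
{
	assert(buf != NULL && buflen > 0);

	return snprintf(buf, buflen, "START_REPLICATION %X/%X TIMELINE %u",
					(unsigned int) (startpoint >> 32),
					(unsigned int) (startpoint & 0xFFFFFFFFu),
					(unsigned int) tli);
}
```

A standby that needs WAL from before a promotion would pass the older timeline's ID here, then issue the command again with the newer ID once the primary reports that the older timeline has ended.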

Many thanks to Amit Kapila for testing and reviewing various versions of
this patch.

Diffstat (limited to 'src/backend/replication/basebackup.c')
-rw-r--r-- | src/backend/replication/basebackup.c | 21 |
1 files changed, 17 insertions, 4 deletions
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 04681f41962..65200c129aa 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -56,6 +56,9 @@ static void perform_base_backup(basebackup_options *opt, DIR *tblspcdir);
 static void parse_basebackup_options(List *options, basebackup_options *opt);
 static void SendXlogRecPtrResult(XLogRecPtr ptr);
 
+/* Was the backup currently in-progress initiated in recovery mode? */
+static bool backup_started_in_recovery = false;
+
 /*
  * Size of each block sent into the tar stream for larger files.
  *
@@ -94,6 +97,8 @@ perform_base_backup(basebackup_options *opt, DIR *tblspcdir)
 	XLogRecPtr	endptr;
 	char	   *labelfile;
 
+	backup_started_in_recovery = RecoveryInProgress();
+
 	startptr = do_pg_start_backup(opt->label, opt->fastcheckpoint, &labelfile);
 	SendXlogRecPtrResult(startptr);
 
@@ -261,7 +266,7 @@ perform_base_backup(basebackup_options *opt, DIR *tblspcdir)
 					 * http://lists.apple.com/archives/xcode-users/2003/Dec//msg000
 					 * 51.html
 					 */
-					XLogRead(buf, ptr, TAR_SEND_SIZE);
+					XLogRead(buf, ThisTimeLineID, ptr, TAR_SEND_SIZE);
 					if (pq_putmessage('d', buf, TAR_SEND_SIZE))
 						ereport(ERROR,
 								(errmsg("base backup could not send data, aborting backup")));
@@ -592,11 +597,19 @@ sendDir(char *path, int basepathlen, bool sizeonly)
 		/*
 		 * Check if the postmaster has signaled us to exit, and abort with an
 		 * error in that case. The error handler further up will call
-		 * do_pg_abort_backup() for us.
+		 * do_pg_abort_backup() for us. Also check that if the backup was
+		 * started while still in recovery, the server wasn't promoted.
+		 * dp_pg_stop_backup() will check that too, but it's better to stop
+		 * the backup early than continue to the end and fail there.
 		 */
-		if (ProcDiePending || walsender_ready_to_stop)
+		CHECK_FOR_INTERRUPTS();
+		if (RecoveryInProgress() != backup_started_in_recovery)
 			ereport(ERROR,
-					(errmsg("shutdown requested, aborting active base backup")));
+					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					 errmsg("the standby was promoted during online backup"),
+					 errhint("This means that the backup being taken is corrupt "
+							 "and should not be used. "
+							 "Try taking another online backup.")));
 
 		snprintf(pathbuf, MAXPGPATH, "%s/%s", path, de->d_name);
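
The promotion check added in the last hunk follows a simple pattern: remember the recovery state when the backup starts, then compare it against the current state before each file is sent. The standalone sketch below simulates that pattern outside the server. Every name prefixed with `sim_` is a hypothetical stand-in (the real code calls `RecoveryInProgress()` and raises an error with `ereport`), so treat it as an illustration of the idea rather than the backend logic itself.

```c
#include <stdbool.h>

/* Simulated server state: true while the server is a standby in recovery. */
bool		sim_in_recovery = true;

/* Same role as the variable added by the patch. */
bool		backup_started_in_recovery = false;

/* Hypothetical stand-in for RecoveryInProgress(). */
bool
sim_recovery_in_progress(void)
{
	return sim_in_recovery;
}

/* Record the recovery state at the moment the backup begins. */
void
sim_begin_backup(void)
{
	backup_started_in_recovery = sim_recovery_in_progress();
}

/*
 * Pretend to send nfiles files. If promote_before_file is a valid index,
 * the server is "promoted" just before that file is processed.
 * Returns the number of files sent, or -1 if the recovery state changed
 * mid-backup (where the real code would raise an error instead).
 */
int
sim_send_files(int nfiles, int promote_before_file)
{
	int			sent = 0;

	for (int i = 0; i < nfiles; i++)
	{
		if (i == promote_before_file)
			sim_in_recovery = false;	/* simulated promotion */

		/* The check from the patch: fail early instead of at the end. */
		if (sim_recovery_in_progress() != backup_started_in_recovery)
			return -1;

		sent++;
	}
	return sent;
}
```

Because the comparison is against the state recorded at the start, a backup taken on a primary (never in recovery) is unaffected, while a backup started on a standby aborts as soon as the promotion is noticed.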