From 5aa2350426c4fdb3d04568b65aadac397012bbcb Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Wed, 29 Apr 2015 19:30:53 +0200 Subject: Introduce replication progress tracking infrastructure. When implementing a replication solution ontop of logical decoding, two related problems exist: * How to safely keep track of replication progress * How to change replication behavior, based on the origin of a row; e.g. to avoid loops in bi-directional replication setups The solution to these problems, as implemented here, consist out of three parts: 1) 'replication origins', which identify nodes in a replication setup. 2) 'replication progress tracking', which remembers, for each replication origin, how far replay has progressed in a efficient and crash safe manner. 3) The ability to filter out changes performed on the behest of a replication origin during logical decoding; this allows complex replication topologies. E.g. by filtering all replayed changes out. Most of this could also be implemented in "userspace", e.g. by inserting additional rows contain origin information, but that ends up being much less efficient and more complicated. We don't want to require various replication solutions to reimplement logic for this independently. The infrastructure is intended to be generic enough to be reusable. This infrastructure also replaces the 'nodeid' infrastructure of commit timestamps. It is intended to provide all the former capabilities, except that there's only 2^16 different origins; but now they integrate with logical decoding. Additionally more functionality is accessible via SQL. Since the commit timestamp infrastructure has also been introduced in 9.5 (commit 73c986add) changing the API is not a problem. For now the number of origins for which the replication progress can be tracked simultaneously is determined by the max_replication_slots GUC. That GUC is not a perfect match to configure this, but there doesn't seem to be sufficient reason to introduce a separate new one. Bumps both catversion and wal page magic. Author: Andres Freund, with contributions from Petr Jelinek and Craig Ringer Reviewed-By: Heikki Linnakangas, Petr Jelinek, Robert Haas, Steve Singer Discussion: 20150216002155.GI15326@awork2.anarazel.de, 20140923182422.GA15776@alap3.anarazel.de, 20131114172632.GE7522@alap2.anarazel.de --- doc/src/sgml/catalogs.sgml | 123 +++++++++++++++++++++ doc/src/sgml/filelist.sgml | 1 + doc/src/sgml/func.sgml | 201 +++++++++++++++++++++++++++++++++- doc/src/sgml/logicaldecoding.sgml | 35 +++++- doc/src/sgml/postgres.sgml | 1 + doc/src/sgml/replication-origins.sgml | 93 ++++++++++++++++ 6 files changed, 448 insertions(+), 6 deletions(-) create mode 100644 doc/src/sgml/replication-origins.sgml (limited to 'doc/src') diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index 898865eea19..4b79958b357 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -238,6 +238,16 @@ query rewrite rules + + pg_replication_origin + registered replication origins + + + + pg_replication_origin_status + information about replication origins, including replication progress + + pg_replication_slots replication slot information @@ -5337,6 +5347,119 @@ + + <structname>pg_replication_origin</structname> + + + pg_replication_origin + + + + The pg_replication_origin catalog contains + all replication origins created. For more on replication origins + see . + + + + + <structname>pg_replication_origin</structname> Columns + + + + + Name + Type + References + Description + + + + + + roident + Oid + + A unique, cluster-wide identifier for the replication + origin. Should never leave the system. + + + + roname + text + + The external, user defined, name of a replication + origin. + + + +
+
+ + + <structname>pg_replication_origin_status</structname> + + + pg_replication_origin_status + + + + The pg_replication_origin_status view + contains information about how far replay for a certain origin has + progressed. For more on replication origins + see . + + + + + <structname>pg_replication_origin_status</structname> Columns + + + + + Name + Type + References + Description + + + + + + local_id + Oid + pg_replication_origin.roident + internal node identifier + + + + external_id + text + pg_replication_origin.roname + external node identifier + + + + remote_lsn + pg_lsn + + The origin node's LSN up to which data has been replicated. + + + + + local_lsn + pg_lsn + + This node's LSN that at + which remote_lsn has been replicated. Used to + flush commit records before persisting data to disk when using + asynchronous commits. + + + +
+
+ <structname>pg_replication_slots</structname> diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml index 26aa7ee50ee..6268d5496bd 100644 --- a/doc/src/sgml/filelist.sgml +++ b/doc/src/sgml/filelist.sgml @@ -95,6 +95,7 @@ + diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 0053d7d4101..dcade93e439 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -16879,11 +16879,13 @@ postgres=# SELECT * FROM pg_xlogfile_name_offset(pg_stop_backup()); Replication Functions - The functions shown in are - for controlling and interacting with replication features. - See - and for information about the - underlying features. Use of these functions is restricted to superusers. + The functions shown + in are for + controlling and interacting with replication features. + See , + , + for information about the underlying features. Use of these + functions is restricted to superusers. @@ -17040,6 +17042,195 @@ postgres=# SELECT * FROM pg_xlogfile_name_offset(pg_stop_backup()); on future calls.
+ + + + + pg_replication_origin_create + + pg_replication_origin_create(node_name text) + + + internal_id oid + + + Create a replication origin with the the passed in external + name, and create an internal id for it. + + + + + + + pg_replication_origin_drop + + pg_replication_origin_drop(node_name text) + + + void + + + Delete a previously created replication origin, including the + associated replay progress. + + + + + + + pg_replication_origin_oid + + pg_replication_origin_oid(node_name text) + + + internal_id oid + + + Lookup replication origin by name and return the internal + oid. If no corresponding replication origin is found a error + is thrown. + + + + + + + pg_replication_origin_session_setup + + pg_replication_origin_setup_session(node_name text) + + + void + + + Configure the current session to be replaying from the passed in + origin, allowing replay progress to be tracked. Use + pg_replication_origin_session_reset to revert. + Can only be used if no previous origin is configured. + + + + + + + pg_replication_origin_session_reset + + pg_replication_origin_session_reset() + + + void + + + Cancel the effects + of pg_replication_origin_session_setup(). + + + + + + + pg_replication_session_is_setup + + pg_replication_session_is_setup() + + + bool + + + Has a replication origin been configured in the current session? + + + + + + + pg_replication_origin_session_progress + + pg_replication_origin_progress(flush bool) + + + pg_lsn + + + Return the replay position for the replication origin configured in + the current session. The parameter flush + determines whether the corresponding local transaction will be + guaranteed to have been flushed to disk or not. + + + + + + + pg_replication_origin_xact_setup + + pg_replication_origin_xact_setup(origin_lsn pg_lsn, origin_timestamp timestamptz) + + + void + + + Mark the current transaction to be replaying a transaction that has + committed at the passed in LSN and timestamp. Can + only be called when a replication origin has previously been + configured using + pg_replication_origin_session_setup(). + + + + + + + pg_replication_origin_xact_reset + + pg_replication_origin_xact_reset() + + + void + + + Cancel the effects of + pg_replication_origin_xact_setup(). + + + + + + + pg_replication_origin_advance + + pg_replication_origin_advance(node_name text, pos pg_lsn) + + + void + + + Set replication progress for the passed in node to the passed in + position. This primarily is useful for setting up the initial position + or a new position after configuration changes and similar. Be aware + that careless use of this function can lead to inconsistently + replicated data. + + + + + + + pg_replication_origin_progress + + pg_replication_origin_progress(node_name text, flush bool) + + + pg_lsn + + + Return the replay position for the passed in replication origin. The + parameter flush determines whether the + corresponding local transaction will be guaranteed to have been + flushed to disk or not. + + + diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml index 0810a2d1f97..f817af3ea8a 100644 --- a/doc/src/sgml/logicaldecoding.sgml +++ b/doc/src/sgml/logicaldecoding.sgml @@ -363,6 +363,7 @@ typedef struct OutputPluginCallbacks LogicalDecodeBeginCB begin_cb; LogicalDecodeChangeCB change_cb; LogicalDecodeCommitCB commit_cb; + LogicalDecodeFilterByOriginCB filter_by_origin_cb; LogicalDecodeShutdownCB shutdown_cb; } OutputPluginCallbacks; @@ -370,7 +371,8 @@ typedef void (*LogicalOutputPluginInit)(struct OutputPluginCallbacks *cb); The begin_cb, change_cb and commit_cb callbacks are required, - while startup_cb + while startup_cb, + filter_by_origin_cb and shutdown_cb are optional. @@ -569,6 +571,37 @@ typedef void (*LogicalDecodeChangeCB) ( + + + Origin Filter Callback + + + The optional filter_by_origin_cb callback + is called to determine wheter data that has been replayed + from origin_id is of interest to the + output plugin. + +typedef bool (*LogicalDecodeChangeCB) ( + struct LogicalDecodingContext *ctx, + RepNodeId origin_id +); + + The ctx parameter has the same contents + as for the other callbacks. No information but the origin is + available. To signal that changes originating on the passed in + node are irrelevant, return true, causing them to be filtered + away; false otherwise. The other callbacks will not be called + for transactions and changes that have been filtered away. + + + This is useful when implementing cascading or multi directional + replication solutions. Filtering by the origin allows to + prevent replicating the same changes back and forth in such + setups. While transactions and changes also carry information + about the origin, filtering via this callback is noticeably + more efficient. + + diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml index e378d6978d0..4a45138bf72 100644 --- a/doc/src/sgml/postgres.sgml +++ b/doc/src/sgml/postgres.sgml @@ -220,6 +220,7 @@ &spi; &bgworker; &logicaldecoding; + &replication-origins; diff --git a/doc/src/sgml/replication-origins.sgml b/doc/src/sgml/replication-origins.sgml new file mode 100644 index 00000000000..c5310229119 --- /dev/null +++ b/doc/src/sgml/replication-origins.sgml @@ -0,0 +1,93 @@ + + + Replication Progress Tracking + + Replication Progress Tracking + + + Replication Origins + + + + Replication origins are intended to make it easier to implement + logical replication solutions on top + of . They provide a solution to two + common problems: + + How to safely keep track of replication progress + How to change replication behavior, based on the + origin of a row; e.g. to avoid loops in bi-directional replication + setups + + + + + Replication origins consist out of a name and a oid. The name, which + is what should be used to refer to the origin across systems, is + free-form text. It should be used in a way that makes conflicts + between replication origins created by different replication + solutions unlikely; e.g. by prefixing the replication solution's + name to it. The oid is used only to avoid having to store the long + version in situations where space efficiency is important. It should + never be shared between systems. + + + + Replication origins can be created using the + pg_replication_origin_create(); + dropped using + pg_replication_origin_drop(); + and seen in the + pg_replication_origin + catalog. + + + + When replicating from one system to another (independent of the fact that + those two might be in the same cluster, or even same database) one + nontrivial part of building a replication solution is to keep track of + replay progress in a safe manner. When the applying process, or the whole + cluster, dies, it needs to be possible to find out up to where data has + successfully been replicated. Naive solutions to this like updating a row in + a table for every replayed transaction have problems like runtime overhead + bloat. + + + + Using the replication origin infrastructure a session can be + marked as replaying from a remote node (using the + pg_replication_origin_session_setup() + function. Additionally the LSN and commit + timestamp of every source transaction can be configured on a per + transaction basis using + pg_replication_origin_xact-setup(). + If that's done replication progress will be persist in a crash safe + manner. Replay progress for all replication origins can be seen in the + + pg_replication_origin_status + view. A individual origin's progress, e.g. when resuming + replication, can be acquired using + pg_replication_origin_progress() + for any origin or + pg_replication_origin_session_progress() + for the origin configured in the current session. + + + + In more complex replication topologies than replication from exactly one + system to one other, another problem can be that, that it is hard to avoid + replicating replayed rows again. That can lead both to cycles in the + replication and inefficiencies. Replication origins provide a optional + mechanism to recognize and prevent that. When configured using the functions + referenced in the previous paragraph, every change and transaction passed to + output plugin callbacks (see ) + generated by the session is tagged with the replication origin of the + generating session. This allows to treat them differently in the output + plugin, e.g. ignoring all but locally originating rows. Additionally + the + filter_by_origin_cb callback can be used + to filter the logical decoding change stream based on the + source. While less flexible, filtering via that callback is + considerably more efficient. + + -- cgit v1.2.3