Diffstat (limited to 'doc/src/sgml/replication-origins.sgml')
-rw-r--r-- | doc/src/sgml/replication-origins.sgml | 93 |
1 files changed, 93 insertions, 0 deletions
diff --git a/doc/src/sgml/replication-origins.sgml b/doc/src/sgml/replication-origins.sgml
new file mode 100644
index 00000000000..c5310229119
--- /dev/null
+++ b/doc/src/sgml/replication-origins.sgml
@@ -0,0 +1,93 @@
+<!-- doc/src/sgml/replication-origins.sgml -->
+<chapter id="replication-origins">
+ <title>Replication Progress Tracking</title>
+ <indexterm zone="replication-origins">
+  <primary>Replication Progress Tracking</primary>
+ </indexterm>
+ <indexterm zone="replication-origins">
+  <primary>Replication Origins</primary>
+ </indexterm>
+
+ <para>
+  Replication origins are intended to make it easier to implement
+  logical replication solutions on top
+  of <xref linkend="logicaldecoding">. They provide a solution to two
+  common problems:
+  <itemizedlist>
+   <listitem><para>How to safely keep track of replication progress</para></listitem>
+   <listitem><para>How to change replication behavior based on the
+   origin of a row; e.g. to avoid loops in bi-directional replication
+   setups</para></listitem>
+  </itemizedlist>
+ </para>
+
+ <para>
+  Replication origins consist of a name and an OID. The name, which
+  is what should be used to refer to the origin across systems, is
+  free-form text. It should be chosen in a way that makes conflicts
+  between replication origins created by different replication
+  solutions unlikely; e.g. by prefixing it with the replication
+  solution's name. The OID is used only to avoid having to store the long
+  name in situations where space efficiency is important. It should
+  never be shared between systems.
+ </para>
+
+ <para>
+  Replication origins can be created using the function
+  <link linkend="pg-replication-origin-create"><function>pg_replication_origin_create()</function></link>;
+  dropped using
+  <link linkend="pg-replication-origin-drop"><function>pg_replication_origin_drop()</function></link>;
+  and seen in the
+  <link linkend="catalog-pg-replication-origin"><structname>pg_replication_origin</structname></link>
+  system catalog.
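+  For example, a replication solution could create an origin whose name
+  is prefixed with its own name, look it up in the catalog, and later drop
+  it again. This is only a sketch: <literal>my_solution_node1</literal> is a
+  hypothetical origin name chosen for illustration, and the OID assigned to
+  the origin will vary:
+<programlisting>
+-- 'my_solution_node1' is a hypothetical name, prefixed with an
+-- (also hypothetical) replication solution's name to avoid conflicts
+SELECT pg_replication_origin_create('my_solution_node1');
+
+-- the catalog maps the internal OID (roident) to the name (roname)
+SELECT roident, roname FROM pg_replication_origin;
+
+-- remove the origin once it is no longer needed
+SELECT pg_replication_origin_drop('my_solution_node1');
+</programlisting>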
+ </para>
+
+ <para>
+  When replicating from one system to another (independent of whether
+  those two might be in the same cluster, or even the same database), one
+  nontrivial part of building a replication solution is to keep track of
+  replay progress in a safe manner. When the applying process, or the whole
+  cluster, dies, it needs to be possible to find out up to where data has
+  successfully been replicated. Naive solutions to this, like updating a row in
+  a table for every replayed transaction, have problems like runtime overhead
+  and database bloat.
+ </para>
+
+ <para>
+  Using the replication origin infrastructure, a session can be
+  marked as replaying from a remote node (using the
+  <link linkend="pg-replication-origin-session-setup"><function>pg_replication_origin_session_setup()</function></link>
+  function). Additionally, the <acronym>LSN</acronym> and commit
+  timestamp of every source transaction can be configured on a per
+  transaction basis using
+  <link linkend="pg-replication-origin-xact-setup"><function>pg_replication_origin_xact_setup()</function></link>.
+  If that's done, replication progress will persist in a crash-safe
+  manner. Replay progress for all replication origins can be seen in the
+  <link linkend="catalog-pg-replication-origin-status">
+   <structname>pg_replication_origin_status</structname>
+  </link> view. An individual origin's progress, e.g. when resuming
+  replication, can be acquired using
+  <link linkend="pg-replication-origin-progress"><function>pg_replication_origin_progress()</function></link>
+  for any origin or
+  <link linkend="pg-replication-origin-session-progress"><function>pg_replication_origin_session_progress()</function></link>
+  for the origin configured in the current session.
+ </para>
+
+ <para>
+  In more complex replication topologies than replication from exactly one
+  system to one other, another problem can be that it is hard to avoid
+  replicating replayed rows again.
+  That can lead both to cycles in the
+  replication and to inefficiencies. Replication origins provide an optional
+  mechanism to recognize and prevent that. When configured using the functions
+  referenced in the previous paragraph, every change and transaction passed to
+  output plugin callbacks (see <xref linkend="logicaldecoding-output-plugin">)
+  that is generated by the session is tagged with the replication origin of the
+  generating session. This allows treating them differently in the output
+  plugin, e.g. ignoring all but locally originating rows. Additionally,
+  the <link linkend="logicaldecoding-output-plugin-filter-by-origin">
+  <function>filter_by_origin_cb</function></link> callback can be used
+  to filter the logical decoding change stream based on the
+  source. While less flexible, filtering via that callback is
+  considerably more efficient than doing it in the output plugin.
+ </para>
+</chapter>