aboutsummaryrefslogtreecommitdiff
path: root/doc/src/sgml/logical-replication.sgml
blob: e7526905f8c37459c20d07ed433ce4b904399cc6 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
<!-- doc/src/sgml/logical-replication.sgml -->

<chapter id="logical-replication">
 <title>Logical Replication</title>

 <para>
  Logical replication is a method of replicating data objects and their
  changes, based upon their replication identity (usually a primary key).  We
  use the term logical in contrast to physical replication, which uses exact
  block addresses and byte-by-byte replication.  PostgreSQL supports both
  mechanisms concurrently, see <xref linkend="high-availability">.  Logical
  replication allows fine-grained control over both data replication and
  security.
 </para>

 <para>
  Logical replication uses a <firstterm>publish</firstterm>
  and <firstterm>subscribe</firstterm> model with one or
  more <firstterm>subscribers</firstterm> subscribing to one or more
  <firstterm>publications</firstterm> on a <firstterm>publisher</firstterm>
  node.  Subscribers pull data from the publications they subscribe to and may
  subsequently re-publish data to allow cascading replication or more complex
  configurations.
 </para>

 <para>
  Logical replication of a table typically starts with taking a snapshot
  of the data on the publisher database and copying that to the subscriber.
  Once that is done, the changes on the publisher are sent to the subscriber
  as they occur in real-time.  The subscriber applies the data in the same
  order as the publisher so that transactional consistency is guaranteed for
  publications within a single subscription.  This method of data replication
  is sometimes referred to as transactional replication.
 </para>

 <para>
  The typical use-cases for logical replication are:

  <itemizedlist>
   <listitem>
    <para>
     Sending incremental changes in a single database or a subset of a
     database to subscribers as they occur.
    </para>
   </listitem>

   <listitem>
    <para>
     Firing triggers for individual changes as they arrive on the
     subscriber.
    </para>
   </listitem>

   <listitem>
    <para>
     Consolidating multiple databases into a single one (for example for
     analytical purposes).
    </para>
   </listitem>

   <listitem>
    <para>
     Replicating between different major versions of PostgreSQL.
    </para>
   </listitem>

   <listitem>
    <para>
     Giving access to replicated data to different groups of users.
    </para>
   </listitem>

   <listitem>
    <para>
     Sharing a subset of the database between multiple databases.
    </para>
   </listitem>
  </itemizedlist>
 </para>

 <para>
  The subscriber database behaves in the same way as any other PostgreSQL
  instance and can be used as a publisher for other databases by defining its
  own publications.  When the subscriber is treated as read-only by
  application, there will be no conflicts from a single subscription.  On the
  other hand, if there are other writes done either by an application or by other
  subscribers to the same set of tables, conflicts can arise.
 </para>

 <sect1 id="logical-replication-publication">
  <title>Publication</title>

  <para>
   A <firstterm>publication</firstterm> can be defined on any physical
   replication master.  The node where a publication is defined is referred to
   as <firstterm>publisher</firstterm>.  A publication is a set of changes
   generated from a table or a group of tables, and might also be described as
   a change set or replication set.  Each publication exists in only one database.
  </para>

  <para>
   Publications are different from schemas and do not affect how the table is
   accessed.  Each table can be added to multiple publications if needed.
   Publications may currently only contain tables.  Objects must be added
   explicitly, except when a publication is created for <literal>ALL
   TABLES</literal>.
  </para>

  <para>
   Publications can choose to limit the changes they produce to
   any combination of <command>INSERT</command>, <command>UPDATE</command>, and
   <command>DELETE</command>, similar to how triggers are fired by
   particular event types.  If a table without a <literal>REPLICA
   IDENTITY</literal> is added to a publication that
   replicates <command>UPDATE</command> or <command>DELETE</command>
   operations then subsequent <command>UPDATE</command>
   or <command>DELETE</command> operations will fail on the publisher.
  </para>

  <para>
   Every publication can have multiple subscribers.
  </para>

  <para>
   A publication is created using the <xref linkend="sql-createpublication">
   command and may later be altered or dropped using corresponding commands.
  </para>

  <para>
   The individual tables can be added and removed dynamically using
   <xref linkend="sql-alterpublication">.  Both the <literal>ADD
   TABLE</literal> and <literal>DROP TABLE</literal> operations are
   transactional; so the table will start or stop replicating at the correct
   snapshot once the transaction has committed.
  </para>
 </sect1>

 <sect1 id="logical-replication-subscription">
  <title>Subscription</title>

  <para>
   A <firstterm>subscription</firstterm> is the downstream side of logical
   replication.  The node where a subscription is defined is referred to as
   the <firstterm>subscriber</firstterm>.  A subscription defines the connection
   to another database and set of publications (one or more) to which it wants
   to subscribe.
  </para>

  <para>
   The subscriber database behaves in the same way as any other PostgreSQL
   instance and can be used as a publisher for other databases by defining its
   own publications.
  </para>

  <para>
   A subscriber node may have multiple subscriptions if desired.  It is
   possible to define multiple subscriptions between a single
   publisher-subscriber pair, in which case care must be taken to ensure
   that the subscribed publication objects don't overlap.
  </para>

  <para>
   Each subscription will receive changes via one replication slot (see
   <xref linkend="streaming-replication-slots">).  Additional temporary
   replication slots may be required for the initial data synchronization
   of pre-existing table data.
  </para>

  <para>
   Subscriptions are dumped by <command>pg_dump</command> if the current user
   is a superuser.  Otherwise a warning is written and subscriptions are
   skipped, because non-superusers cannot read all subscription information
   from the <structname>pg_subscription</structname> catalog.
  </para>

  <para>
   The subscription is added using <xref linkend="sql-createsubscription"> and
   can be stopped/resumed at any time using the
   <xref linkend="sql-altersubscription"> command and removed using
   <xref linkend="sql-dropsubscription">.
  </para>

  <para>
   When a subscription is dropped and recreated, the synchronization
   information is lost.  This means that the data has to be resynchronized
   afterwards.
  </para>

  <para>
   The schema definitions are not replicated, and the published tables must
   exist on the subscriber.  Only regular tables may be
   the target of replication.  For example, you can't replicate to a view.
  </para>

  <para>
   The tables are matched between the publisher and the subscriber using the
   fully qualified table name.  Replication to differently-named tables on the
   subscriber is not supported.
  </para>

  <para>
   Columns of a table are also matched by name.  A different order of columns
   in the target table is allowed, but the column types have to match.  The
   target table can have additional columns not provided by the published
   table.  Those will be filled with their default values.
  </para>
 </sect1>

 <sect1 id="logical-replication-conflicts">
  <title>Conflicts</title>

  <para>
   Logical replication behaves similarly to normal DML operations in that
   the data will be updated even if it was changed locally on the subscriber
   node.  If incoming data violates any constraints the replication will
   stop.  This is referred to as a <firstterm>conflict</firstterm>.  When
   replicating <command>UPDATE</command> or <command>DELETE</command>
   operations, missing data will not produce a conflict and such operations
   will simply be skipped.
  </para>

  <para>
   A conflict will produce an error and will stop the replication; it must be
   resolved manually by the user.  Details about the conflict can be found in
   the subscriber's server log.
  </para>

  <para>
   The resolution can be done either by changing data on the subscriber so
   that it does not conflict with the incoming change or by skipping the
   transaction that conflicts with the existing data.  The transaction can be
   skipped by calling the <link linkend="pg-replication-origin-advance">
   <function>pg_replication_origin_advance()</function></link> function with
   a <parameter>node_name</parameter> corresponding to the subscription name,
   and a position.  The current position of origins can be seen in the
   <link linkend="view-pg-replication-origin-status">
   <structname>pg_replication_origin_status</structname></link> system view.
  </para>
 </sect1>

 <sect1 id="logical-replication-architecture">
  <title>Architecture</title>

  <para>
   Logical replication starts by copying a snapshot of the data on the
   publisher database.  Once that is done, changes on the publisher are sent
   to the subscriber as they occur in real time.  The subscriber applies data
   in the order in which commits were made on the publisher so that
   transactional consistency is guaranteed for the publications within any
   single subscription.
  </para>

  <para>
   Logical replication is built with an architecture similar to physical
   streaming replication (see <xref linkend="streaming-replication">).  It is
   implemented by <quote>walsender</quote> and <quote>apply</quote>
   processes.  The walsender process starts logical decoding (described
   in <xref linkend="logicaldecoding">) of the WAL and loads the standard
   logical decoding plugin (pgoutput).  The plugin transforms the changes read
   from WAL to the logical replication protocol
   (see <xref linkend="protocol-logical-replication">) and filters the data
   according to the publication specification.  The data is then continuously
   transferred using the streaming replication protocol to the apply worker,
   which maps the data to local tables and applies the individual changes as
   they are received, in correct transactional order.
  </para>

  <para>
   The apply process on the subscriber database always runs with
   <varname>session_replication_role</varname> set
   to <literal>replica</literal>, which produces the usual effects on triggers
   and constraints.
  </para>

  <sect2 id="logical-replication-snapshot">
    <title>Initial Snapshot</title>
    <para>
      The initial data in existing subscribed tables are snapshotted and
      copied in a parallel instance of a special kind of apply process.
      This process will create its own temporary replication slot and
      copy the existing data. Once existing data is copied, the worker
      enters synchronization mode, which ensures that the table is brought
      up to a synchronized state with the main apply process by streaming
      any changes that happened during the initial data copy using standard
      logical replication. Once the synchronization is done, the control
      of the replication of the table is given back to the main apply
      process where the replication continues as normal.
    </para>
  </sect2>
 </sect1>

 <sect1 id="logical-replication-monitoring">
  <title>Monitoring</title>

  <para>
   Because logical replication is based on a similar architecture as
   <link linkend="streaming-replication">physical streaming replication</link>,
   the monitoring on a publication node is similar to monitoring of a
   physical replication master
   (see <xref linkend="streaming-replication-monitoring">).
  </para>

  <para>
   The monitoring information about subscription is visible in
   <link linkend="pg-stat-subscription"><literal>pg_stat_subscription</literal></link>.
   This view contains one row for every subscription worker.  A subscription
   can have zero or more active subscription workers depending on its state.
  </para>

  <para>
   Normally, there is a single apply process running for an enabled
   subscription.  A disabled subscription or a crashed subscription will have
   zero rows in this view.  If the initial data synchronization of any
   table is in progress, there will be additional workers for the tables
   being synchronized.
  </para>
 </sect1>

 <sect1 id="logical-replication-security">
  <title>Security</title>

  <para>
   The role used for the replication connection must have
   the <literal>REPLICATION</literal> attribute.  Access for the role must be
   configured in <filename>pg_hba.conf</filename>.
  </para>

  <para>
   To create a publication, the user must have the <literal>CREATE</literal>
   privilege in the database.
  </para>

  <para>
   To add tables to a publication, the user must have ownership rights on the
   table.  To create a publication that publishes all tables automatically,
   the user must be a superuser.
  </para>

  <para>
   To create a subscription, the user must be a superuser.
  </para>

  <para>
   The subscription apply process will run in the local database with the
   privileges of a superuser.
  </para>

  <para>
   Privileges are only checked once at the start of a replication connection.
   They are not re-checked as each change record is read from the publisher,
   nor are they re-checked for each change when applied.
  </para>
 </sect1>

 <sect1 id="logical-replication-config">
  <title>Configuration Settings</title>

  <para>
   Logical replication requires several configuration options to be set.
  </para>

  <para>
   On the publisher side, <varname>wal_level</varname> must be set to
   <literal>logical</literal>, and <varname>max_replication_slots</varname>
   must be set to at least the number of subscriptions expected to connect,
   plus some reserve for table synchronization.  And
   <varname>max_wal_senders</varname> should be set to at least the same as
   <varname>max_replication_slots</varname> plus the number of physical
   replicas that are connected at the same time.
  </para>

  <para>
   The subscriber also requires the <varname>max_replication_slots</varname>
   to be set.  In this case it should be set to at least the number of
   subscriptions that will be added to the subscriber.
   <varname>max_logical_replication_workers</varname> must be set to at
   least the number of subscriptions, again plus some reserve for the table
   synchronization.  Additionally the <varname>max_worker_processes</varname>
   may need to be adjusted to accommodate for replication workers, at least
   (<varname>max_logical_replication_workers</varname>
   + <literal>1</literal>).  Note that some extensions and parallel queries
   also take worker slots from <varname>max_worker_processes</varname>.
  </para>
 </sect1>

 <sect1 id="logical-replication-quick-setup">
  <title>Quick Setup</title>

  <para>
   First set the configuration options in <filename>postgresql.conf</filename>:
<programlisting>
wal_level = logical
</programlisting>
   The other required settings have default values that are sufficient for a
   basic setup.
  </para>

  <para>
   <filename>pg_hba.conf</filename> needs to be adjusted to allow replication
   (the values here depend on your actual network configuration and user you
   want to use for connecting):
<programlisting>
host     all     repuser     0.0.0.0/0     md5
</programlisting>
  </para>

  <para>
   Then on the publisher database:
<programlisting>
CREATE PUBLICATION mypub FOR TABLE users, departments;
</programlisting>
  </para>

  <para>
   And on the subscriber database:
<programlisting>
CREATE SUBSCRIPTION mysub CONNECTION 'dbname=foo host=bar user=repuser' PUBLICATION mypub;
</programlisting>
  </para>

  <para>
   The above will start the replication process, which synchronizes the
   initial table contents of the tables <literal>users</literal> and
   <literal>departments</literal> and then starts replicating
   incremental changes to those tables.
  </para>
 </sect1>
</chapter>