aboutsummaryrefslogtreecommitdiff
path: root/src/backend/replication/logical/worker.c
Commit message (Collapse)AuthorAge
...
* Don't try to print data type names in slot_store_error_callback().Tom Lane2021-07-02
| | | | | | | | | | | | | | | | | | | | | | The existing code tried to do syscache lookups in an already-failed transaction, which is problematic to say the least. After some consideration of alternatives, the best fix seems to be to just drop type names from the error message altogether. The table and column names seem like sufficient localization. If the user is unsure what types are involved, she can check the local and remote table definitions. Having done that, we can also discard the LogicalRepTypMap hash table, which had no other use. Arguably, LOGICAL_REP_MSG_TYPE replication messages are now obsolete as well; but we should probably keep them in case some other use emerges. (The complexity of removing something from the replication protocol would likely outweigh any savings anyhow.) Masahiko Sawada and Bharath Rupireddy, per complaint from Andres Freund. Back-patch to v10 where this code originated. Discussion: https://postgr.es/m/20210106020229.ne5xnuu6wlondjpe@alap3.anarazel.de
* Improve SQLSTATE reporting in some replication-related code.Tom Lane2021-06-16
| | | | | | | | | | | | | | | | I started out with the goal of reporting ERRCODE_CONNECTION_FAILURE when walrcv_connect() fails, but as I looked around I realized that whoever wrote this code was of the opinion that errcodes are purely optional. That's not my understanding of our project policy. Hence, make sure that an errcode is provided in each ereport that (a) is ERROR or higher level and (b) isn't arguably an internal logic error. Also fix some very dubious existing errcode assignments. While this is not per policy, it's also largely cosmetic, since few of these cases could get reported to applications. So I don't feel a need to back-patch. Discussion: https://postgr.es/m/2189704.1623512522@sss.pgh.pa.us
* Don't use Asserts to check for violations of replication protocol.Tom Lane2021-06-12
| | | | | | | | | | | | | | | | | | | | | | | | | Using an Assert to check the validity of incoming messages is an extremely poor decision. In a debug build, it should not be that easy for a broken or malicious remote client to crash the logrep worker. The consequences could be even worse in non-debug builds, which will fail to make such checks at all, leading to who-knows-what misbehavior. Hence, promote every Assert that could possibly be triggered by wrong or out-of-order replication messages to a full test-and-ereport. To avoid bloating the set of messages the translation team has to cope with, establish a policy that replication protocol violation error reports don't need to be translated. Hence, all the new messages here use errmsg_internal(). A couple of old messages are changed likewise for consistency. Along the way, fix some non-idiomatic or outright wrong uses of hash_search(). Most of these mistakes are new with the "streaming replication" patch (commit 464824323), but a couple go back a long way. Back-patch as appropriate. Discussion: https://postgr.es/m/1719083.1623351052@sss.pgh.pa.us
* Fix multiple crasher bugs in partitioned-table replication logic.Tom Lane2021-06-11
| | | | | | | | | | | | | | | | | | | | | | | | apply_handle_tuple_routing(), having detected and reported that the tuple it needed to update didn't exist, tried to update that tuple anyway, leading to a null-pointer dereference. logicalrep_partition_open() failed to ensure that the LogicalRepPartMapEntry it built for a partition was fully independent of that for the partition root, leading to trouble if the root entry was later freed or rebuilt. Meanwhile, on the publisher's side, pgoutput_change() sometimes attempted to apply execute_attr_map_tuple() to a NULL tuple. The first of these was reported by Sergey Bernikov in bug #17055; I found the other two while developing some test cases for this sadly under-tested code. Diagnosis and patch for the first issue by Amit Langote; patches for the others by me; new test cases by me. Back-patch to v13 where this logic came in. Discussion: https://postgr.es/m/17055-9ba800ec8522668b@postgresql.org
* Rearrange logrep worker's snapshot handling some more.Tom Lane2021-06-10
| | | | | | | | | | | | | | | | | | It turns out that worker.c's code path for TRUNCATE was also careless about establishing a snapshot while executing user-defined code, allowing the checks added by commit 84f5c2908 to fail when a trigger is fired in that context. We could just wrap Push/PopActiveSnapshot around the truncate call, but it seems better to establish a policy of holding a snapshot throughout execution of a replication step. To help with that and possible future requirements, replace the previous ensure_transaction calls with pairs of begin/end_replication_step calls. Per report from Mark Dilger. Back-patch to v11, like the previous changes. Discussion: https://postgr.es/m/B4A3AF82-79ED-4F4C-A4E5-CD2622098972@enterprisedb.com
* Fix access to no-longer-open relcache entry in logical-rep worker.Tom Lane2021-05-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we redirected a replicated tuple operation into a partition child table, and then tried to fire AFTER triggers for that event, the relation cache entry for the child table was already closed. This has no visible ill effects as long as the entry is still there and still valid, but an unluckily-timed cache flush could result in a crash or other misbehavior. To fix, postpone the ExecCleanupTupleRouting call (which is what closes the child table) until after we've fired triggers. This requires a bit of refactoring so that the cleanup function can have access to the necessary state. In HEAD, I took the opportunity to simplify some of worker.c's function APIs based on use of the new ApplyExecutionData struct. However, it doesn't seem safe/practical to back-patch that aspect, at least not without a lot of analysis of possible interactions with a04daa97a. In passing, add an Assert to afterTriggerInvokeEvents to catch such cases. This seems worthwhile because we've grown a number of fairly unstructured ways of calling AfterTriggerEndQuery. Back-patch to v13, where worker.c grew the ability to deal with partitioned target tables. Discussion: https://postgr.es/m/3382681.1621381328@sss.pgh.pa.us
* Restore the portal-level snapshot after procedure COMMIT/ROLLBACK.Tom Lane2021-05-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | COMMIT/ROLLBACK necessarily destroys all snapshots within the session. The original implementation of intra-procedure transactions just cavalierly did that, ignoring the fact that this left us executing in a rather different environment than normal. In particular, it turns out that handling of toasted datums depends rather critically on there being an outer ActiveSnapshot: otherwise, when SPI or the core executor pop whatever snapshot they used and return, it's unsafe to dereference any toasted datums that may appear in the query result. It's possible to demonstrate "no known snapshots" and "missing chunk number N for toast value" errors as a result of this oversight. Historically this outer snapshot has been held by the Portal code, and that seems like a good plan to preserve. So add infrastructure to pquery.c to allow re-establishing the Portal-owned snapshot if it's not there anymore, and add enough bookkeeping support that we can tell whether it is or not. We can't, however, just re-establish the Portal snapshot as part of COMMIT/ROLLBACK. As in normal transaction start, acquiring the first snapshot should wait until after SET and LOCK commands. Hence, teach spi.c about doing this at the right time. (Note that this patch doesn't fix the problem for any PLs that try to run intra-procedure transactions without using SPI to execute SQL commands.) This makes SPI's no_snapshots parameter rather a misnomer, so in HEAD, rename that to allow_nonatomic. replication/logical/worker.c also needs some fixes, because it wasn't careful to hold a snapshot open around AFTER trigger execution. That code doesn't use a Portal, which I suspect someday we're gonna have to fix. But for now, just rearrange the order of operations. This includes back-patching the recent addition of finish_estate() to centralize the cleanup logic there. This also back-patches commit 2ecfeda3e into v13, to improve the test coverage for worker.c (it was that test that exposed that worker.c's snapshot management is wrong). Per bug #15990 from Andreas Wicht. Back-patch to v11 where intra-procedure COMMIT was added. Discussion: https://postgr.es/m/15990-eee2ac466b11293d@postgresql.org
* Fix deadlock for multiple replicating truncates of the same table.Amit Kapila2021-05-21
| | | | | | | | | | | | | | | | | | While applying the truncate change, the logical apply worker acquires RowExclusiveLock on the relation being truncated. This allowed truncate on the relation at a time by two apply workers which lead to a deadlock. The reason was that one of the workers after updating the pg_class tuple tries to acquire SHARE lock on the relation and started to wait for the second worker which has acquired RowExclusiveLock on the relation. And when the second worker tries to update the pg_class tuple, it starts to wait for the first worker which leads to a deadlock. Fix it by acquiring AccessExclusiveLock on the relation before applying the truncate change as we do for normal truncate operation. Author: Peter Smith, test case by Haiying Tang Reviewed-by: Dilip Kumar, Amit Kapila Backpatch-through: 11 Discussion: https://postgr.es/m/CAHut+PsNm43p0jM+idTvWwiGZPcP0hGrHMPK9TOAkc+a4UpUqw@mail.gmail.com
* Rename the logical replication global "wrconn"Alvaro Herrera2021-05-12
| | | | | | | | | | | | | | The worker.c global wrconn is only meant to be used by logical apply/ tablesync workers, but there are other variables with the same name. To reduce future confusion rename the global from "wrconn" to "LogRepWorkerWalRcvConn". While this is just cosmetic, it seems better to backpatch it all the way back to 10 where this code appeared, to avoid future backpatching issues. Author: Peter Smith <smithpb2250@gmail.com> Discussion: https://postgr.es/m/CAHut+Pu7Jv9L2BOEx_Z0UtJxfDevQSAUW2mJqWU+CtmDrEZVAg@mail.gmail.com
* Don't pass "ONLY" options specified in TRUNCATE to foreign data wrapper.Fujii Masao2021-04-27
| | | | | | | | | | | | | | | | | | | | | | Commit 8ff1c94649 allowed TRUNCATE command to truncate foreign tables. Previously the information about "ONLY" options specified in TRUNCATE command were passed to the foreign data wrapper. Then postgres_fdw constructed the TRUNCATE command to issue the remote server and included "ONLY" options in it based on the passed information. On the other hand, "ONLY" options specified in SELECT, UPDATE or DELETE have no effect when accessing or modifying the remote table, i.e., are not passed to the foreign data wrapper. So it's inconsistent to make only TRUNCATE command pass the "ONLY" options to the foreign data wrapper. Therefore this commit changes the TRUNCATE command so that it doesn't pass the "ONLY" options to the foreign data wrapper, for the consistency with other statements. Also this commit changes postgres_fdw so that it always doesn't include "ONLY" options in the TRUNCATE command that it constructs. Author: Fujii Masao Reviewed-by: Bharath Rupireddy, Kyotaro Horiguchi, Justin Pryzby, Zhihong Yu Discussion: https://postgr.es/m/551ed8c1-f531-818b-664a-2cecdab99cd8@oss.nttdata.com
* Fix relation leak for subscribers firing triggers in logical replicationMichael Paquier2021-04-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | Creating a trigger on a relation to which an apply operation is triggered would cause a relation leak once the change gets committed, as the executor would miss that the relation needs to be closed beforehand. This issue got introduced with the refactoring done in 1375422c, where it becomes necessary to track relations within es_opened_result_relations to make sure that they are closed. We have discussed using ExecInitResultRelation() coupled with ExecCloseResultRelations() for the relations in need of tracking by the apply operations in the subscribers, which would simplify greatly the opening and closing of indexes, but this requires a larger rework and reorganization of the worker code, particularly for the tuple routing part. And that's not really welcome post feature freeze. So, for now, settle down to the same solution as TRUNCATE which is to fill in es_opened_result_relations with the relation opened, to make sure that ExecGetTriggerResultRel() finds them and that they get closed. The code is lightly refactored so as a relation is not registered three times for each DML code path, making the whole a bit easier to follow. Reported-by: Tang Haiying, Shi Yu, Hou Zhijie Author: Amit Langote, Masahiko Sawada, Hou Zhijie Reviewed-by: Amit Kapila, Michael Paquier Discussion: https://postgr.es/m/OS0PR01MB611383FA0FE92EB9DE21946AFB769@OS0PR01MB6113.jpnprd01.prod.outlook.com
* Allow TRUNCATE command to truncate foreign tables.Fujii Masao2021-04-08
| | | | | | | | | | | | | | | | | | | | | | | This commit introduces new foreign data wrapper API for TRUNCATE. It extends TRUNCATE command so that it accepts foreign tables as the targets to truncate and invokes that API. Also it extends postgres_fdw so that it can issue TRUNCATE command to foreign servers, by adding new routine for that TRUNCATE API. The information about options specified in TRUNCATE command, e.g., ONLY, CACADE, etc is passed to FDW via API. The list of foreign tables to truncate is also passed to FDW. FDW truncates the foreign data sources that the passed foreign tables specify, based on those information. For example, postgres_fdw constructs TRUNCATE command using them and issues it to the foreign server. For performance, TRUNCATE command invokes the FDW routine for TRUNCATE once per foreign server that foreign tables to truncate belong to. Author: Kazutaka Onishi, Kohei KaiGai, slightly modified by Fujii Masao Reviewed-by: Bharath Rupireddy, Michael Paquier, Zhihong Yu, Alvaro Herrera, Stephen Frost, Ashutosh Bapat, Amit Langote, Daniel Gustafsson, Ibrar Ahmed, Fujii Masao Discussion: https://postgr.es/m/CAOP8fzb_gkReLput7OvOK+8NHgw-RKqNv59vem7=524krQTcWA@mail.gmail.com Discussion: https://postgr.es/m/CAJuF6cMWDDqU-vn_knZgma+2GMaout68YUgn1uyDnexRhqqM5Q@mail.gmail.com
* Postpone some stuff out of ExecInitModifyTable.Tom Lane2021-04-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Arrange to do some things on-demand, rather than immediately during executor startup, because there's a fair chance of never having to do them at all: * Don't open result relations' indexes until needed. * Don't initialize partition tuple routing, nor the child-to-root tuple conversion map, until needed. This wins in UPDATEs on partitioned tables when only some of the partitions will actually receive updates; with larger partition counts the savings is quite noticeable. Also, we can remove some sketchy heuristics in ExecInitModifyTable about whether to set up tuple routing. Also, remove execPartition.c's private hash table tracking which partitions were already opened by the ModifyTable node. Instead use the hash added to ModifyTable itself by commit 86dc90056. To allow lazy computation of the conversion maps, we now set ri_RootResultRelInfo in all child ResultRelInfos. We formerly set it only in some, not terribly well-defined, cases. This has user-visible side effects in that now more error messages refer to the root relation instead of some partition (and provide error data in the root's column order, too). It looks to me like this is a strict improvement in consistency, so I don't have a problem with the output changes visible in this commit. Extracted from a larger patch, which seemed to me to be too messy to push in one commit. Amit Langote, reviewed at different times by Heikki Linnakangas and myself Discussion: https://postgr.es/m/CA+HiwqG7ZruBmmih3wPsBZ4s0H2EhywrnXEduckY5Hr3fWzPWA@mail.gmail.com
* Allow pgoutput to send logical decoding messages.Amit Kapila2021-04-06
| | | | | | | | | | | | | | The output plugin accepts a new parameter (messages) that controls if logical decoding messages are written into the replication stream. It is useful for those clients that use pgoutput as an output plugin and needs to process messages that were written by pg_logical_emit_message(). Although logical streaming replication protocol supports logical decoding messages now, logical replication does not use this feature yet. Author: David Pirotte, Euler Taveira Reviewed-by: Euler Taveira, Andres Freund, Ashutosh Bapat, Amit Kapila Discussion: https://postgr.es/m/CADK3HHJ-+9SO7KuRLH=9Wa1rAo60Yreq1GFNkH_kd0=CdaWM+A@mail.gmail.com
* Fix dangling pointer reference in stream_cleanup_files.Amit Kapila2021-03-23
| | | | | | | We can't access the entry after it is removed from dynahash. Author: Peter Smith Discussion: https://postgr.es/m/CAHut+Ps-pL++f6CJwPx2+vUqXuew=Xt-9Bi-6kCyxn+Fwi2M7w@mail.gmail.com
* Small debug message tweakPeter Eisentraut2021-03-10
| | | | This makes the wording of the delete case match the update case.
* Simplify printing of LSNsPeter Eisentraut2021-02-23
| | | | | | | | | | Add a macro LSN_FORMAT_ARGS for use in printf-style printing of LSNs. Convert all applicable code to use it. Reviewed-by: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://www.postgresql.org/message-id/flat/CAExHW5ub5NaTELZ3hJUCE6amuvqAtsSxc7O+uK7y4t9Rrk23cw@mail.gmail.com
* Allow multiple xacts during table sync in logical replication.Amit Kapila2021-02-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For the initial table data synchronization in logical replication, we use a single transaction to copy the entire table and then synchronize the position in the stream with the main apply worker. There are multiple downsides of this approach: (a) We have to perform the entire copy operation again if there is any error (network breakdown, error in the database operation, etc.) while we synchronize the WAL position between tablesync worker and apply worker; this will be onerous especially for large copies, (b) Using a single transaction in the synchronization-phase (where we can receive WAL from multiple transactions) will have the risk of exceeding the CID limit, (c) The slot will hold the WAL till the entire sync is complete because we never commit till the end. This patch solves all the above downsides by allowing multiple transactions during the tablesync phase. The initial copy is done in a single transaction and after that, we commit each transaction as we receive. To allow recovery after any error or crash, we use a permanent slot and origin to track the progress. The slot and origin will be removed once we finish the synchronization of the table. We also remove slot and origin of tablesync workers if the user performs DROP SUBSCRIPTION .. or ALTER SUBSCRIPTION .. REFERESH and some of the table syncs are still not finished. The commands ALTER SUBSCRIPTION ... REFRESH PUBLICATION and ALTER SUBSCRIPTION ... SET PUBLICATION ... with refresh option as true cannot be executed inside a transaction block because they can now drop the slots for which we have no provision to rollback. This will also open up the path for logical replication of 2PC transactions on the subscriber side. Previously, we can't do that because of the requirement of maintaining a single transaction in tablesync workers. Bump catalog version due to change of state in the catalog (pg_subscription_rel). Author: Peter Smith, Amit Kapila, and Takamichi Osumi Reviewed-by: Ajin Cherian, Petr Jelinek, Hou Zhijie and Amit Kapila Discussion: https://postgr.es/m/CAA4eK1KHJxaZS-fod-0fey=0tq3=Gkn4ho=8N4-5HWiCfu0H1A@mail.gmail.com
* pgindent worker.c.Amit Kapila2021-01-19
| | | | | | | | | This is a leftover from commit 0926e96c49. Changing this separately because this file is being modified for upcoming patch logical replication of 2PC. Author: Peter Smith Discussion: https://postgr.es/m/CAHut+Ps+EgG8KzcmAyAgBUi_vuTps6o9ZA8DG6SdnO0-YuOhPQ@mail.gmail.com
* Pass down "logically unchanged index" hint.Peter Geoghegan2021-01-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add an executor aminsert() hint mechanism that informs index AMs that the incoming index tuple (the tuple that accompanies the hint) is not being inserted by execution of an SQL statement that logically modifies any of the index's key columns. The hint is received by indexes when an UPDATE takes place that does not apply an optimization like heapam's HOT (though only for indexes where all key columns are logically unchanged). Any index tuple that receives the hint on insert is expected to be a duplicate of at least one existing older version that is needed for the same logical row. Related versions will typically be stored on the same index page, at least within index AMs that apply the hint. Recognizing the difference between MVCC version churn duplicates and true logical row duplicates at the index AM level can help with cleanup of garbage index tuples. Cleanup can intelligently target tuples that are likely to be garbage, without wasting too many cycles on less promising tuples/pages (index pages with little or no version churn). This is infrastructure for an upcoming commit that will teach nbtree to perform bottom-up index deletion. No index AM actually applies the hint just yet. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Victor Yegorov <vyegorov@gmail.com> Discussion: https://postgr.es/m/CAH2-Wz=CEKFa74EScx_hFVshCOn6AA5T-ajFASTdzipdkLTNQQ@mail.gmail.com
* Update copyright for 2021Bruce Momjian2021-01-02
| | | | Backpatch-through: 9.5
* Use HASH_BLOBS for xidhash.Noah Misch2020-12-12
| | | | | | | | This caused BufFile errors on buildfarm member sungazer, and SIGSEGV was possible. Conditions for reaching those symptoms were more frequent on big-endian systems. Discussion: https://postgr.es/m/20201129214441.GA691200@rfd.leadboat.com
* Correct behavior descriptions in comments, and correct a test name.Noah Misch2020-12-12
|
* Fix replication of in-progress transactions in tablesync worker.Amit Kapila2020-11-27
| | | | | | | | | | | | | | | | | Tablesync worker runs under a single transaction but in streaming mode, we were committing the transaction on stream_stop, stream_abort, and stream_commit. We need to avoid committing the transaction in a streaming mode in tablesync worker. In passing move the call to process_syncing_tables in apply_handle_stream_commit after clean up of stream files. This will allow clean up of files to happen before the exit of tablesync worker which would otherwise be handled by one of the proc exit routines. Author: Dilip Kumar Reviewed-by: Amit Kapila and Peter Smith Tested-by: Peter Smith Discussion: https://postgr.es/m/CAHut+Pt4PyKQCwqzQ=EFF=bpKKJD7XKt_S23F6L20ayQNxg77A@mail.gmail.com
* Use Enums for logical replication message types at more places.Amit Kapila2020-11-26
| | | | | | | | | Commit 644f0d7cc9 added logical replication message type enums to use instead of character literals but some char substitutions were overlooked. Author: Peter Smith Reviewed-by: Amit Kapila Discussion: https://postgr.es/m/CAHut+PsTG=Vrv8hgrvOnAvCNR21jhqMdPk2n0a1uJPoW0p+UfQ@mail.gmail.com
* Use Enum for top level logical replication message types.Amit Kapila2020-11-02
| | | | | | | | | | | | | | | | | | Logical replication protocol uses a single byte character to identify a message type in logical replication protocol. The code uses string literals for the same. Use Enum so that 1. All the string literals used can be found at a single place. This makes it easy to add more types without the risk of conflicts. 2. It's easy to locate the code handling a given message type. 3. When used with switch statements, it is easy to identify the missing cases using -Wswitch. Author: Ashutosh Bapat Reviewed-by: Kyotaro Horiguchi, Andres Freund, Peter Smith and Amit Kapila Discussion: https://postgr.es/m/CAExHW5uPzQ7L0oAd_ENyvaiYMOPgkrAoJpE+ZY5-obdcVT6NPg@mail.gmail.com
* Calculate extraUpdatedCols in query rewriter, not parser.Tom Lane2020-10-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's unsafe to do this at parse time because addition of generated columns to a table would not invalidate stored rules containing UPDATEs on the table ... but there might now be dependent generated columns that were not there when the rule was made. This also fixes an oversight that rewriteTargetView failed to update extraUpdatedCols when transforming an UPDATE on an updatable view. (Since the new calculation is downstream of that, rewriteTargetView doesn't actually need to do anything; but before, there was a demonstrable bug there.) In v13 and HEAD, this leads to easily-visible bugs because (since commit c6679e4fc) we won't recalculate generated columns that aren't listed in extraUpdatedCols. In v12 this bitmap is mostly just used for trigger-firing decisions, so you'd only notice a problem if a trigger cared whether a generated column had been updated. I'd complained about this back in May, but then forgot about it until bug #16671 from Michael Paul Killian revived the issue. Back-patch to v12 where this field was introduced. If existing stored rules contain any extraUpdatedCols values, they'll be ignored because the rewriter will overwrite them, so the bug will be fixed even for existing rules. (But note that if someone were to update to 13.1 or 12.5, store some rules with UPDATEs on tables having generated columns, and then downgrade to a prior minor version, they might observe issues similar to what this patch fixes. That seems unlikely enough to not be worth going to a lot of effort to fix.) Discussion: https://postgr.es/m/10206.1588964727@sss.pgh.pa.us Discussion: https://postgr.es/m/16671-2fa55851859fb166@postgresql.org
* Remove PartitionRoutingInfo struct.Heikki Linnakangas2020-10-19
| | | | | | | | | | The extra indirection neeeded to access its members via its enclosing ResultRelInfo seems pointless. Move all the fields from PartitionRoutingInfo to ResultRelInfo. Author: Amit Langote Reviewed-by: Alvaro Herrera Discussion: https://www.postgresql.org/message-id/CA%2BHiwqFViT47Zbr_ASBejiK7iDG8%3DQ1swQ-tjM6caRPQ67pT%3Dw%40mail.gmail.com
* Review logical replication tablesync codeAlvaro Herrera2020-10-15
| | | | | | | | | | | | | | | | | | | | | | | | Most importantly, remove optimization in LogicalRepSyncTableStart that skips the normal walrcv_startstreaming/endstreaming dance. The optimization is not critically important for production uses anyway, since it only fires in cases with no activity, and saves an uninteresting amount of work even then. Critically, it obscures bugs by hiding the interesting code path from test cases. Also: in GetSubscriptionRelState, remove pointless relation open; access pg_subscription_rel->srsubstate with GETSTRUCT as is typical rather than SysCacheGetAttr; remove unused 'missing_ok' argument. In wait_for_relation_state_change, use explicit catalog snapshot invalidation rather than obscurely (and expensively) through GetLatestSnapshot. In various places: sprinkle comments more liberally and rewrite a number of them. Other cosmetic code improvements. No backpatch, since no bug is being fixed here. Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Petr Jelínek <petr.jelinek@2ndquadrant.com> Discussion: https://postgr.es/m/20201010190637.GA5774@alvherre.pgsql
* Restore replication protocol's duplicate command tagsAlvaro Herrera2020-10-14
| | | | | | | | | | | | | | | | | | | | I removed the duplicate command tags for START_REPLICATION inadvertently in commit 07082b08cc5d, but the replication protocol requires them. The fact that the replication protocol was broken was not noticed because all our test cases use an optimized code path that exits early, failing to verify that the behavior is correct for non-optimized cases. Put them back. Also document this protocol quirk. Add a test case that shows the failure. It might still succeed even without the patch when run on a fast enough server, but it suffices to show the bug in enough cases that it would be noticed in buildfarm. Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Reported-by: Henry Hinze <henry.hinze@gmail.com> Reviewed-by: Petr Jelínek <petr.jelinek@2ndquadrant.com> Discussion: https://postgr.es/m/16643-eaadeb2a1a58d28c@postgresql.org
* Remove es_result_relation_info from EState.Heikki Linnakangas2020-10-14
| | | | | | | | | | | | | | Maintaining 'es_result_relation_info' correctly at all times has become cumbersome, especially with partitioning where each partition gets its own result relation info. Having to set and reset it across arbitrary operations has caused bugs in the past. This changes all the places that used 'es_result_relation_info', to receive the currently active ResultRelInfo via function parameters instead. Author: Amit Langote Discussion: https://www.postgresql.org/message-id/CA%2BHiwqGEmiib8FLiHMhKB%2BCH5dRgHSLc5N5wnvc4kym%2BZYpQEQ%40mail.gmail.com
* Create ResultRelInfos later in InitPlan, index them by RT index.Heikki Linnakangas2020-10-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of allocating all the ResultRelInfos upfront in one big array, allocate them in ExecInitModifyTable(). es_result_relations is now an array of ResultRelInfo pointers, rather than an array of structs, and it is indexed by the RT index. This simplifies things: we get rid of the separate concept of a "result rel index", and don't need to set it in setrefs.c anymore. This also allows follow-up optimizations (not included in this commit yet) to skip initializing ResultRelInfos for target relations that were not needed at runtime, and removal of the es_result_relation_info pointer. The EState arrays of regular result rels and root result rels are merged into one array. Similarly, the resultRelations and rootResultRelations lists in PlannedStmt are merged into one. It's not actually clear to me why they were kept separate in the first place, but now that the es_result_relations array is indexed by RT index, it certainly seems pointless. The PlannedStmt->resultRelations list is now only needed for ExecRelationIsTargetRelation(). One visible effect of this change is that ExecRelationIsTargetRelation() will now return 'true' also for the partition root, if a partitioned table is updated. That seems like a good thing, although the function isn't used in core code, and I don't see any reason for an FDW to call it on a partition root. Author: Amit Langote Discussion: https://www.postgresql.org/message-id/CA%2BHiwqGEmiib8FLiHMhKB%2BCH5dRgHSLc5N5wnvc4kym%2BZYpQEQ%40mail.gmail.com
* Fix the logical replication from HEAD to lower versions.Amit Kapila2020-09-26
| | | | | | | | | | | | Commit 464824323e changed the logical replication protocol to allow the streaming of in-progress transactions and used the new version of protocol irrespective of the server version. Use the appropriate version of the protocol based on the server version. Reported-by: Ashutosh Sharma Author: Dilip Kumar Reviewed-by: Ashutosh Sharma and Amit Kapila Discussion: https://postgr.es/m/CAE9k0P=9OpXcNrcU5Gsvd5MZ8GFpiN833vNHzX6Uc=8+h1ft1Q@mail.gmail.com
* Don't fetch partition check expression during InitResultRelInfo.Tom Lane2020-09-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since there is only one place that actually needs the partition check expression, namely ExecPartitionCheck, it's better to fetch it from the relcache there. In this way we will never fetch it at all if the query never has use for it, and we still fetch it just once when we do need it. The reason for taking an interest in this is that if the relcache doesn't already have the check expression cached, fetching it requires obtaining AccessShareLock on the partition root. That means that operations that look like they should only touch the partition itself will also take a lock on the root. In particular we observed that TRUNCATE on a partition may take a lock on the partition's root, contributing to a deadlock situation in parallel pg_restore. As written, this patch does have a small cost, which is that we are microscopically reducing efficiency for the case where a partition has an empty check expression. ExecPartitionCheck will be called, and will go through the motions of setting up and checking an empty qual, where before it would not have been called at all. We could avoid that by adding a separate boolean flag to track whether there is a partition expression to test. However, this case only arises for a default partition with no siblings, which surely is not an interesting case in practice. Hence adding complexity for it does not seem like a good trade-off. Amit Langote, per a suggestion by me Discussion: https://postgr.es/m/VI1PR03MB31670CA1BD9625C3A8C5DD05EB230@VI1PR03MB3167.eurprd03.prod.outlook.com
* Fix over-eager ping'ing in logical replication receiver.Tom Lane2020-09-04
| | | | | | | | | | | Commit 3f60f690f only partially fixed the broken-status-tracking issue in LogicalRepApplyLoop: we need ping_sent to have the same lifetime as last_recv_timestamp. The effects are much less serious than what that commit fixed, though. AFAICS this would just lead to extra ping requests being sent, once per second until the sender responds. Still, it's a bug, so backpatch to v10 as before. Discussion: https://postgr.es/m/959627.1599248476@sss.pgh.pa.us
* Fix inline marking introduced in commit 464824323e.Amit Kapila2020-09-04
| | | | | | | | Forgot to add inline marking in changes_filename() declaration. In the passing, add inline marking for a similar function subxact_filename(). Reported-By: Nathan Bossart Discussion: https://postgr.es/m/E98FBE8F-B878-480D-A728-A60C6EED3047@amazon.com
* Add support for streaming to built-in logical replication.Amit Kapila2020-09-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | To add support for streaming of in-progress transactions into the built-in logical replication, we need to do three things: * Extend the logical replication protocol, so identify in-progress transactions, and allow adding additional bits of information (e.g. XID of subtransactions). * Modify the output plugin (pgoutput) to implement the new stream API callbacks, by leveraging the extended replication protocol. * Modify the replication apply worker, to properly handle streamed in-progress transaction by spilling the data to disk and then replaying them on commit. We however must explicitly disable streaming replication during replication slot creation, even if the plugin supports it. We don't need to replicate the changes accumulated during this phase, and moreover we don't have a replication connection open so we don't have where to send the data anyway. Author: Tomas Vondra, Dilip Kumar and Amit Kapila Reviewed-by: Amit Kapila, Kuntal Ghosh and Ajin Cherian Tested-by: Neha Sharma, Mahendra Singh Thalor and Ajin Cherian Discussion: https://postgr.es/m/688b0b7f-2f6c-d827-c27b-216a8e3ea700@2ndquadrant.com
* Empty search_path in logical replication apply worker and walsender.Noah Misch2020-08-10
| | | | | | | | | | | | | | This is like CVE-2018-1058 commit 582edc369cdbd348d68441fc50fa26a84afd0c1a. Today, a malicious user of a publisher or subscriber database can invoke arbitrary SQL functions under an identity running replication, often a superuser. This fix may cause "does not exist" or "no schema has been selected to create in" errors in a replication process. After upgrading, consider watching server logs for these errors. Objects accruing schema qualification in the wake of the earlier commit are unlikely to need further correction. Back-patch to v10, which introduced logical replication. Security: CVE-2020-14349
* Fix construction of updated-columns bitmap in logical replication.Tom Lane2020-07-20
| | | | | | | | | | | | | | | | | | | Commit b9c130a1f failed to apply the publisher-to-subscriber column mapping while checking which columns were updated. Perhaps less significantly, it didn't exclude dropped columns either. This could result in an incorrect updated-columns bitmap and thus wrong decisions about whether to fire column-specific triggers on the subscriber while applying updates. In HEAD (since commit 9de77b545), it could also result in accesses off the end of the colstatus array, as detected by buildfarm member skink. Fix the logic, and adjust 003_constraints.pl so that the problem is exposed in unpatched code. In HEAD, also add some assertions to check that we don't access off the ends of these newly variable-sized arrays. Back-patch to v10, as b9c130a1f was. Discussion: https://postgr.es/m/CAH2-Wz=79hKQ4++c5A060RYbjTHgiYTHz=fw6mptCtgghH2gJA@mail.gmail.com
* Allow logical replication to transfer data in binary format.Tom Lane2020-07-18
| | | | | | | | | | | | | | | | | This patch adds a "binary" option to CREATE/ALTER SUBSCRIPTION. When that's set, the publisher will send data using the data type's typsend function if any, rather than typoutput. This is generally faster, if slightly less robust. As committed, we won't try to transfer user-defined array or composite types in binary, for fear that type OIDs won't match at the subscriber. This might be changed later, but it seems like fit material for a follow-on patch. Dave Cramer, reviewed by Daniel Gustafsson, Petr Jelinek, and others; adjusted some by me Discussion: https://postgr.es/m/CADK3HH+R3xMn=8t3Ct+uD+qJ1KD=Hbif5NFMJ+d5DkoCzp6Vgw@mail.gmail.com
* code: replace 'master' with 'primary' where appropriate.Andres Freund2020-07-08
| | | | | | | | | Also changed "in the primary" to "on the primary", and added a few "the" before "primary". Author: Andres Freund Reviewed-By: David Steele Discussion: https://postgr.es/m/20200615182235.x7lch5n6kcjq4aue@alap3.anarazel.de
* Add logical replication support to replicate into partitioned tablesPeter Eisentraut2020-04-06
| | | | | | | | | | | | Mainly, this adds support code in logical/worker.c for applying replicated operations whose target is a partitioned table to its relevant partitions. Author: Amit Langote <amitlangote09@gmail.com> Reviewed-by: Rafia Sabih <rafia.pghackers@gmail.com> Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com> Reviewed-by: Petr Jelinek <petr@2ndquadrant.com> Discussion: https://www.postgresql.org/message-id/flat/CA+HiwqH=Y85vRK3mOdjEkqFK+E=ST=eQiHdpj43L=_eJMOOznQ@mail.gmail.com
* Refactor code to look up local replication tuplePeter Eisentraut2020-04-01
| | | | | | | This unifies some duplicate code. Author: Amit Langote <amitlangote09@gmail.com> Discussion: https://www.postgresql.org/message-id/CA+HiwqFjYE5anArxvkjr37AQMd52L-LZtz9Ld2QrLQ3YfcYhTw@mail.gmail.com
* Fix compiler warningPeter Eisentraut2020-03-24
| | | | | | | A variable was unused in non-assert builds. Simplify the code to avoid the issue. Reported-by: Erik Rijkers <er@xs4all.nl>
* Some refactoring of logical/worker.cPeter Eisentraut2020-03-24
| | | | | | | | | | | | | | | This moves the main operations of apply_handle_{insert|update|delete}, that of inserting, updating, deleting a tuple into/from a given relation, into corresponding apply_handle_{insert|update|delete}_internal functions. This allows performing those operations on relations that are not directly the targets of replication, which is something a later patch will use for targeting partitioned tables. Author: Amit Langote <amitlangote09@gmail.com> Reviewed-by: Rafia Sabih <rafia.pghackers@gmail.com> Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com> Discussion: https://www.postgresql.org/message-id/flat/CA+HiwqH=Y85vRK3mOdjEkqFK+E=ST=eQiHdpj43L=_eJMOOznQ@mail.gmail.com
* Fill in extraUpdatedCols in logical replicationPeter Eisentraut2020-02-17
| | | | | | | | | | | | | | The extraUpdatedCols field of the target RTE records which generated columns are affected by an update. This is used in a variety of places, including per-column triggers and foreign data wrappers. When an update was initiated by a logical replication subscription, this field was not filled in, so such an update would not affect generated columns in a way that is consistent with normal updates. To fix, factor out some code from analyze.c to fill in extraUpdatedCols in the logical replication worker as well. Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/b05e781a-fa16-6b52-6738-761181204567@2ndquadrant.com
* Have logical replication subscriber fire column triggersPeter Eisentraut2020-01-06
| | | | | | | | | The logical replication apply worker did not fire per-column update triggers because the updatedCols bitmap in the RTE was not populated. This fixes that. Reviewed-by: Euler Taveira <euler@timbira.com.br> Discussion: https://www.postgresql.org/message-id/flat/21673e2d-597c-6afe-637e-e8b10425b240%402ndquadrant.com
* Update copyrights for 2020Bruce Momjian2020-01-01
| | | | Backpatch-through: update all files in master, backpatch legal files through 9.4
* Refactor attribute mappings used in logical tuple conversionMichael Paquier2019-12-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | Tuple conversion support in tupconvert.c is able to convert rowtypes between two relations, inner and outer, which are logically equivalent but have a different ordering or even dropped columns (used mainly for inheritance tree and partitions). This makes use of attribute mappings, which are simple arrays made of AttrNumber elements with a length matching the number of attributes of the outer relation. The length of the attribute mapping has been treated as completely independent of the mapping itself until now, making it easy to pass down an incorrect mapping length. This commit refactors the code related to attribute mappings and moves it into an independent facility called attmap.c, extracted from tupconvert.c. This merges the attribute mapping with its length, avoiding to try to guess what is the length of a mapping to use as this is computed once, when the map is built. This will avoid mistakes like what has been fixed in dc816e58, which has used an incorrect mapping length by matching it with the number of attributes of an inner relation (a child partition) instead of an outer relation (a partitioned table). Author: Michael Paquier Reviewed-by: Amit Langote Discussion: https://postgr.es/m/20191121042556.GD153437@paquier.xyz
* Partially deduplicate interrupt handling for background processes.Robert Haas2019-12-17
| | | | | | | | | | | | | | | | Where possible, share signal handler code and main loop interrupt checking. This saves quite a bit of code and should simplify maintenance, too. This commit intends not to change the way anything works, even though that might allow more code to be unified. It does unify a bunch of individual variables into a ShutdownRequestPending flag that has is now used by a bunch of different process types, though. Patch by me, reviewed by Andres Freund and Daniel Gustafsson. Discussion: http://postgr.es/m/CA+TgmoZwDk=BguVDVa+qdA6SBKef=PKbaKDQALTC_9qoz1mJqg@mail.gmail.com