aboutsummaryrefslogtreecommitdiff
path: root/src/backend/replication/logical/launcher.c
Commit message (Collapse)AuthorAge
* Fix off-by-one loop termination condition in pg_stat_get_subscription().Tom Lane2022-06-07
| | | | | | | | | | | | | | | | pg_stat_get_subscription scanned one more LogicalRepWorker array entry than is really allocated. In the worst case this could lead to SIGSEGV, if the LogicalRepCtx data structure is near the end of shared memory. That seems quite unlikely though (thanks to the ordering of calls in CreateSharedMemoryAndSemaphores) and we've heard no field reports of it. A more likely misbehavior is one row of garbage data in the function's result, but even that is not real likely because of the check that the pid field matches some live backend. Report and fix by Kuntal Ghosh. This bug is old, so back-patch to all supported branches. Discussion: https://postgr.es/m/CAGz5QCJykEDzW6jQK6Yz7Qh_PMtD=95de_7QoocbVR2Qy8hWZA@mail.gmail.com
* Fix the check to limit sync workers.Amit Kapila2022-04-19
| | | | | | | | | | | | | | | | | We don't allow to invoke more sync workers once we have reached the sync worker limit per subscription. But the check to enforce this also doesn't allow to launch an apply worker if it gets restarted. This code was introduced by commit de43897122 but we caught the problem only with the test added by recent commit c91f71b9dc which started failing occasionally in the buildfarm. As per buildfarm. Diagnosed-by: Amit Kapila, Masahiko Sawada, Tomas Vondra Author: Amit Kapila Backpatch-through: 10 Discussion: https://postgr.es/m/CAH2L28vddB_NFdRVpuyRBJEBWjz4BSyTB=_ektNRH8NJ1jf95g@mail.gmail.com https://postgr.es/m/f90d2b03-4462-ce95-a524-d91464e797c8@enterprisedb.com
* Rename the logical replication global "wrconn"Alvaro Herrera2021-05-12
| | | | | | | | | | | | | | The worker.c global wrconn is only meant to be used by logical apply/ tablesync workers, but there are other variables with the same name. To reduce future confusion rename the global from "wrconn" to "LogRepWorkerWalRcvConn". While this is just cosmetic, it seems better to backpatch it all the way back to 10 where this code appeared, to avoid future backpatching issues. Author: Peter Smith <smithpb2250@gmail.com> Discussion: https://postgr.es/m/CAHut+Pu7Jv9L2BOEx_Z0UtJxfDevQSAUW2mJqWU+CtmDrEZVAg@mail.gmail.com
* Fix signal handler setup for SIGHUP in the apply launcher process.Amit Kapila2020-07-17
| | | | | | | | | | | | Commit 1e53fe0e70 has unified the usage of the config-file reload flag by using the same signal handler function for the SIGHUP signal at many places in the code. By mistake, it used the wrong SIGNAL in apply launcher process for the SIGHUP signal handler function. Author: Bharath Rupireddy Reviewed-by: Dilip Kumar Backpatch-through: 13, where it was introduced Discussion: https://postgr.es/m/CALj2ACVzHCRnS20bOiEHaLtP5PVBENZQn4khdsSJQgOv_GM-LA@mail.gmail.com
* Update copyrights for 2020Bruce Momjian2020-01-01
| | | | Backpatch-through: update all files in master, backpatch legal files through 9.4
* Avoid splitting C string literals with \-newlineAlvaro Herrera2019-12-24
| | | | | | | | | | | Using \ is unnecessary and ugly, so remove that. While at it, stitch the literals back into a single line: we've long discouraged splitting error message literals even when they go past the 80 chars line limit, to improve greppability. Leave contrib/tablefunc alone. Discussion: https://postgr.es/m/20191223195156.GA12271@alvherre.pgsql
* Partially deduplicate interrupt handling for background processes.Robert Haas2019-12-17
| | | | | | | | | | | | | | | | Where possible, share signal handler code and main loop interrupt checking. This saves quite a bit of code and should simplify maintenance, too. This commit intends not to change the way anything works, even though that might allow more code to be unified. It does unify a bunch of individual variables into a ShutdownRequestPending flag that has is now used by a bunch of different process types, though. Patch by me, reviewed by Andres Freund and Daniel Gustafsson. Discussion: http://postgr.es/m/CA+TgmoZwDk=BguVDVa+qdA6SBKef=PKbaKDQALTC_9qoz1mJqg@mail.gmail.com
* Use PostgresSigHupHandler in more places.Robert Haas2019-12-17
| | | | | | | | | | | There seems to be no reason for every background process to have its own flag indicating that a config-file reload is needed. Instead, let's just use ConfigFilePending for that purpose everywhere. Patch by me, reviewed by Andres Freund and Daniel Gustafsson. Discussion: http://postgr.es/m/CA+TgmoZwDk=BguVDVa+qdA6SBKef=PKbaKDQALTC_9qoz1mJqg@mail.gmail.com
* Remove useless "return;" linesAlvaro Herrera2019-11-28
| | | | Discussion: https://postgr.es/m/20191128144653.GA27883@alvherre.pgsql
* Make the order of the header file includes consistent in backend modules.Amit Kapila2019-11-12
| | | | | | | | | | | Similar to commits 7e735035f2 and dddf4cdc33, this commit makes the order of header file inclusion consistent for backend modules. In the passing, removed a couple of duplicate inclusions. Author: Vignesh C Reviewed-by: Kuntal Ghosh and Amit Kapila Discussion: https://postgr.es/m/CALDaNm2Sznv8RR6Ex-iJO6xAdsxgWhCoETkaYX=+9DW3q0QCfA@mail.gmail.com
* tableam: Add and use scan APIs.Andres Freund2019-03-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Too allow table accesses to be not directly dependent on heap, several new abstractions are needed. Specifically: 1) Heap scans need to be generalized into table scans. Do this by introducing TableScanDesc, which will be the "base class" for individual AMs. This contains the AM independent fields from HeapScanDesc. The previous heap_{beginscan,rescan,endscan} et al. have been replaced with a table_ version. There's no direct replacement for heap_getnext(), as that returned a HeapTuple, which is undesirable for a other AMs. Instead there's table_scan_getnextslot(). But note that heap_getnext() lives on, it's still used widely to access catalog tables. This is achieved by new scan_begin, scan_end, scan_rescan, scan_getnextslot callbacks. 2) The portion of parallel scans that's shared between backends need to be able to do so without the user doing per-AM work. To achieve that new parallelscan_{estimate, initialize, reinitialize} callbacks are introduced, which operate on a new ParallelTableScanDesc, which again can be subclassed by AMs. As it is likely that several AMs are going to be block oriented, block oriented callbacks that can be shared between such AMs are provided and used by heap. table_block_parallelscan_{estimate, intiialize, reinitialize} as callbacks, and table_block_parallelscan_{nextpage, init} for use in AMs. These operate on a ParallelBlockTableScanDesc. 3) Index scans need to be able to access tables to return a tuple, and there needs to be state across individual accesses to the heap to store state like buffers. That's now handled by introducing a sort-of-scan IndexFetchTable, which again is intended to be subclassed by individual AMs (for heap IndexFetchHeap). The relevant callbacks for an AM are index_fetch_{end, begin, reset} to create the necessary state, and index_fetch_tuple to retrieve an indexed tuple. Note that index_fetch_tuple implementations need to be smarter than just blindly fetching the tuples for AMs that have optimizations similar to heap's HOT - the currently alive tuple in the update chain needs to be fetched if appropriate. Similar to table_scan_getnextslot(), it's undesirable to continue to return HeapTuples. Thus index_fetch_heap (might want to rename that later) now accepts a slot as an argument. Core code doesn't have a lot of call sites performing index scans without going through the systable_* API (in contrast to loads of heap_getnext calls and working directly with HeapTuples). Index scans now store the result of a search in IndexScanDesc->xs_heaptid, rather than xs_ctup->t_self. As the target is not generally a HeapTuple anymore that seems cleaner. To be able to sensible adapt code to use the above, two further callbacks have been introduced: a) slot_callbacks returns a TupleTableSlotOps* suitable for creating slots capable of holding a tuple of the AMs type. table_slot_callbacks() and table_slot_create() are based upon that, but have additional logic to deal with views, foreign tables, etc. While this change could have been done separately, nearly all the call sites that needed to be adapted for the rest of this commit also would have been needed to be adapted for table_slot_callbacks(), making separation not worthwhile. b) tuple_satisfies_snapshot checks whether the tuple in a slot is currently visible according to a snapshot. That's required as a few places now don't have a buffer + HeapTuple around, but a slot (which in heap's case internally has that information). Additionally a few infrastructure changes were needed: I) SysScanDesc, as used by systable_{beginscan, getnext} et al. now internally uses a slot to keep track of tuples. While systable_getnext() still returns HeapTuples, and will so for the foreseeable future, the index API (see 1) above) now only deals with slots. The remainder, and largest part, of this commit is then adjusting all scans in postgres to use the new APIs. Author: Andres Freund, Haribabu Kommi, Alvaro Herrera Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
* Replace uses of heap_open et al with the corresponding table_* function.Andres Freund2019-01-21
| | | | | Author: Andres Freund Discussion: https://postgr.es/m/20190111000539.xbv7s6w7ilcvm7dp@alap3.anarazel.de
* Update copyright for 2019Bruce Momjian2019-01-02
| | | | Backpatch-through: certain files through 9.4
* Add WL_EXIT_ON_PM_DEATH pseudo-event.Thomas Munro2018-11-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Users of the WaitEventSet and WaitLatch() APIs can now choose between asking for WL_POSTMASTER_DEATH and then handling it explicitly, or asking for WL_EXIT_ON_PM_DEATH to trigger immediate exit on postmaster death. This reduces code duplication, since almost all callers want the latter. Repair all code that was previously ignoring postmaster death completely, or requesting the event but ignoring it, or requesting the event but then doing an unconditional PostmasterIsAlive() call every time through its event loop (which is an expensive syscall on platforms for which we don't have USE_POSTMASTER_DEATH_SIGNAL support). Assert that callers of WaitLatchXXX() under the postmaster remember to ask for either WL_POSTMASTER_DEATH or WL_EXIT_ON_PM_DEATH, to prevent future bugs. The only process that doesn't handle postmaster death is syslogger. It waits until all backends holding the write end of the syslog pipe (including the postmaster) have closed it by exiting, to be sure to capture any parting messages. By using the WaitEventSet API directly it avoids the new assertion, and as a by-product it may be slightly more efficient on platforms that have epoll(). Author: Thomas Munro Reviewed-by: Kyotaro Horiguchi, Heikki Linnakangas, Tom Lane Discussion: https://postgr.es/m/CAEepm%3D1TCviRykkUb69ppWLr_V697rzd1j3eZsRMmbXvETfqbQ%40mail.gmail.com, https://postgr.es/m/CAEepm=2LqHzizbe7muD7-2yHUbTOoF7Q+qkSD5Q41kuhttRTwA@mail.gmail.com
* Remove WITH OIDS support, change oid catalog column visibility.Andres Freund2018-11-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously tables declared WITH OIDS, including a significant fraction of the catalog tables, stored the oid column not as a normal column, but as part of the tuple header. This special column was not shown by default, which was somewhat odd, as it's often (consider e.g. pg_class.oid) one of the more important parts of a row. Neither pg_dump nor COPY included the contents of the oid column by default. The fact that the oid column was not an ordinary column necessitated a significant amount of special case code to support oid columns. That already was painful for the existing, but upcoming work aiming to make table storage pluggable, would have required expanding and duplicating that "specialness" significantly. WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0). Remove it. Removing includes: - CREATE TABLE and ALTER TABLE syntax for declaring the table to be WITH OIDS has been removed (WITH (oids[ = true]) will error out) - pg_dump does not support dumping tables declared WITH OIDS and will issue a warning when dumping one (and ignore the oid column). - restoring an pg_dump archive with pg_restore will warn when restoring a table with oid contents (and ignore the oid column) - COPY will refuse to load binary dump that includes oids. - pg_upgrade will error out when encountering tables declared WITH OIDS, they have to be altered to remove the oid column first. - Functionality to access the oid of the last inserted row (like plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed. The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false) for CREATE TABLE) is still supported. While that requires a bit of support code, it seems unnecessary to break applications / dumps that do not use oids, and are explicit about not using them. The biggest user of WITH OID columns was postgres' catalog. This commit changes all 'magic' oid columns to be columns that are normally declared and stored. To reduce unnecessary query breakage all the newly added columns are still named 'oid', even if a table's column naming scheme would indicate 'reloid' or such. This obviously requires adapting a lot code, mostly replacing oid access via HeapTupleGetOid() with access to the underlying Form_pg_*->oid column. The bootstrap process now assigns oids for all oid columns in genbki.pl that do not have an explicit value (starting at the largest oid previously used), only oids assigned later by oids will be above FirstBootstrapObjectId. As the oid column now is a normal column the special bootstrap syntax for oids has been removed. Oids are not automatically assigned during insertion anymore, all backend code explicitly assigns oids with GetNewOidWithIndex(). For the rare case that insertions into the catalog via SQL are called for the new pg_nextoid() function can be used (which only works on catalog tables). The fact that oid columns on system tables are now normal columns means that they will be included in the set of columns expanded by * (i.e. SELECT * FROM pg_class will now include the table's oid, previously it did not). It'd not technically be hard to hide oid column by default, but that'd mean confusing behavior would either have to be carried forward forever, or it'd cause breakage down the line. While it's not unlikely that further adjustments are needed, the scope/invasiveness of the patch makes it worthwhile to get merge this now. It's painful to maintain externally, too complicated to commit after the code code freeze, and a dependency of a number of other patches. Catversion bump, for obvious reasons. Author: Andres Freund, with contributions by John Naylor Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
* Add subtransaction handling for table synchronization workers.Robert Haas2018-07-16
| | | | | | | | | | | Since the old logic was completely unaware of subtransactions, a change made in a subsequently-aborted subtransaction would still cause workers to be stopped at toplevel transaction commit. Fix that by managing a stack of worker lists rather than just one. Amit Khandekar and Robert Haas Discussion: http://postgr.es/m/CAJ3gD9eaG_mWqiOTA2LfAug-VRNn1hrhf50Xi1YroxL37QkZNg@mail.gmail.com
* Allow background workers to bypass datallowconnMagnus Hagander2018-04-05
| | | | | | | THis adds a "flags" field to the BackgroundWorkerInitializeConnection() and BackgroundWorkerInitializeConnectionByOid(). For now only one flag, BGWORKER_BYPASS_ALLOWCONN, is defined, which allows the worker to ignore datallowconn.
* Update copyright for 2018Bruce Momjian2018-01-02
| | | | Backpatch-through: certain files through 9.3
* Rethink MemoryContext creation to improve performance.Tom Lane2017-12-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch makes a number of interrelated changes to reduce the overhead involved in creating/deleting memory contexts. The key ideas are: * Include the AllocSetContext header of an aset.c context in its first malloc request, rather than allocating it separately in TopMemoryContext. This means that we now always create an initial or "keeper" block in an aset, even if it never receives any allocation requests. * Create freelists in which we can save and recycle recently-destroyed asets (this idea is due to Robert Haas). * In the common case where the name of a context is a constant string, just store a pointer to it in the context header, rather than copying the string. The first change eliminates a palloc/pfree cycle per context, and also avoids bloat in TopMemoryContext, at the price that creating a context now involves a malloc/free cycle even if the context never receives any allocations. That would be a loser for some common usage patterns, but recycling short-lived contexts via the freelist eliminates that pain. Avoiding copying constant strings not only saves strlen() and strcpy() overhead, but is an essential part of the freelist optimization because it makes the context header size constant. Currently we make no attempt to use the freelist for contexts with non-constant names. (Perhaps someday we'll need to think harder about that, but in current usage, most contexts with custom names are long-lived anyway.) The freelist management in this initial commit is pretty simplistic, and we might want to refine it later --- but in common workloads that will never matter because the freelists will never get full anyway. To create a context with a non-constant name, one is now required to call AllocSetContextCreateExtended and specify the MEMCONTEXT_COPY_NAME option. AllocSetContextCreate becomes a wrapper macro, and it includes a test that will complain about non-string-literal context name parameters on gcc and similar compilers. An unfortunate side effect of making AllocSetContextCreate a macro is that one is now *required* to use the size parameter abstraction macros (ALLOCSET_DEFAULT_SIZES and friends) with it; the pre-9.6 habit of writing out individual size parameters no longer works unless you switch to AllocSetContextCreateExtended. Internally to the memory-context-related modules, the context creation APIs are simplified, removing the rather baroque original design whereby a context-type module called mcxt.c which then called back into the context-type module. That saved a bit of code duplication, but not much, and it prevented context-type modules from exercising control over the allocation of context headers. In passing, I converted the test-and-elog validation of aset size parameters into Asserts to save a few more cycles. The original thought was that callers might compute size parameters on the fly, but in practice nobody does that, so it's useless to expend cycles on checking those numbers in production builds. Also, mark the memory context method-pointer structs "const", just for cleanliness. Discussion: https://postgr.es/m/2264.1512870796@sss.pgh.pa.us
* Add background worker typePeter Eisentraut2017-09-29
| | | | | | | | | | | | | | | | | Add bgw_type field to background worker structure. It is intended to be set to the same value for all workers of the same type, so they can be grouped in pg_stat_activity, for example. The backend_type column in pg_stat_activity now shows bgw_type for a background worker. The ps listing also no longer calls out that a process is a background worker but just show the bgw_type. That way, being a background worker is more of an implementation detail now that is not shown to the user. However, most log messages still refer to 'background worker "%s"'; otherwise constructing sensible and translatable log messages would become tricky. Reviewed-by: Michael Paquier <michael.paquier@gmail.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
* Fix, or at least ameliorate, bugs in logicalrep_worker_launch().Tom Lane2017-09-18
| | | | | | | | | | | | | | | | | | | | | | If we failed to get a background worker slot, the code just walked away from the logicalrep-worker slot it already had, leaving that looking like the worker is still starting up. This led to an indefinite hang in subscription startup, as reported by Thomas Munro. We must release the slot on failure. Also fix a thinko: we must capture the worker slot's generation before releasing LogicalRepWorkerLock the first time, else testing to see if it's changed is pretty meaningless. BTW, the CHECK_FOR_INTERRUPTS() in WaitForReplicationWorkerAttach is a ticking time bomb, even without considering the possibility of elog(ERROR) in one of the other functions it calls. Really, this entire business needs a redesign with some actual thought about error recovery. But for now I'm just band-aiding the case observed in testing. Back-patch to v10 where this code was added. Discussion: https://postgr.es/m/CAEepm=2bP3TBMFBArP6o20AZaRduWjMnjCjt22hSdnA-EvrtCw@mail.gmail.com
* Simplify some code in logical replication launcherPeter Eisentraut2017-08-15
| | | | | | Avoid unnecessary locking calls when a subscription is disabled. Author: Yugo Nagata <nagata@sraoss.co.jp>
* Final pgindent + perltidy run for v10.Tom Lane2017-08-14
|
* Only kill sync workers at commit time in subscription DDLPeter Eisentraut2017-08-04
| | | | | | This allows a transaction abort to avoid killing those workers. Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
* Shorten timeouts while waiting for logicalrep worker slot attach/detach.Tom Lane2017-07-01
| | | | | | | | | | | | | | | | | | | | | | When waiting for a logical replication worker process to start or stop, we have to busy-wait until we see it add or remove itself from the LogicalRepWorker slot in shared memory. Those loops were using a one-second delay between checks, but on any reasonably modern machine, it doesn't take more than a couple of msec for a worker to spawn or shut down. Reduce the loop delays to 10ms to avoid wasting quite so much time in the related regression tests. In principle, a better solution would be to fix things so that the waiting process can be awakened via its latch at the right time. But that seems considerably more invasive, which is undesirable for a post-beta fix. Worker start/stop performance likely isn't of huge interest anyway for production purposes, so we might not ever get around to it. In passing, rearrange the second wait loop in logicalrep_worker_stop() so that the lock is held at the top of the loop, thus saving one lock acquisition/release per call, and making it look more like the other loop. Discussion: https://postgr.es/m/30864.1498861103@sss.pgh.pa.us
* Fix race conditions and missed wakeups in syncrep worker signaling.Tom Lane2017-06-30
| | | | | | | | | | | | | | | | | | | When a sync worker is waiting for the associated apply worker to notice that it's in SYNCWAIT state, wait_for_worker_state_change() would just patiently wait for that to happen. This generally required waiting for the 1-second timeout in LogicalRepApplyLoop to elapse. Kicking the worker via its latch makes things significantly snappier. While at it, fix race conditions that could potentially result in crashes: we can *not* call logicalrep_worker_wakeup_ptr() once we've released the LogicalRepWorkerLock, because worker->proc might've been reset to NULL after we do that (indeed, there's no really solid reason to believe that the LogicalRepWorker slot even belongs to the same worker anymore). In logicalrep_worker_wakeup(), we can just move the wakeup inside the lock scope. In process_syncing_tables_for_apply(), a bit more code rearrangement is needed. Also improve some nearby comments.
* Phase 3 of pgindent updates.Tom Lane2017-06-21
| | | | | | | | | | | | | | | | | | | | | | | | | Don't move parenthesized lines to the left, even if that means they flow past the right margin. By default, BSD indent lines up statement continuation lines that are within parentheses so that they start just to the right of the preceding left parenthesis. However, traditionally, if that resulted in the continuation line extending to the right of the desired right margin, then indent would push it left just far enough to not overrun the margin, if it could do so without making the continuation line start to the left of the current statement indent. That makes for a weird mix of indentations unless one has been completely rigid about never violating the 80-column limit. This behavior has been pretty universally panned by Postgres developers. Hence, disable it with indent's new -lpl switch, so that parenthesized lines are always lined up with the preceding left paren. This patch is much less interesting than the first round of indent changes, but also bulkier, so I thought it best to separate the effects. Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
* Use standard interrupt handling in logical replication launcher.Andres Freund2017-06-08
| | | | | | | | | | | | | | | Previously the exit handling was only able to exit from within the main loop, and not from within the backend code it calls. Fix that by using the standard die() SIGTERM handler, and adding the necessary CHECK_FOR_INTERRUPTS() call. This requires adding yet another process-type-specific branch to ProcessInterrupts(), which hints that we probably should generalize that handling. But that's work for another day. Author: Petr Jelinek Reviewed-By: Andres Freund Discussion: https://postgr.es/m/fe072153-babd-3b5d-8052-73527a6eb657@2ndquadrant.com
* Clean up latch related code.Andres Freund2017-06-06
| | | | | | | | | | | | | | | | | | | | | | The larger part of this patch replaces usages of MyProc->procLatch with MyLatch. The latter works even early during backend startup, where MyProc->procLatch doesn't yet. While the affected code shouldn't run in cases where it's not initialized, it might get copied into places where it might. Using MyLatch is simpler and a bit faster to boot, so there's little point to stick with the previous coding. While doing so I noticed some weaknesses around newly introduced uses of latches that could lead to missed events, and an omitted CHECK_FOR_INTERRUPTS() call in worker_spi. As all the actual bugs are in v10 code, there doesn't seem to be sufficient reason to backpatch this. Author: Andres Freund Discussion: https://postgr.es/m/20170606195321.sjmenrfgl2nu6j63@alap3.anarazel.de https://postgr.es/m/20170606210405.sim3yl6vpudhmufo@alap3.anarazel.de Backpatch: -
* Don't set application_name in logical replication workersPeter Eisentraut2017-06-05
| | | | | | This was bothering some people because it's not the intended use of application_name and it makes the default view of pg_stat_activity bulky.
* Fix signal handling in logical replication workersPeter Eisentraut2017-06-02
| | | | | | | | | | The logical replication worker processes now use the normal die() handler for SIGTERM and CHECK_FOR_INTERRUPTS() instead of custom code. One problem before was that the apply worker would not exit promptly when a subscription was dropped, which could lead to deadlocks. Author: Petr Jelinek <petr.jelinek@2ndquadrant.com> Reported-by: Masahiko Sawada <sawada.mshk@gmail.com>
* Reorganize logical replication worker disconnect codePeter Eisentraut2017-06-01
| | | | | | | | | Move the walrcv_disconnect() calls into the before_shmem_exit handler. This makes sure the call is always made even during exit by signal, it saves some duplicate code, and it makes the logic more similar to walreceiver.c. Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
* Improve logical replication worker log messagesPeter Eisentraut2017-05-24
| | | | | | | | Reduce some redundant messages to DEBUG1. Be clearer about the distinction between apply workers and table synchronization workers. Add subscription and table name where possible. Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
* Post-PG 10 beta1 pgindent runBruce Momjian2017-05-17
| | | | perltidy run not included.
* Fix logical replication launcher wake up and resetPeter Eisentraut2017-05-01
| | | | | | | | | | | | | After the logical replication launcher was told to wake up at commit (for example, by a CREATE SUBSCRIPTION command), the flag to wake up was not reset, so it would be woken up at every following commit as well. So fix that by resetting the flag. Also, we don't need to wake up anything if the transaction was rolled back. Just reset the flag in that case. Author: Masahiko Sawada <sawada.mshk@gmail.com> Reported-by: Fujii Masao <masao.fujii@gmail.com>
* Fix possible null pointer dereference or invalid warning message.Tom Lane2017-04-30
| | | | | | | | | | | | Thinko in commit de4389712: this warning message references the wrong "LogicalRepWorker *" variable. This would often result in a core dump, but if it didn't, the message would show the wrong subscription OID. In passing, adjust the message text to format a subscription OID similarly to how that's done elsewhere in the function; and fix grammatical issues in some nearby messages. Per Coverity testing.
* Fix bug so logical rep launcher saves correctly time of last startup of worker.Fujii Masao2017-04-28
| | | | | | | | | | | | | | | | | | | | | | Previously the logical replication launcher stored the last timestamp when it started the worker, in the local variable "last_start_time", in order to check whether wal_retrive_retry_interval elapsed since the last startup of worker. If it has elapsed, the launcher sees pg_subscription and starts new worker if necessary. This is for limitting the startup of worker to once a wal_retrieve_retry_interval. The bug was that the variable "last_start_time" was defined and always initialized with 0 at the beginning of the launcher's main loop. So even if it's set to the last timestamp in later phase of the loop, it's always reset to 0. Therefore the launcher could not check correctly whether wal_retrieve_retry_interval elapsed since the last startup. This patch moves the variable "last_start_time" outside the main loop so that it will not be reset. Reviewed-by: Petr Jelinek Discussion: http://postgr.es/m/CAHGQGwGJrPO++XM4mFENAwpy1eGXKsGdguYv43GUgLgU-x8nTQ@mail.gmail.com
* Silence compiler warning induced by commit de4389712.Tom Lane2017-04-26
| | | | | Smarter compilers can see that "slot" can't be used uninitialized, but some popular ones cannot. Noted by Jeff Janes.
* Fix various concurrency issues in logical replication worker launchingPeter Eisentraut2017-04-26
| | | | | | | | | | | | | | | | | | | | | | | | | | The code was originally written with assumption that launcher is the only process starting the worker. However that hasn't been true since commit 7c4f52409 which failed to modify the worker management code adequately. This patch adds an in_use field to the LogicalRepWorker struct to indicate whether the worker slot is being used and uses proper locking everywhere this flag is set or read. However if the parent process dies while the new worker is starting and the new worker fails to attach to shared memory, this flag would never get cleared. We solve this rare corner case by adding a sort of garbage collector for in_use slots. This uses another field in the LogicalRepWorker struct named launch_time that contains the time when the worker was started. If any request to start a new worker does not find free slot, we'll check for workers that were supposed to start but took too long to actually do so, and reuse their slot. In passing also fix possible race conditions when stopping a worker that hasn't finished starting yet. Author: Petr Jelinek <petr.jelinek@2ndquadrant.com> Reported-by: Fujii Masao <masao.fujii@gmail.com>
* A collection of small fixes for logical replication.Fujii Masao2017-04-19
| | | | | | | | | | | | | | | | | | * Be sure to reset the launcher's pid (LogicalRepCtx->launcher_pid) to 0 even when the launcher emits an error. * Declare ApplyLauncherWakeup() as a static function because it's called only in launcher.c. * Previously IsBackendPId() was used to check whether the launcher's pid was valid. IsBackendPid() was necessary because there was the bug where the launcher's pid was not reset to 0. But now it's fixed, so IsBackendPid() is not necessary and this patch removes it. Author: Masahiko Sawada Reviewed-by: Kyotaro Horiguchi Reported-by: Fujii Masao Discussion: http://postgr.es/m/CAHGQGwFDWh_Qr-q_GEMpD+qH=vYPMdVqw=ZOSY3kX_Pna9R9SA@mail.gmail.com
* Use DatumGetInt32() to extract 32-bit integer value from a datum.Fujii Masao2017-04-19
| | | | | | | | | Previously DatumGetObjectId() was wrongly used for that. Author: Masahiko Sawada Reviewed-by: Kyotaro Horiguchi Reported-by: Fujii Masao Discussion: http://postgr.es/m/CAHGQGwFDWh_Qr-q_GEMpD+qH=vYPMdVqw=ZOSY3kX_Pna9R9SA@mail.gmail.com
* Ensure BackgroundWorker struct contents are well-defined.Tom Lane2017-04-16
| | | | | | | | Coverity complained because bgw.bgw_extra wasn't being filled in by ApplyLauncherRegister(). The most future-proof fix is to memset the whole BackgroundWorker struct to zeroes. While at it, let's apply the same coding rule to other places that set up BackgroundWorker structs; four out of five had the same or related issues.
* Add option to modify sync commit per subscriptionPeter Eisentraut2017-04-14
| | | | | | | This also changes default behaviour of subscription workers to synchronous_commit = off. Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
* Support configuration reload in logical replication workersPeter Eisentraut2017-04-10
| | | | | | Author: Michael Paquier <michael.paquier@gmail.com> Reviewed-by: Petr Jelinek <petr.jelinek@2ndquadrant.com> Reported-by: Fujii Masao <masao.fujii@gmail.com>
* Don't use bgw_main even to specify in-core bgworker entrypoints.Robert Haas2017-03-31
| | | | | | | | | | | | | | | On EXEC_BACKEND builds, this can fail if ASLR is in use. Backpatch to 9.5. On master, completely remove the bgw_main field completely, since there is no situation in which it is safe for an EXEC_BACKEND build. On 9.6 and 9.5, leave the field intact to avoid breaking things for third-party code that doesn't care about working under EXEC_BACKEND. Prior to 9.5, there are no in-core bgworker entrypoints. Petr Jelinek, reviewed by me. Discussion: http://postgr.es/m/09d8ad33-4287-a09b-a77f-77f8761adb5e@2ndquadrant.com
* Logical replication support for initial data copyPeter Eisentraut2017-03-23
| | | | | | | | | | | | | | | | | | | | | | | Add functionality for a new subscription to copy the initial data in the tables and then sync with the ongoing apply process. For the copying, add a new internal COPY option to have the COPY source data provided by a callback function. The initial data copy works on the subscriber by receiving COPY data from the publisher and then providing it locally into a COPY that writes to the destination table. A WAL receiver can now execute full SQL commands. This is used here to obtain information about tables and publications. Several new options were added to CREATE and ALTER SUBSCRIPTION to control whether and when initial table syncing happens. Change pg_dump option --no-create-subscription-slots to --no-subscription-connect and use the new CREATE SUBSCRIPTION ... NOCONNECT option for that. Author: Petr Jelinek <petr.jelinek@2ndquadrant.com> Tested-by: Erik Rijkers <er@xs4all.nl>
* Reduce log verbosity of startup/shutdown for launcher subprocesses.Tom Lane2017-03-10
| | | | | | | | | | | | | | | There's no really good reason why the autovacuum launcher and logical replication launcher should announce themselves at startup and shutdown by default. Users don't care that those processes exist, and it's inconsistent that those background processes announce themselves while others don't. So, reduce those messages from LOG to DEBUG1 level. I was sorely tempted to reduce the "starting logical replication worker for subscription ..." message to DEBUG1 as well, but forebore for now. Those processes might possibly be of direct interest to users, at least until logical replication is a lot better shaken out than it is today. Discussion: https://postgr.es/m/19479.1489121003@sss.pgh.pa.us
* Prevent logical rep workers with removed subscriptions from starting.Fujii Masao2017-03-09
| | | | | | | | | | | | | | | | | | | Any logical rep workers must have their subscription entries in pg_subscription. To ensure this, we need to prevent the launcher from starting new worker corresponding to the subscription that DROP SUBSCRIPTION command is removing. To implement this, previously LogicalRepLauncherLock was introduced and held until the end of transaction running DROP SUBSCRIPTION. But using LWLock for that purpose was not valid. Instead, this commit changes DROP SUBSCRIPTION so that it takes AccessExclusiveLock on pg_subscription, in order to ensure that the launcher cannot see any subscriptions being removed. Also this commit gets rid of LogicalRepLauncherLock. Patch by me, reviewed by Petr Jelinek Discussion: https://www.postgresql.org/message-id/CAHGQGwHPi8ky-yANFfe0sgmhKtsYcQLTnKx07bW9S7-Rn1746w@mail.gmail.com
* Fix typo in variable name.Heikki Linnakangas2017-02-06
| | | | Masahiko Sawada
* Fix typos in comments.Heikki Linnakangas2017-02-06
| | | | | | | | | Backpatch to all supported versions, where applicable, to make backpatching of future fixes go more smoothly. Josh Soref Discussion: https://www.postgresql.org/message-id/CACZqfqCf+5qRztLPgmmosr-B0Ye4srWzzw_mo4c_8_B_mtjmJQ@mail.gmail.com