aboutsummaryrefslogtreecommitdiff
path: root/src/backend/executor/execParallel.c
Commit message (Collapse)AuthorAge
...
* Support Parallel Append plan nodes.Robert Haas2017-12-05
| | | | | | | | | | | | | | | | | | | | When we create an Append node, we can spread out the workers over the subplans instead of piling on to each subplan one at a time, which should typically be a bit more efficient, both because the startup cost of any plan executed entirely by one worker is paid only once and also because of reduced contention. We can also construct Append plans using a mix of partial and non-partial subplans, which may allow for parallelism in places that otherwise couldn't support it. Unfortunately, this patch doesn't handle the important case of parallelizing UNION ALL by running each branch in a separate worker; the executor infrastructure is added here, but more planner work is needed. Amit Khandekar, Robert Haas, Amul Sul, reviewed and tested by Ashutosh Bapat, Amit Langote, Rafia Sabih, Amit Kapila, and Rajkumar Raghuwanshi. Discussion: http://postgr.es/m/CAJ3gD9dy0K_E8r727heqXoBmWZ83HwLFwdcaSSmBQ1+S+vRuUQ@mail.gmail.com
* Fix accumulation of parallel worker instrumentation.Robert Haas2017-12-05
| | | | | | | | | | | | | When a Gather or Gather Merge node is started and stopped multiple times, the old code wouldn't reset the shared state between executions, potentially resulting in dramatically inflated instrumentation data for nodes beneath it. (The per-worker instrumentation ended up OK, I think, but the overall totals were inflated.) Report by hubert depesz lubaczewski. Analysis and fix by Amit Kapila, reviewed and tweaked a bit by me. Discussion: http://postgr.es/m/20171127175631.GA405@depesz.com
* Fix EXPLAIN ANALYZE of hash join when the leader doesn't participate.Andres Freund2017-12-05
| | | | | | | | | | | | | | | | | | | | If a hash join appears in a parallel query, there may be no hash table available for explain.c to inspect even though a hash table may have been built in other processes. This could happen either because parallel_leader_participation was set to off or because the leader happened to hit the end of the outer relation immediately (even though the complete relation is not empty) and decided not to build the hash table. Commit bf11e7ee introduced a way for workers to exchange instrumentation via the DSM segment for Sort nodes even though they are not parallel-aware. This commit does the same for Hash nodes, so that explain.c has a way to find instrumentation data from an arbitrary participant that actually built the hash table. Author: Thomas Munro Reviewed-By: Andres Freund Discussion: https://postgr.es/m/CAEepm%3D3DUQC2-z252N55eOcZBer6DPdM%3DFzrxH9dZc5vYLsjaA%40mail.gmail.com
* Pass eflags down to parallel workers.Robert Haas2017-11-20
| | | | | | | | | | | | | | | | | | Currently, there are no known consequences of this oversight, so no back-patch. Several of the EXEC_FLAG_* constants aren't usable in parallel mode anyway, and potential problems related to the presence or absence of OIDs (see EXEC_FLAG_WITH_OIDS, EXEC_FLAG_WITHOUT_OIDS) seem at present to be masked by the unconditional projection step performed by Gather and Gather Merge. In general, however, it seems important that all participants agree on the values of these flags, which modify executor behavior globally, and a pending patch to skip projection in Gather (Merge) would be outright broken in certain cases without this fix. Patch by me, based on investigation of a test case provided by Amit Kapila. This patch was also reviewed by Amit Kapila. Discussion: http://postgr.es/m/CA+TgmoZ0ZL=cesZFq8c9NnfK6bqy-wwUd3_74iYGodYrSoQ7Fw@mail.gmail.com
* Provide DSM segment to ExecXXXInitializeWorker functions.Andres Freund2017-11-16
| | | | | | | | | | | | Previously, executor nodes running in parallel worker processes didn't have access to the dsm_segment object used for parallel execution. In order to support resource management based on DSM segment lifetime, they need that. So create a ParallelWorkerContext object to hold it and pass it to all InitializeWorker functions. Author: Thomas Munro Reviewed-By: Andres Freund Discussion: https://postgr.es/m/CAEepm=2W=cOkiZxcg6qiFQP-dHUe09aqTrEMM7yJDrHMhDv_RA@mail.gmail.com
* Pass InitPlan values to workers via Gather (Merge).Robert Haas2017-11-16
| | | | | | | | | | | | | | | | | | | | | | | | | | If a PARAM_EXEC parameter is used below a Gather (Merge) but the InitPlan that computes it is attached to or above the Gather (Merge), force the value to be computed before starting parallelism and pass it down to all workers. This allows us to use parallelism in cases where it previously would have had to be rejected as unsafe. We do - in this case - lose the optimization that the value is only computed if it's actually used. An alternative strategy would be to have the first worker that needs the value compute it, but one downside of that approach is that we'd then need to select a parallel-safe path to compute the parameter value; it couldn't for example contain a Gather (Merge) node. At some point in the future, we might want to consider both approaches. Independent of that consideration, there is a great deal more work that could be done to make more kinds of PARAM_EXEC parameters parallel-safe. This infrastructure could be used to allow a Gather (Merge) on the inner side of a nested loop (although that's not a very appealing plan) and cases where the InitPlan is attached below the Gather (Merge) could be addressed as well using various techniques. But this is a good start. Amit Kapila, reviewed and revised by me. Reviewing and testing from Kuntal Ghosh, Haribabu Kommi, and Tushar Ahuja. Discussion: http://postgr.es/m/CAA4eK1LV0Y1AUV4cUCdC+sYOx0Z0-8NAJ2Pd9=UKsbQ5Sr7+JQ@mail.gmail.com
* Track in the plan the types associated with PARAM_EXEC parameters.Robert Haas2017-11-13
| | | | | | | | | | | | Up until now, we only tracked the number of parameters, which was sufficient to allocate an array of Datums of the appropriate size, but not sufficient to, for example, know how to serialize a Datum stored in one of those slots. An upcoming patch wants to do that, so add this tracking to make it possible. Patch by me, reviewed by Tom Lane and Amit Kapila. Discussion: http://postgr.es/m/CA+TgmoYqpxDKn8koHdW8BEKk8FMUL0=e8m2Qe=M+r0UBjr3tuQ@mail.gmail.com
* pg_stat_statements: Widen query IDs from 32 bits to 64 bits.Robert Haas2017-10-11
| | | | | | | | | | | | | | | | | | This takes advantage of the infrastructure introduced by commit 81c5e46c490e2426db243eada186995da5bb0ba7 to greatly reduce the likelihood that two different queries will end up with the same query ID. It's still possible, of course, but whereas before it the chances of a collision reached 25% around 50,000 queries, it will now take more than 3 billion queries. Backward incompatibility: Because the type exposed at the SQL level is int8, users may now see negative query IDs in the pg_stat_statements view (and also, query IDs more than 4 billion, which was the old limit). Patch by me, reviewed by Michael Paquier and Peter Geoghegan. Discussion: http://postgr.es/m/CA+TgmobG_Kp4cBKFmsznUAaM1GWW6hhRNiZC0KjRMOOeYnz5Yw@mail.gmail.com
* Remove TupleDesc remapping logic from tqueue.c.Andres Freund2017-09-14
| | | | | | | | | | With the introduction of a shared memory record typmod registry, it is no longer necessary to remap record typmods when sending tuples between backends so most of tqueue.c can be removed. Author: Thomas Munro Reviewed-By: Andres Freund Discussion: https://postgr.es/m/CAEepm=0ZtQ-SpsgCyzzYpsXS6e=kZWqk3g5Ygn3MDV7A8dabUA@mail.gmail.com
* Reduce excessive dereferencing of function pointersPeter Eisentraut2017-09-07
| | | | | | | | | | | | It is equivalent in ANSI C to write (*funcptr) () and funcptr(). These two styles have been applied inconsistently. After discussion, we'll use the more verbose style for plain function pointer variables, to make it clear that it's a variable, and the shorter style when the function pointer is in a struct (s.func() or s->func()), because then it's clear that it's not a plain function name, and otherwise the excessive punctuation makes some of those invocations hard to read. Discussion: https://www.postgresql.org/message-id/f52c16db-14ed-757d-4b48-7ef360b1631d@2ndquadrant.com
* Improve division of labor between execParallel.c and nodeGather[Merge].c.Tom Lane2017-09-01
| | | | | | | | | | | | | | | | Move the responsibility for creating/destroying TupleQueueReaders into execParallel.c, to avoid duplicative coding in nodeGather.c and nodeGatherMerge.c. Also, instead of having DestroyTupleQueueReader do shm_mq_detach, do it in the caller (which is now only ExecParallelFinish). This means execParallel.c does both the attaching and detaching of the tuple-queue-reader shm_mqs, which seems less weird than the previous arrangement. These changes also eliminate a vestigial memory leak (of the pei->tqueue array). It's now demonstrable that rescans of Gather or GatherMerge don't leak memory. Discussion: https://postgr.es/m/8670.1504192177@sss.pgh.pa.us
* Separate reinitialization of shared parallel-scan state from ExecReScan.Tom Lane2017-08-30
| | | | | | | | | | | | | | | | | | | | | | | | | | Previously, the parallel executor logic did reinitialization of shared state within the ExecReScan code for parallel-aware scan nodes. This is problematic, because it means that the ExecReScan call has to occur synchronously (ie, during the parent Gather node's ReScan call). That is swimming very much against the tide so far as the ExecReScan machinery is concerned; the fact that it works at all today depends on a lot of fragile assumptions, such as that no plan node between Gather and a parallel-aware scan node is parameterized. Another objection is that because ExecReScan might be called in workers as well as the leader, hacky extra tests are needed in some places to prevent unwanted shared-state resets. Hence, let's separate this code into two functions, a ReInitializeDSM call and the ReScan call proper. ReInitializeDSM is called only in the leader and is guaranteed to run before we start new workers. ReScan is returned to its traditional function of resetting only local state, which means that ExecReScan's usual habits of delaying or eliminating child rescan calls are safe again. As with the preceding commit 7df2c1f8d, it doesn't seem to be necessary to make these changes in 9.6, which is a good thing because the FDW and CustomScan APIs are impacted. Discussion: https://postgr.es/m/CAA4eK1JkByysFJNh9M349u_nNjqETuEnY_y1VUc_kJiU0bxtaQ@mail.gmail.com
* Propagate sort instrumentation from workers back to leader.Robert Haas2017-08-29
| | | | | | | | | | | | | | | | Up until now, when parallel query was used, no details about the sort method or space used by the workers were available; details were shown only for any sorting done by the leader. Fix that. Commit 1177ab1dabf72bafee8f19d904cee3a299f25892 forced the test case added by commit 1f6d515a67ec98194c23a5db25660856c9aab944 to run without parallelism; now that we have this infrastructure, allow that again, with a little tweaking to make it pass with and without force_parallel_mode. Robert Haas and Tom Lane Discussion: http://postgr.es/m/CA+Tgmoa2VBZW6S8AAXfhpHczb=Rf6RqQ2br+zJvEgwJ0uoD_tQ@mail.gmail.com
* Push tuple limits through Gather and Gather Merge.Robert Haas2017-08-29
| | | | | | | | | | | | If we only need, say, 10 tuples in total, then we certainly don't need more than 10 tuples from any single process. Pushing down the limit lets workers exit early when possible. For Gather Merge, there is an additional benefit: a Sort immediately below the Gather Merge can be done as a bounded sort if there is an applicable limit. Robert Haas and Tom Lane Discussion: http://postgr.es/m/CA+TgmoYa3QKKrLj5rX7UvGqhH73G1Li4B-EKxrmASaca2tFu9Q@mail.gmail.com
* Phase 3 of pgindent updates.Tom Lane2017-06-21
| | | | | | | | | | | | | | | | | | | | | | | | | Don't move parenthesized lines to the left, even if that means they flow past the right margin. By default, BSD indent lines up statement continuation lines that are within parentheses so that they start just to the right of the preceding left parenthesis. However, traditionally, if that resulted in the continuation line extending to the right of the desired right margin, then indent would push it left just far enough to not overrun the margin, if it could do so without making the continuation line start to the left of the current statement indent. That makes for a weird mix of indentations unless one has been completely rigid about never violating the 80-column limit. This behavior has been pretty universally panned by Postgres developers. Hence, disable it with indent's new -lpl switch, so that parenthesized lines are always lined up with the preceding left paren. This patch is much less interesting than the first round of indent changes, but also bulkier, so I thought it best to separate the effects. Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
* Don't be so trusting that shm_toc_lookup() will always succeed.Tom Lane2017-06-05
| | | | | | | | | | | | | | | | | | Given the possibility of race conditions and so on, it seems entirely unsafe to just assume that shm_toc_lookup() always finds the key it's looking for --- but that was exactly what all but one call site were doing. To fix, add a "bool noError" argument, similarly to what we have in many other functions, and throw an error on an unexpected lookup failure. Remove now-redundant Asserts that a rather random subset of call sites had. I doubt this will throw any light on buildfarm member lorikeet's recent failures, because if an unnoticed lookup failure were involved, you'd kind of expect a null-pointer-dereference crash rather than the observed symptom. But you never know ... and this is better coding practice even if it never catches anything. Discussion: https://postgr.es/m/9697.1496675981@sss.pgh.pa.us
* Post-PG 10 beta1 pgindent runBruce Momjian2017-05-17
| | | | perltidy run not included.
* Avoid passing function pointers across process boundaries.Tom Lane2017-04-14
| | | | | | | | | | | | | | | | We'd already recognized that we can't pass function pointers across process boundaries for functions in loadable modules, since a shared library could get loaded at different addresses in different processes. But actually the practice doesn't work for functions in the core backend either, if we're using EXEC_BACKEND. This is the cause of recent failures on buildfarm member culicidae. Switch to passing a string function name in all cases. Something like this needs to be back-patched into 9.6, but let's see if the buildfarm likes it first. Petr Jelinek, with a bunch of basically-cosmetic adjustments by me Discussion: https://postgr.es/m/548f9c1d-eafa-e3fa-9da8-f0cc2f654e60@2ndquadrant.com
* Avoid transferring parallel-unsafe subplans to parallel workers.Tom Lane2017-04-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 5e6d8d2bb allowed parallel workers to execute parallel-safe subplans, but it transmitted the query's entire list of subplans to the worker(s). Since execMain.c blindly does ExecInitNode and later ExecEndNode on every list element, this resulted in parallel-unsafe plan nodes nonetheless getting started up and shut down in parallel workers. That seems mostly harmless as far as core plan node types go (but maybe not so much for Gather?). But it resulted in postgres_fdw opening and then closing extra remote connections, and it's likely that other non-parallel-safe FDWs or custom scan providers would have worse reactions. To fix, just make ExecSerializePlan replace parallel-unsafe subplans with NULLs in the cut-down plan tree that it transmits to workers. This relies on ExecInitNode and ExecEndNode to do nothing on NULL input, but they do anyway. If anything else is touching the dropped subplans in a parallel worker, that would be a bug to be fixed. (This thus provides a strong guarantee that we won't try to do something with a parallel-unsafe subplan in a worker.) This is, I think, the last fix directly occasioned by Andreas Seltenreich's bug report of a few days ago. Tom Lane and Amit Kapila Discussion: https://postgr.es/m/87tw5x4vcu.fsf@credativ.de
* Add infrastructure to support EphemeralNamedRelation references.Kevin Grittner2017-03-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A QueryEnvironment concept is added, which allows new types of objects to be passed into queries from parsing on through execution. At this point, the only thing implemented is a collection of EphemeralNamedRelation objects -- relations which can be referenced by name in queries, but do not exist in the catalogs. The only type of ENR implemented is NamedTuplestore, but provision is made to add more types fairly easily. An ENR can carry its own TupleDesc or reference a relation in the catalogs by relid. Although these features can be used without SPI, convenience functions are added to SPI so that ENRs can easily be used by code run through SPI. The initial use of all this is going to be transition tables in AFTER triggers, but that will be added to each PL as a separate commit. An incidental effect of this patch is to produce a more informative error message if an attempt is made to modify the contents of a CTE from a referencing DML statement. No tests previously covered that possibility, so one is added. Kevin Grittner and Thomas Munro Reviewed by Heikki Linnakangas, David Fetter, and Thomas Munro with valuable comments and suggestions from many others
* Allow for parallel execution whenever ExecutorRun() is done only once.Robert Haas2017-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, it was unsafe to execute a plan in parallel if ExecutorRun() might be called with a non-zero row count. However, it's quite easy to fix things up so that we can support that case, provided that it is known that we will never call ExecutorRun() a second time for the same QueryDesc. Add infrastructure to signal this, and cross-checks to make sure that a caller who claims this is true doesn't later reneg. While that pattern never happens with queries received directly from a client -- there's no way to know whether multiple Execute messages will be sent unless the first one requests all the rows -- it's pretty common for queries originating from procedural languages, which often limit the result to a single tuple or to a user-specified number of tuples. This commit doesn't actually enable parallelism in any additional cases, because currently none of the places that would be able to benefit from this infrastructure pass CURSOR_OPT_PARALLEL_OK in the first place, but it makes it much more palatable to pass CURSOR_OPT_PARALLEL_OK in places where we currently don't, because it eliminates some cases where we'd end up having to run the parallel plan serially. Patch by me, based on some ideas from Rafia Sabih and corrected by Rafia Sabih based on feedback from Dilip Kumar and myself. Discussion: http://postgr.es/m/CA+TgmobXEhvHbJtWDuPZM9bVSLiTj-kShxQJ2uM5GPDze9fRYA@mail.gmail.com
* Don't scan partitioned tables.Robert Haas2017-03-21
| | | | | | | | | | | | | | | | | | | Partitioned tables do not contain any data; only their unpartitioned descendents need to be scanned. However, the partitioned tables still need to be locked, even though they're not scanned. To make that work, Append and MergeAppend relations now need to carry a list of (unscanned) partitioned relations that must be locked, and InitPlan must lock all partitioned result relations. Aside from the obvious advantage of avoiding some work at execution time, this has two other advantages. First, it may improve the planner's decision-making in some cases since the empty relation might throw things off. Second, it paves the way to getting rid of the storage for partitioned tables altogether. Amit Langote, reviewed by me. Discussion: http://postgr.es/m/6837c359-45c4-8044-34d1-736756335a15@lab.ntt.co.jp
* Support parallel bitmap heap scans.Robert Haas2017-03-08
| | | | | | | | | | | | | | | The index is scanned by a single process, but then all cooperating processes can iterate jointly over the resulting set of heap blocks. In the future, we might also want to support using a parallel bitmap index scan to set up for a parallel bitmap heap scan, but that's a job for another day. Dilip Kumar, with some corrections and cosmetic changes by me. The larger patch set of which this is a part has been reviewed and tested by (at least) Andres Freund, Amit Khandekar, Tushar Ahuja, Rafia Sabih, Haribabu Kommi, Thomas Munro, and me. Discussion: http://postgr.es/m/CAFiTN-uc4=0WxRGfCzs-xfkMYcSEWUC-Fon6thkJGjkh9i=13A@mail.gmail.com
* Pass the source text for a parallel query to the workers.Robert Haas2017-02-22
| | | | | | | | | With this change, you can see the query that a parallel worker is executing in pg_stat_activity, and if the worker crashes you can see what query it was executing when it crashed. Rafia Sabih, reviewed by Kuntal Ghosh and Amit Kapila and slightly revised by me.
* Add optimizer and executor support for parallel index-only scans.Robert Haas2017-02-19
| | | | | | | | | | | | | Commit 5262f7a4fc44f651241d2ff1fa688dd664a34874 added similar support for parallel index scans; this extends that work to index-only scans. As with parallel index scans, this requires support from the index AM, so currently parallel index-only scans will only be possible for btree indexes. Rafia Sabih, reviewed and tested by Rahila Syed, Tushar Ahuja, and Amit Kapila Discussion: http://postgr.es/m/CAOGQiiPEAs4C=TBp0XShxBvnWXuzGL2u++Hm1=qnCpd6_Mf8Fw@mail.gmail.com
* Add optimizer and executor support for parallel index scans.Robert Haas2017-02-15
| | | | | | | | | | | | In combination with 569174f1be92be93f5366212cc46960d28a5c5cd, which taught the btree AM how to perform parallel index scans, this allows parallel index scan plans on btree indexes. This infrastructure should be general enough to support parallel index scans for other index AMs as well, if someone updates them to support parallel scans. Amit Kapila, reviewed and tested by Anastasia Lubennikova, Tushar Ahuja, and Haribabu Kommi, and me.
* Allow parallel workers to execute subplans.Robert Haas2017-02-14
| | | | | | | | | | | | | | | This doesn't do anything to make Param nodes anything other than parallel-restricted, so this only helps with uncorrelated subplans, and it's not necessarily very cheap because each worker will run the subplan separately (just as a Hash Join will build a separate copy of the hash table in each participating process), but it's a first step toward supporting cases that are more likely to help in practice, and is occasionally useful on its own. Amit Kapila, reviewed and tested by Rafia Sabih, Dilip Kumar, and me. Discussion: http://postgr.es/m/CAA4eK1+e8Z45D2n+rnDMDYsVEb5iW7jqaCH_tvPMYau=1Rru9w@mail.gmail.com
* Fix typos in comments.Heikki Linnakangas2017-02-06
| | | | | | | | | Backpatch to all supported versions, where applicable, to make backpatching of future fixes go more smoothly. Josh Soref Discussion: https://www.postgresql.org/message-id/CACZqfqCf+5qRztLPgmmosr-B0Ye4srWzzw_mo4c_8_B_mtjmJQ@mail.gmail.com
* Change representation of statement lists, and add statement location info.Tom Lane2017-01-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch makes several changes that improve the consistency of representation of lists of statements. It's always been the case that the output of parse analysis is a list of Query nodes, whatever the types of the individual statements in the list. This patch brings similar consistency to the outputs of raw parsing and planning steps: * The output of raw parsing is now always a list of RawStmt nodes; the statement-type-dependent nodes are one level down from that. * The output of pg_plan_queries() is now always a list of PlannedStmt nodes, even for utility statements. In the case of a utility statement, "planning" just consists of wrapping a CMD_UTILITY PlannedStmt around the utility node. This list representation is now used in Portal and CachedPlan plan lists, replacing the former convention of intermixing PlannedStmts with bare utility-statement nodes. Now, every list of statements has a consistent head-node type depending on how far along it is in processing. This allows changing many places that formerly used generic "Node *" pointers to use a more specific pointer type, thus reducing the number of IsA() tests and casts needed, as well as improving code clarity. Also, the post-parse-analysis representation of DECLARE CURSOR is changed so that it looks more like EXPLAIN, PREPARE, etc. That is, the contained SELECT remains a child of the DeclareCursorStmt rather than getting flipped around to be the other way. It's now true for both Query and PlannedStmt that utilityStmt is non-null if and only if commandType is CMD_UTILITY. That allows simplifying a lot of places that were testing both fields. (I think some of those were just defensive programming, but in many places, it was actually necessary to avoid confusing DECLARE CURSOR with SELECT.) Because PlannedStmt carries a canSetTag field, we're also able to get rid of some ad-hoc rules about how to reconstruct canSetTag for a bare utility statement; specifically, the assumption that a utility is canSetTag if and only if it's the only one in its list. While I see no near-term need for relaxing that restriction, it's nice to get rid of the ad-hocery. The API of ProcessUtility() is changed so that what it's passed is the wrapper PlannedStmt not just the bare utility statement. This will affect all users of ProcessUtility_hook, but the changes are pretty trivial; see the affected contrib modules for examples of the minimum change needed. (Most compilers should give pointer-type-mismatch warnings for uncorrected code.) There's also a change in the API of ExplainOneQuery_hook, to pass through cursorOptions instead of expecting hook functions to know what to pick. This is needed because of the DECLARE CURSOR changes, but really should have been done in 9.6; it's unlikely that any extant hook functions know about using CURSOR_OPT_PARALLEL_OK. Finally, teach gram.y to save statement boundary locations in RawStmt nodes, and pass those through to Query and PlannedStmt nodes. This allows more intelligent handling of cases where a source query string contains multiple statements. This patch doesn't actually do anything with the information, but a follow-on patch will. (Passing this information through cleanly is the true motivation for these changes; while I think this is all good cleanup, it's unlikely we'd have bothered without this end goal.) catversion bump because addition of location fields to struct Query affects stored rules. This patch is by me, but it owes a good deal to Fabien Coelho who did a lot of preliminary work on the problem, and also reviewed the patch. Discussion: https://postgr.es/m/alpine.DEB.2.20.1612200926310.29821@lancre
* Fix incorrect function name in comment.Robert Haas2017-01-12
| | | | Amit Langote
* Fix possible crash reading pg_stat_activity.Robert Haas2017-01-05
| | | | | | | | | | With the old code, a backend that read pg_stat_activity without ever having executed a parallel query might see a backend in the midst of executing one waiting on a DSA LWLock, resulting in a crash. The solution is for backends to register the tranche at startup time, not the first time a parallel query is executed. Report by Andreas Seltenreich. Patch by me, reviewed by Thomas Munro.
* Update copyright via script for 2017Bruce Momjian2017-01-03
|
* Spellcheck: s/descendent/descendant/gTom Lane2016-12-23
| | | | | | | | | | I got a little annoyed by reading documentation paragraphs containing both spellings within a few lines of each other. My dictionary says "descendant" is the preferred spelling, and it's certainly the majority usage in our tree, so standardize on that. For one usage in parallel.sgml, I thought it better to rewrite to avoid the term altogether.
* Provide a DSA area for all parallel queries.Robert Haas2016-12-19
| | | | | | | This will allow future parallel query code to dynamically allocate storage shared by all participants. Thomas Munro, with assorted changes by me.
* Fix typo in commentMagnus Hagander2016-11-25
| | | | Thomas Munro
* Fix possible crash due to incorrect allocation context.Robert Haas2016-08-16
| | | | | | | | | | | | | Commit af33039317ddc4a0e38a02e2255c2bf453115fd2 aimed to reduce leakage from tqueue.c, which is good. Unfortunately, by changing the memory context in which all of gather_readnext() executes, it also changed the context in which ExecShutdownGatherWorkers executes, which is not good, because that function eventually causes a call to ExecParallelRetrieveInstrumentation, which proceeds to allocate planstate->worker_instrument in a short-lived context, causing a crash. Rushabh Lathia, reviewed by Amit Kapila and by me.
* Avoid invalidating all foreign-join cached plans when user mappings change.Tom Lane2016-07-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We must not push down a foreign join when the foreign tables involved should be accessed under different user mappings. Previously we tried to enforce that rule literally during planning, but that meant that the resulting plans were dependent on the current contents of the pg_user_mapping catalog, and we had to blow away all cached plans containing any remote join when anything at all changed in pg_user_mapping. This could have been improved somewhat, but the fact that a syscache inval callback has very limited info about what changed made it hard to do better within that design. Instead, let's change the planner to not consider user mappings per se, but to allow a foreign join if both RTEs have the same checkAsUser value. If they do, then they necessarily will use the same user mapping at runtime, and we don't need to know specifically which one that is. Post-plan-time changes in pg_user_mapping no longer require any plan invalidation. This rule does give up some optimization ability, to wit where two foreign table references come from views with different owners or one's from a view and one's directly in the query, but nonetheless the same user mapping would have applied. We'll sacrifice the first case, but to not regress more than we have to in the second case, allow a foreign join involving both zero and nonzero checkAsUser values if the nonzero one is the same as the prevailing effective userID. In that case, mark the plan as only runnable by that userID. The plancache code already had a notion of plans being userID-specific, in order to support RLS. It was a little confused though, in particular lacking clarity of thought as to whether it was the rewritten query or just the finished plan that's dependent on the userID. Rearrange that code so that it's clearer what depends on which, and so that the same logic applies to both RLS-injected role dependency and foreign-join-injected role dependency. Note that this patch doesn't remove the other issue mentioned in the original complaint, which is that while we'll reliably stop using a foreign join if it's disallowed in a new context, we might fail to start using a foreign join if it's now allowed, but we previously created a generic cached plan that didn't use one. It was agreed that the chance of winning that way was not high enough to justify the much larger number of plan invalidations that would have to occur if we tried to cause it to happen. In passing, clean up randomly-varying spelling of EXPLAIN commands in postgres_fdw.sql, and fix a COSTS ON example that had been allowed to leak into the committed tests. This reverts most of commits fbe5a3fb7 and 5d4171d1c, which were the previous attempt at ensuring we wouldn't push down foreign joins that span permissions contexts. Etsuro Fujita and Tom Lane Discussion: <d49c1e5b-f059-20f4-c132-e9752ee0113e@lab.ntt.co.jp>
* pgindent run for 9.6Robert Haas2016-06-09
|
* Use mul_size when multiplying by the number of parallel workers.Robert Haas2016-05-06
| | | | | | | | | | That way, if the result overflows size_t, you'll get an error instead of undefined behavior, which seems like a plus. This also has the effect of casting the number of workers from int to Size, which is better because it's harder to overflow int than size_t. Dilip Kumar reported this issue and provided a patch upon which this patch is based, but his version did use mul_size.
* Tweak a few more things in preparation for upcoming pgindent run.Robert Haas2016-05-03
| | | | | | | | These adjustments adjust code and comments in minor ways to prevent pgindent from mangling them. Among other things, I tried to avoid situations where pgindent would emit "a +b" instead of "a + b", and I tried to avoid having it break up inline comments across multiple lines.
* Clean up a few parallelism-related things that pgindent wants to mangle.Robert Haas2016-04-27
| | | | | | | | | | | | | | In nodeFuncs.c, pgindent wants to introduce spurious indentation into the definitions of planstate_tree_walker and planstate_walk_subplans. Fix that by spreading the definition out across several lines, similar to what is already done for other walker functions in that file. In execParallel.c, in the definition of SharedExecutorInstrumentation, pgindent wants to insert more whitespace between the type name and the member name. That causes it to mangle comments later on the line. Fix by moving the comments out of line. Now that we have a bit more room, add some more details that may be useful to the next person reading this code.
* Copyedit comments and documentation.Noah Misch2016-04-01
|
* Minor optimizations based on ParallelContext having nworkers_launched.Robert Haas2016-03-04
| | | | | | | | | | | Originally, we didn't have nworkers_launched, so code that used parallel contexts had to be preprared for the possibility that not all of the workers requested actually got launched. But now we can count on knowing the number of workers that were successfully launched, which can shave off a few cycles and simplify some code slightly. Amit Kapila, reviewed by Haribabu Kommi, per a suggestion from Peter Geoghegan.
* Allow parallel custom and foreign scans.Robert Haas2016-02-03
| | | | | | | | | | This patch doesn't put the new infrastructure to use anywhere, and indeed it's not clear how it could ever be used for something like postgres_fdw which has to send an SQL query and wait for a reply, but there might be FDWs or custom scan providers that are CPU-bound, so let's give them a way to join club parallel. KaiGai Kohei, reviewed by me.
* Only try to push down foreign joins if the user mapping OIDs match.Robert Haas2016-01-28
| | | | | | | | | | | | | Previously, the foreign join pushdown infrastructure left the question of security entirely up to individual FDWs, but it would be easy for a foreign data wrapper to inadvertently open up subtle security holes that way. So, make it the core code's job to determine which user mapping OID is relevant, and don't attempt join pushdown unless it's the same for all relevant relations. Per a suggestion from Tom Lane. Shigeru Hanada and Ashutosh Bapat, reviewed by Etsuro Fujita and KaiGai Kohei, with some further changes by me.
* Support parallel joins, and make related improvements.Robert Haas2016-01-20
| | | | | | | | | | | | | | | | | | | | | | | | | | The core innovation of this patch is the introduction of the concept of a partial path; that is, a path which if executed in parallel will generate a subset of the output rows in each process. Gathering a partial path produces an ordinary (complete) path. This allows us to generate paths for parallel joins by joining a partial path for one side (which at the baserel level is currently always a Partial Seq Scan) to an ordinary path on the other side. This is subject to various restrictions at present, especially that this strategy seems unlikely to be sensible for merge joins, so only nested loops and hash joins paths are generated. This also allows an Append node to be pushed below a Gather node in the case of a partitioned table. Testing revealed that early versions of this patch made poor decisions in some cases, which turned out to be caused by the fact that the original cost model for Parallel Seq Scan wasn't very good. So this patch tries to make some modest improvements in that area. There is much more to be done in the area of generating good parallel plans in all cases, but this seems like a useful step forward. Patch by me, reviewed by Dilip Kumar and Amit Kapila.
* Update copyright for 2016Bruce Momjian2016-01-02
| | | | Backpatch certain files through 9.1
* Allow EXPLAIN (ANALYZE, VERBOSE) to display per-worker statistics.Robert Haas2015-12-09
| | | | | | | | | | | The original parallel sequential scan commit included only very limited changes to the EXPLAIN output. Aggregated totals from all workers were displayed, but there was no way to see what each individual worker did or to distinguish the effort made by the workers from the effort made by the leader. Per a gripe by Thom Brown (and maybe others). Patch by me, reviewed by Amit Kapila.
* Avoid aggregating worker instrumentation multiple times.Robert Haas2015-11-18
| | | | Amit Kapila, per design ideas from me.
* Make sequential scans parallel-aware.Robert Haas2015-11-11
| | | | | | | | | | | | | | In addition, this path fills in a number of missing bits and pieces in the parallel infrastructure. Paths and plans now have a parallel_aware flag indicating whether whatever parallel-aware logic they have should be engaged. It is believed that we will need this flag for a number of path/plan types, not just sequential scans, which is why the flag is generic rather than part of the SeqScan structures specifically. Also, execParallel.c now gives parallel nodes a chance to initialize their PlanState nodes from the DSM during parallel worker startup. Amit Kapila, with a fair amount of adjustment by me. Review of previous patch versions by Haribabu Kommi and others.