author | Robert Haas <rhaas@postgresql.org> | 2018-03-22 13:25:59 -0400 |
---|---|---|
committer | Robert Haas <rhaas@postgresql.org> | 2018-03-22 13:26:12 -0400 |
commit | f644c3b386acc9e1bfef2c4fbe738706d3ccf3a3 | |
tree | 98795aeb5dc649229d438c4ef0a4401a1d003f74 | |
parent | 649f1792508fb040a9b70c68dfedd6b93897e087 | |
doc: Update parallel join documentation for Parallel Shared Hash.
Thomas Munro
Discussion: http://postgr.es/m/CAEepm=3XdL=+bn3=WQVCCT5wwfAEv-4onKpk+XQZdwDXv6etzA@mail.gmail.com
-rw-r--r-- | doc/src/sgml/parallel.sgml | 47 |
1 files changed, 32 insertions, 15 deletions
diff --git a/doc/src/sgml/parallel.sgml b/doc/src/sgml/parallel.sgml
index f15a9233cbf..d8f001d4b61 100644
--- a/doc/src/sgml/parallel.sgml
+++ b/doc/src/sgml/parallel.sgml
@@ -323,23 +323,40 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
    more other tables using a nested loop, hash join, or merge join. The
    inner side of the join may be any kind of non-parallel plan that is
    otherwise supported by the planner provided that it is safe to run within
-   a parallel worker. For example, if a nested loop join is chosen, the
-   inner plan may be an index scan which looks up a value taken from the outer
-   side of the join.
+   a parallel worker. Depending on the join type, the inner side may also be
+   a parallel plan.
   </para>

-  <para>
-   Each worker will execute the inner side of the join in full. This is
-   typically not a problem for nested loops, but may be inefficient for
-   cases involving hash or merge joins. For example, for a hash join, this
-   restriction means that an identical hash table is built in each worker
-   process, which works fine for joins against small tables but may not be
-   efficient when the inner table is large. For a merge join, it might mean
-   that each worker performs a separate sort of the inner relation, which
-   could be slow. Of course, in cases where a parallel plan of this type
-   would be inefficient, the query planner will normally choose some other
-   plan (possibly one which does not use parallelism) instead.
-  </para>
+  <itemizedlist>
+   <listitem>
+    <para>
+     In a <emphasis>nested loop join</emphasis>, the inner side is always
+     non-parallel. Although it is executed in full, this is efficient if
+     the inner side is an index scan, because the outer tuples and thus
+     the loops that look up values in the index are divided over the
+     cooperating processes.
+    </para>
+   </listitem>
+   <listitem>
+    <para>
+     In a <emphasis>merge join</emphasis>, the inner side is always
+     a non-parallel plan and therefore executed in full. This may be
+     inefficient, especially if a sort must be performed, because the work
+     and resulting data are duplicated in every cooperating process.
+    </para>
+   </listitem>
+   <listitem>
+    <para>
+     In a <emphasis>hash join</emphasis> (without the "parallel" prefix),
+     the inner side is executed in full by every cooperating process
+     to build identical copies of the hash table. This may be inefficient
+     if the hash table is large or the plan is expensive. In a
+     <emphasis>parallel hash join</emphasis>, the inner side is a
+     <emphasis>parallel hash</emphasis> that divides the work of building
+     a shared hash table over the cooperating processes.
+    </para>
+   </listitem>
+  </itemizedlist>
  </sect2>

  <sect2 id="parallel-aggregation">