author Robert Haas <rhaas@postgresql.org> 2018-03-22 13:25:59 -0400
committer Robert Haas <rhaas@postgresql.org> 2018-03-22 13:26:12 -0400
commit f644c3b386acc9e1bfef2c4fbe738706d3ccf3a3
tree 98795aeb5dc649229d438c4ef0a4401a1d003f74
parent 649f1792508fb040a9b70c68dfedd6b93897e087
doc: Update parallel join documentation for Parallel Shared Hash.
Thomas Munro
Discussion: http://postgr.es/m/CAEepm=3XdL=+bn3=WQVCCT5wwfAEv-4onKpk+XQZdwDXv6etzA@mail.gmail.com
-rw-r--r-- doc/src/sgml/parallel.sgml 47
1 files changed, 32 insertions, 15 deletions
diff --git a/doc/src/sgml/parallel.sgml b/doc/src/sgml/parallel.sgml
index f15a9233cbf..d8f001d4b61 100644
--- a/doc/src/sgml/parallel.sgml
+++ b/doc/src/sgml/parallel.sgml
@@ -323,23 +323,40 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
more other tables using a nested loop, hash join, or merge join. The
inner side of the join may be any kind of non-parallel plan that is
otherwise supported by the planner provided that it is safe to run within
- a parallel worker. For example, if a nested loop join is chosen, the
- inner plan may be an index scan which looks up a value taken from the outer
- side of the join.
+ a parallel worker. Depending on the join type, the inner side may also be
+ a parallel plan.
</para>
- <para>
- Each worker will execute the inner side of the join in full. This is
- typically not a problem for nested loops, but may be inefficient for
- cases involving hash or merge joins. For example, for a hash join, this
- restriction means that an identical hash table is built in each worker
- process, which works fine for joins against small tables but may not be
- efficient when the inner table is large. For a merge join, it might mean
- that each worker performs a separate sort of the inner relation, which
- could be slow. Of course, in cases where a parallel plan of this type
- would be inefficient, the query planner will normally choose some other
- plan (possibly one which does not use parallelism) instead.
- </para>
+ <itemizedlist>
+ <listitem>
+ <para>
+ In a <emphasis>nested loop join</emphasis>, the inner side is always
+ non-parallel. Although it is executed in full, this is efficient if
+ the inner side is an index scan, because the outer tuples and thus
+ the loops that look up values in the index are divided over the
+ cooperating processes.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ In a <emphasis>merge join</emphasis>, the inner side is always
+ a non-parallel plan and therefore executed in full. This may be
+ inefficient, especially if a sort must be performed, because the work
+ and resulting data are duplicated in every cooperating process.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ In a <emphasis>hash join</emphasis> (without the "parallel" prefix),
+ the inner side is executed in full by every cooperating process
+ to build identical copies of the hash table. This may be inefficient
+ if the hash table is large or the plan is expensive. In a
+ <emphasis>parallel hash join</emphasis>, the inner side is a
+ <emphasis>parallel hash</emphasis> that divides the work of building
+ a shared hash table over the cooperating processes.
+ </para>
+ </listitem>
+ </itemizedlist>
</sect2>
<sect2 id="parallel-aggregation">
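
The difference between a hash join and a parallel hash join described in the added list can be illustrated with a toy model. The sketch below is not PostgreSQL code: the function names (`split`, `hash_join`, `parallel_hash_join`), the round-robin division of rows, and the sequential simulation of the cooperating processes are all assumptions made for illustration. It only counts how many inner tuples get hashed in total under each strategy.

```python
from collections import defaultdict


def split(rows, n):
    """Divide rows round-robin over n simulated cooperating processes (assumed scheme)."""
    return [rows[i::n] for i in range(n)]


def build_hash(rows):
    """Build a private hash table mapping join key -> list of payloads."""
    table = defaultdict(list)
    for key, payload in rows:
        table[key].append(payload)
    return table


def hash_join(outer, inner, n):
    """Hash join without the "parallel" prefix: each process hashes the full inner side."""
    results, tuples_hashed = [], 0
    for part in split(outer, n):
        table = build_hash(inner)  # identical copy rebuilt in every process
        tuples_hashed += len(inner)
        results += [(k, o, i) for k, o in part for i in table.get(k, [])]
    return sorted(results), tuples_hashed


def parallel_hash_join(outer, inner, n):
    """Parallel hash join: the processes divide the work of building one shared table."""
    shared = defaultdict(list)
    tuples_hashed = 0
    for part in split(inner, n):  # each process inserts only its share of the inner side
        for key, payload in part:
            shared[key].append(payload)
        tuples_hashed += len(part)
    results = []
    for part in split(outer, n):  # probing is divided over the processes as before
        results += [(k, o, i) for k, o in part for i in shared.get(k, [])]
    return sorted(results), tuples_hashed


if __name__ == "__main__":
    outer = [(i % 5, f"o{i}") for i in range(20)]
    inner = [(k, f"i{k}") for k in range(5)]
    rows_a, hashed_a = hash_join(outer, inner, 4)
    rows_b, hashed_b = parallel_hash_join(outer, inner, 4)
    print(rows_a == rows_b, hashed_a, hashed_b)
```

Both functions return the same join result. In this model the build work grows with the number of processes for the plain hash join and stays equal to the size of the inner side for the parallel hash join, which matches the duplication the documentation text warns about.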