| author | Peter Geoghegan <pg@bowt.ie> | 2020-07-29 14:14:58 -0700 |
|---|---|---|
| committer | Peter Geoghegan <pg@bowt.ie> | 2020-07-29 14:14:58 -0700 |
| commit | d6c08e29e7bc8bc3bf49764192c4a9c71fc0b097 (patch) | |
| tree | 8d0d2cdb7d18504b50a49433f9181130f74186c4 /src/backend/executor/nodeAgg.c | |
| parent | 6023b7ea717ca04cf1bd53709d9c862db07eaefb (diff) | |
| download | postgresql-d6c08e29e7bc8bc3bf49764192c4a9c71fc0b097.tar.gz postgresql-d6c08e29e7bc8bc3bf49764192c4a9c71fc0b097.zip | |
Add hash_mem_multiplier GUC.

Add a GUC that acts as a multiplier on work_mem. It gets applied when
sizing executor node hash tables that were previously size constrained
using work_mem alone.

The new GUC can be used to preferentially give hash-based nodes more
memory than the generic work_mem limit. It is intended to enable admin
tuning of the executor's memory usage. Overall system throughput and
system responsiveness can be improved by giving hash-based executor
nodes more memory (especially over sort-based alternatives, which are
often much less sensitive to being memory constrained).

The default value for hash_mem_multiplier is 1.0, which is also the
minimum valid value. This means that hash-based nodes continue to apply
work_mem in the traditional way by default.
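
For illustration, a minimal sketch (not code taken from this commit) of how the hash memory budget follows from work_mem and hash_mem_multiplier. The patch's real helper, get_hash_mem(), is called in the diff below but defined outside the file shown here; the name sketch_get_hash_mem and the example values are hypothetical.

```c
#include <stdio.h>

/*
 * Hypothetical sketch: the hash memory budget is work_mem (in kB) scaled by
 * hash_mem_multiplier.  The real helper, get_hash_mem(), lives outside the
 * file shown below, so details may differ.
 */
static int
sketch_get_hash_mem(int work_mem_kb, double hash_mem_multiplier)
{
	/* the multiplier is never below 1.0, so the budget is never below work_mem */
	return (int) ((double) work_mem_kb * hash_mem_multiplier);
}

int
main(void)
{
	/* default multiplier: hash-based nodes see plain work_mem (4MB here) */
	printf("%d kB\n", sketch_get_hash_mem(4096, 1.0));	/* prints 4096 kB */

	/* multiplier of 2.0: the same 4MB work_mem yields an 8MB hash budget */
	printf("%d kB\n", sketch_get_hash_mem(4096, 2.0));	/* prints 8192 kB */

	return 0;
}
```

With the default multiplier of 1.0 the derived budget equals work_mem, which is why default behavior is unchanged.
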
hash_mem_multiplier is generally useful. However, it is being added now
due to concerns about hash aggregate performance stability for users
that upgrade to Postgres 13 (which added disk-based hash aggregation in
commit 1f39bce0). While the old hash aggregate behavior risked
out-of-memory errors, it is nevertheless likely that many users actually
benefited. Hash agg's previous indifference to work_mem during query
execution was not just faster; it also accidentally made aggregation
resilient to grouping estimate problems (at least in cases where this
didn't create destabilizing memory pressure).

hash_mem_multiplier can provide a certain kind of continuity with the
behavior of Postgres 12 hash aggregates in cases where the planner
incorrectly estimates that all groups (plus related allocations) will
fit in work_mem/hash_mem. This seems necessary because hash-based
aggregation is usually much slower when only a small fraction of all
groups can fit. Even when it isn't possible to totally avoid hash
aggregates that spill, giving hash aggregation more memory will reliably
improve performance (the same cannot be said for external sort
operations, which appear to be almost unaffected by memory availability
provided it's at least possible to get a single merge pass).

The PostgreSQL 13 release notes should advise users that increasing
hash_mem_multiplier can help with performance regressions associated
with hash aggregation. That can be taken care of by a later commit.

Author: Peter Geoghegan
Reviewed-By: Álvaro Herrera, Jeff Davis
Discussion: https://postgr.es/m/20200625203629.7m6yvut7eqblgmfo@alap3.anarazel.de
Discussion: https://postgr.es/m/CAH2-WzmD%2Bi1pG6rc1%2BCjc4V6EaFJ_qSuKCCHVnH%3DoruqD-zqow%40mail.gmail.com
Backpatch: 13-, where disk-based hash aggregation was introduced.

Diffstat (limited to 'src/backend/executor/nodeAgg.c')
-rw-r--r-- | src/backend/executor/nodeAgg.c | 30 |
1 file changed, 16 insertions, 14 deletions
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 02a9165c694..9776263ae75 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -203,7 +203,7 @@
  * entries (and initialize new transition states), we instead spill them to
  * disk to be processed later. The tuples are spilled in a partitioned
  * manner, so that subsequent batches are smaller and less likely to exceed
- * work_mem (if a batch does exceed work_mem, it must be spilled
+ * hash_mem (if a batch does exceed hash_mem, it must be spilled
  * recursively).
  *
  * Spilled data is written to logical tapes. These provide better control
@@ -212,7 +212,7 @@
  *
  * Note that it's possible for transition states to start small but then
  * grow very large; for instance in the case of ARRAY_AGG. In such cases,
- * it's still possible to significantly exceed work_mem. We try to avoid
+ * it's still possible to significantly exceed hash_mem. We try to avoid
  * this situation by estimating what will fit in the available memory, and
  * imposing a limit on the number of groups separately from the amount of
  * memory consumed.
@@ -1516,7 +1516,7 @@ build_hash_table(AggState *aggstate, int setno, long nbuckets)
 
 	/*
 	 * Used to make sure initial hash table allocation does not exceed
-	 * work_mem. Note that the estimate does not include space for
+	 * hash_mem. Note that the estimate does not include space for
 	 * pass-by-reference transition data values, nor for the representative
 	 * tuple of each group.
 	 */
@@ -1782,7 +1782,7 @@ hashagg_recompile_expressions(AggState *aggstate, bool minslot, bool nullcheck)
 }
 
 /*
- * Set limits that trigger spilling to avoid exceeding work_mem. Consider the
+ * Set limits that trigger spilling to avoid exceeding hash_mem. Consider the
  * number of partitions we expect to create (if we do spill).
  *
  * There are two limits: a memory limit, and also an ngroups limit. The
@@ -1796,13 +1796,14 @@ hash_agg_set_limits(double hashentrysize, double input_groups, int used_bits,
 {
 	int			npartitions;
 	Size		partition_mem;
+	int			hash_mem = get_hash_mem();
 
-	/* if not expected to spill, use all of work_mem */
-	if (input_groups * hashentrysize < work_mem * 1024L)
+	/* if not expected to spill, use all of hash_mem */
+	if (input_groups * hashentrysize < hash_mem * 1024L)
 	{
 		if (num_partitions != NULL)
 			*num_partitions = 0;
-		*mem_limit = work_mem * 1024L;
+		*mem_limit = hash_mem * 1024L;
 		*ngroups_limit = *mem_limit / hashentrysize;
 		return;
 	}
@@ -1824,14 +1825,14 @@ hash_agg_set_limits(double hashentrysize, double input_groups, int used_bits,
 		HASHAGG_WRITE_BUFFER_SIZE * npartitions;
 
 	/*
-	 * Don't set the limit below 3/4 of work_mem. In that case, we are at the
+	 * Don't set the limit below 3/4 of hash_mem. In that case, we are at the
 	 * minimum number of partitions, so we aren't going to dramatically exceed
 	 * work mem anyway.
 	 */
-	if (work_mem * 1024L > 4 * partition_mem)
-		*mem_limit = work_mem * 1024L - partition_mem;
+	if (hash_mem * 1024L > 4 * partition_mem)
+		*mem_limit = hash_mem * 1024L - partition_mem;
 	else
-		*mem_limit = work_mem * 1024L * 0.75;
+		*mem_limit = hash_mem * 1024L * 0.75;
 
 	if (*mem_limit > hashentrysize)
 		*ngroups_limit = *mem_limit / hashentrysize;
@@ -1989,19 +1990,20 @@ hash_choose_num_partitions(double input_groups, double hashentrysize,
 	int			partition_limit;
 	int			npartitions;
 	int			partition_bits;
+	int			hash_mem = get_hash_mem();
 
 	/*
 	 * Avoid creating so many partitions that the memory requirements of the
-	 * open partition files are greater than 1/4 of work_mem.
+	 * open partition files are greater than 1/4 of hash_mem.
 	 */
 	partition_limit =
-		(work_mem * 1024L * 0.25 - HASHAGG_READ_BUFFER_SIZE) /
+		(hash_mem * 1024L * 0.25 - HASHAGG_READ_BUFFER_SIZE) /
 		HASHAGG_WRITE_BUFFER_SIZE;
 
 	mem_wanted = HASHAGG_PARTITION_FACTOR * input_groups * hashentrysize;
 
 	/* make enough partitions so that each one is likely to fit in memory */
-	npartitions = 1 + (mem_wanted / (work_mem * 1024L));
+	npartitions = 1 + (mem_wanted / (hash_mem * 1024L));
 
 	if (npartitions > partition_limit)
 		npartitions = partition_limit;
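
To make the new spill-limit arithmetic concrete, here is a hedged worked example of the mem_limit calculation from hash_agg_set_limits() in the diff above (the hunk at old line 1824). The budget and buffer numbers are invented for illustration; only the shape of the calculation follows the patch.

```c
#include <stdio.h>

/*
 * Worked example (hypothetical numbers) of the mem_limit calculation:
 * reserve space for the spill partitions' buffers, but never drop the
 * spill threshold below 3/4 of hash_mem.
 */
int
main(void)
{
	long		hash_mem_bytes = 4096 * 1024L;	/* assume a 4MB hash budget */
	long		partition_mem = 256 * 1024L;	/* assume partition buffers total 256kB */
	long		mem_limit;

	if (hash_mem_bytes > 4 * partition_mem)
		mem_limit = hash_mem_bytes - partition_mem;	/* plenty of room: just reserve the buffers */
	else
		mem_limit = hash_mem_bytes * 0.75;	/* otherwise fall back to the 3/4 floor */

	printf("mem_limit = %ld bytes\n", mem_limit);	/* 3932160 bytes with these numbers */
	return 0;
}
```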