aboutsummaryrefslogtreecommitdiff
Commit message (Collapse)AuthorAge
* Fix postmaster's behavior during smart shutdown.Tom Lane2020-08-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Up to now, upon receipt of a SIGTERM ("smart shutdown" command), the postmaster has immediately killed all "optional" background processes, and subsequently refused to launch new ones while it's waiting for foreground client processes to exit. No doubt this seemed like an OK policy at some point; but it's a pretty bad one now, because it makes for a seriously degraded environment for the remaining clients: * Parallel queries are killed, and new ones fail to launch. (And our parallel-query infrastructure utterly fails to deal with the case in a reasonable way --- it just hangs waiting for workers that are not going to arrive. There is more work needed in that area IMO.) * Autovacuum ceases to function. We can tolerate that for awhile, but if bulk-update queries continue to run in the surviving client sessions, there's eventually going to be a mess. In the worst case the system could reach a forced shutdown to prevent XID wraparound. * The bgwriter and walwriter are also stopped immediately, likely resulting in performance degradation. Hence, let's rearrange things so that the only immediate change in behavior is refusing to let in new normal connections. Once the last normal connection is gone, shut everything down as though we'd received a "fast" shutdown. To implement this, remove the PM_WAIT_BACKUP and PM_WAIT_READONLY states, instead staying in PM_RUN or PM_HOT_STANDBY while normal connections remain. A subsidiary state variable tracks whether or not we're letting in new connections in those states. This also allows having just one copy of the logic for killing child processes in smart and fast shutdown modes. I moved that logic into PostmasterStateMachine() by inventing a new state PM_STOP_BACKENDS. Back-patch to 9.6 where parallel query was added. In principle this'd be a good idea in 9.5 as well, but the risk/reward ratio is not as good there, since lack of autovacuum is not a problem during typical uses of smart shutdown. Per report from Bharath Rupireddy. Patch by me, reviewed by Thomas Munro Discussion: https://postgr.es/m/CALj2ACXAZ5vKxT9P7P89D87i3MDO9bfS+_bjMHgnWJs8uwUOOw@mail.gmail.com
* Fix typo in test comment.Heikki Linnakangas2020-08-14
|
* Doc: improve examples for json_populate_record() and related functions.Tom Lane2020-08-13
| | | | | | | Make these examples self-contained by providing declarations of the user-defined row types they rely on. There wasn't room to do this in the old doc format, but now there is, and I think it makes the examples a good bit less confusing.
* Handle new HOT chains in index-build table scansAlvaro Herrera2020-08-13
| | | | | | | | | | | | | | | | | | | | | | | | | | When a table is scanned by heapam_index_build_range_scan (née IndexBuildHeapScan) and the table lock being held allows concurrent data changes, it is possible for new HOT chains to sprout in a page that were unknown when the scan of a page happened. This leads to an error such as ERROR: failed to find parent tuple for heap-only tuple at (X,Y) in table "tbl" because the root tuple was not present when we first obtained the list of the page's root tuples. This can be fixed by re-obtaining the list of root tuples, if we see that a heap-only tuple appears to point to a non-existing root. This was reported by Anastasia as occurring for BRIN summarization (which exists since 9.5), but I think it could theoretically also happen with CREATE INDEX CONCURRENTLY (much older) or REINDEX CONCURRENTLY (very recent). It seems a happy coincidence that BRIN forces us to backpatch this all the way to 9.5. Reported-by: Anastasia Lubennikova <a.lubennikova@postgrespro.ru> Diagnosed-by: Anastasia Lubennikova <a.lubennikova@postgrespro.ru> Co-authored-by: Anastasia Lubennikova <a.lubennikova@postgrespro.ru> Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/602d8487-f0b2-5486-0088-0f372b2549fa@postgrespro.ru Backpatch: 9.5 - master
* BRIN: Handle concurrent desummarization properlyAlvaro Herrera2020-08-12
| | | | | | | | | | | | | | | If a page range is desummarized at just the right time concurrently with an index walk, BRIN would raise an error indicating index corruption. This is scary and unhelpful; silently returning that the page range is not summarized is sufficient reaction. This bug was introduced by commit 975ad4e602ff as additional protection against a bug whose actual fix was elsewhere. Backpatch equally. Reported-By: Anastasia Lubennikova <a.lubennikova@postgrespro.ru> Diagnosed-By: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/2588667e-d07d-7e10-74e2-7e1e46194491@postgrespro.ru Backpatch: 9.5 - master
* Stamp 13beta3.REL_13_BETA3Tom Lane2020-08-10
|
* Document clashes between logical replication and untrusted users.Noah Misch2020-08-10
| | | | | | Back-patch to v10, which introduced logical replication. Security: CVE-2020-14349
* Empty search_path in logical replication apply worker and walsender.Noah Misch2020-08-10
| | | | | | | | | | | | | | This is like CVE-2018-1058 commit 582edc369cdbd348d68441fc50fa26a84afd0c1a. Today, a malicious user of a publisher or subscriber database can invoke arbitrary SQL functions under an identity running replication, often a superuser. This fix may cause "does not exist" or "no schema has been selected to create in" errors in a replication process. After upgrading, consider watching server logs for these errors. Objects accruing schema qualification in the wake of the earlier commit are unlikely to need further correction. Back-patch to v10, which introduced logical replication. Security: CVE-2020-14349
* Move connect.h from fe_utils to src/include/common.Noah Misch2020-08-10
| | | | | | | Any libpq client can use the header. Clients include backend components postgres_fdw, dblink, and logical replication apply worker. Back-patch to v10, because another fix needs this. In released branches, just copy the header and keep the original.
* Make contrib modules' installation scripts more secure.Tom Lane2020-08-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Hostile objects located within the installation-time search_path could capture references in an extension's installation or upgrade script. If the extension is being installed with superuser privileges, this opens the door to privilege escalation. While such hazards have existed all along, their urgency increases with the v13 "trusted extensions" feature, because that lets a non-superuser control the installation path for a superuser-privileged script. Therefore, make a number of changes to make such situations more secure: * Tweak the construction of the installation-time search_path to ensure that references to objects in pg_catalog can't be subverted; and explicitly add pg_temp to the end of the path to prevent attacks using temporary objects. * Disable check_function_bodies within installation/upgrade scripts, so that any security gaps in SQL-language or PL-language function bodies cannot create a risk of unwanted installation-time code execution. * Adjust lookup of type input/receive functions and join estimator functions to complain if there are multiple candidate functions. This prevents capture of references to functions whose signature is not the first one checked; and it's arguably more user-friendly anyway. * Modify various contrib upgrade scripts to ensure that catalog modification queries are executed with secure search paths. (These are in-place modifications with no extension version changes, since it is the update process itself that is at issue, not the end result.) Extensions that depend on other extensions cannot be made fully secure by these methods alone; therefore, revert the "trusted" marking that commit eb67623c9 applied to earthdistance and hstore_plperl, pending some better solution to that set of issues. Also add documentation around these issues, to help extension authors write secure installation scripts. Patch by me, following an observation by Andres Freund; thanks to Noah Misch for review. Security: CVE-2020-14350
* Translation updatesPeter Eisentraut2020-08-10
| | | | | Source-Git-URL: https://git.postgresql.org/git/pgtranslation/messages.git Source-Git-Hash: 42620448109473e0d2271f0f0015d3647fbbfff6
* Doc: update v13 release notes for changes through today.Tom Lane2020-08-09
|
* Check for fseeko() failure in pg_dump's _tarAddFile().Tom Lane2020-08-09
| | | | | | | | | Coverity pointed out, not unreasonably, that we checked fseeko's result at every other call site but these. Failure to seek in the temp file (note this is NOT pg_dump's output file) seems quite unlikely, and even if it did happen the file length cross-check further down would probably detect the problem. Still, that's a poor excuse for not checking the result of a system call.
* walsnd: Don't set waiting_for_ping_response spuriouslyAlvaro Herrera2020-08-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ashutosh Bapat noticed that when logical walsender needs to wait for WAL, and it realizes that it must send a keepalive message to walreceiver to update the sent-LSN, which *does not* request a reply from walreceiver, it wrongly sets the flag that it's going to wait for that reply. That means that any future would-be sender of feedback messages ends up not sending a feedback message, because they all believe that a reply is expected. With built-in logical replication there's not much harm in this, because WalReceiverMain will send a ping-back every wal_receiver_timeout/2 anyway; but with other logical replication systems (e.g. pglogical) it can cause significant pain. This problem was introduced in commit 41d5f8ad734, where the request-reply flag was changed from true to false to WalSndKeepalive, without at the same time removing the line that sets waiting_for_ping_response. Just removing that line would be a sufficient fix, but it seems better to shift the responsibility of setting the flag to WalSndKeepalive itself instead of requiring caller to do it; this is clearly less error-prone. Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Reported-by: Ashutosh Bapat <ashutosh.bapat@2ndquadrant.com> Backpatch: 9.5 and up Discussion: https://postgr.es/m/20200806225558.GA22401@alvherre.pgsql
* Add list of acknowledgments to release notesPeter Eisentraut2020-08-07
| | | | | | | This contains all individuals mentioned in the commit messages during PostgreSQL 13 development. current through 79a3ab1e98d6b5952e29ad91e07c0e9fc777cc0b
* Fix yet another issue with step generation in partition pruning.Etsuro Fujita2020-08-07
| | | | | | | | | | | | | | | | Commit 13838740f fixed some issues with step generation in partition pruning, but there was yet another one: get_steps_using_prefix() assumes that clauses in the passed-in prefix list are sorted in ascending order of their partition key numbers, but the caller failed to ensure this for range partitioning, which led to an assertion failure in debug builds. Adjust the caller function to arrange the clauses in the prefix list in the required order for range partitioning. Back-patch to v11, like the previous commit. Patch by me, reviewed by Amit Langote. Discussion: https://postgr.es/m/CAPmGK16jkXiFG0YqMbU66wte-oJTfW6D1HaNvQf%3D%2B5o9%3Dm55wQ%40mail.gmail.com
* amcheck: Sanitize metapage's allequalimage field.Peter Geoghegan2020-08-06
| | | | | | | This will be helpful if it ever proves necessary to revoke an opclass's support for deduplication. Backpatch: 13-, where nbtree deduplication was introduced.
* Fix bogus EXPLAIN output for Hash AggregateDavid Rowley2020-08-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | 9bdb300de modified the EXPLAIN output for Hash Aggregate to show details from parallel workers. However, it neglected to consider that a given parallel worker may not have assisted with the given Hash Aggregate. This can occur when workers fail to start or during Parallel Append with enable_partitionwise_join enabled when only a single worker is working on a non-parallel aware sub-plan. It could also happen if a worker simply wasn't fast enough to get any work done before other processes went and finished all the work. The bogus output came from the fact that ExplainOpenWorker() skipped showing any details for non-initialized workers but show_hashagg_info() did show details from the worker. This meant that the worker properties that were shown were not properly attributed to the worker that they belong to. In passing, we also now don't show Hash Aggregate properties for the leader process when it did not contribute any work to the Hash Aggregate. This can occur either during Parallel Append when only a parallel worker worked on a given sub plan or with parallel_leader_participation set to off. This aims to make the behavior of Hash Aggregate's EXPLAIN output more similar to Sort's. Reported-by: Justin Pryzby Discussion: https://postgr.es/m/20200805012105.GZ28072%40telsasoft.com Backpatch-through: 13, where the original breakage was introduced
* doc: clarify "state" table reference in tutorialBruce Momjian2020-08-05
| | | | | | | | Reported-by: Vyacheslav Shablistyy Discussion: https://postgr.es/m/159586122762.680.1361378513036616007@wrigleys.postgresql.org Backpatch-through: 9.5
* Fix matching of sub-partitions when a partitioned plan is stale.Tom Lane2020-08-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Since we no longer require AccessExclusiveLock to add a partition, the executor may see that a partitioned table has more partitions than the planner saw. ExecCreatePartitionPruneState's code for matching up the partition lists in such cases was faulty, and would misbehave if the planner had successfully pruned any partitions from the query. (Thus, trouble would occur only if a partition addition happens concurrently with a query that uses both static and dynamic partition pruning.) This led to an Assert failure in debug builds, and probably to crashes or query misbehavior in production builds. To repair the bug, just explicitly skip zeroes in the plan's relid_map[] list. I also made some cosmetic changes to make the code more readable (IMO anyway). Also, convert the cross-checking Assert to a regular test-and-elog, since it's now apparent that this logic is more fragile than one would like. Currently, there's no way to repeatably exercise this code, except with manual use of a debugger to stop the backend between planning and execution. Hence, no test case in this patch. We oughta do something about that testability gap, but that's for another day. Amit Langote and Tom Lane, per report from Justin Pryzby. Oversight in commit 898e5e329; backpatch to v12 where that appeared. Discussion: https://postgr.es/m/20200802181131.GA27754@telsasoft.com
* Increase hard-wired timeout values in ecpg regression tests.Tom Lane2020-08-04
| | | | | | | | | | | | | | A couple of test cases had connect_timeout=14, a value that seems to have been plucked from a hat. While it's more than sufficient for normal cases, slow/overloaded buildfarm machines can get a timeout failure here, as per recent report from "sungazer". Increase to 180 seconds, which is in line with our typical timeouts elsewhere in the regression tests. Back-patch to 9.6; the code looks different in 9.5, and this doesn't seem to be quite worth the effort to adapt to that. Report: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&dt=2020-08-04%2007%3A12%3A22
* Make new SSL TAP test for channel_binding more robustMichael Paquier2020-08-04
| | | | | | | | | | The test would fail in an environment including a certificate file in ~/.postgresql/. bdd6e9b fixed a similar failure, and d6e612f introduced the same problem again with a new test. Author: Kyotaro Horiguchi Discussion: https://postgr.es/m/20200804.120033.31225582282178001.horikyota.ntt@gmail.com Backpatch-through: 13
* doc: PG 13 relnotes: hash_mem_multiplier can restore old behav.Bruce Momjian2020-08-03
| | | | | | | | | | | Document that hash_mem_multiplier can get query behavior closer to the pre-PG 13 behavior of allowing hashing to use more memory. Reported-by: Peter Geoghegan Discussion: https://postgr.es/m/CAH2-Wzn3kwQm_pe6g2=ki+P7+ZRqH5GvFGn6SWfv_j7UUgcLdQ@mail.gmail.com Backpatch-through: 13 only
* Remove unnecessary "DISTINCT" in psql's queries for \dAc and \dAf.Tom Lane2020-08-03
| | | | | | | | A moment's examination of these queries is sufficient to see that they do not produce duplicate rows, unless perhaps there's catalog corruption. Using DISTINCT anyway is inefficient and confusing; moreover it sets a poor example for anyone who refers to psql -E output to see how to query the catalogs.
* Doc: fix obsolete info about allowed range of TZ offsets in timetz.Tom Lane2020-08-03
| | | | | | | | | We've allowed UTC offsets up to +/- 15:59 since commit cd0ff9c0f, but that commit forgot to fix the documentation about timetz. Per bug #16571 from osdba. Discussion: https://postgr.es/m/16571-eb7501598de78c8a@postgresql.org
* Fix behavior of ecpg's "EXEC SQL elif name".Tom Lane2020-08-03
| | | | | | | | | | | | | | | | | | This ought to work much like C's "#elif defined(name)"; but the code implemented it in a way equivalent to endif followed by ifdef, so that it didn't matter whether any previous branch of the IF construct had succeeded. Fix that; add some test cases covering elif and nested IFs; and improve the documentation, which also seemed a bit confused. AFAICS the code has been like this since the feature was added in 1999 (commit b57b0e044). So while it's surely wrong, there might be code out there relying on the current behavior. Hence, don't back-patch into stable branches. It seems all right to fix it in v13 though. Per report from Ashutosh Sharma. Reviewed by Ashutosh Sharma and Michael Meskes. Discussion: https://postgr.es/m/CAE9k0P=dQk9X0cU2tN49S7a9tv733-e1pVdpB1P-pWJ5PdTktg@mail.gmail.com
* Fix rare failure in LDAP tests.Thomas Munro2020-08-03
| | | | | | | | | | | Instead of writing a query to psql's stdin, use -c. This avoids a failure where psql exits before we write, seen a few times on the build farm. Thanks to Tom Lane for the suggestion. Back-patch to 11, where the LDAP tests arrived. Reviewed-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/CA%2BhUKGLFmW%2BHQYPeKiwSp5sdFFHtFViCpw4Mh6yAgEx74r5-Cw%40mail.gmail.com
* Fix minor issues in psql's new \dAc and related commands.Tom Lane2020-08-02
| | | | | | | | | | | | | | | | | The type-name pattern in \dAc and \dAf was matched only to the actual pg_type.typname string, which is fairly user-unfriendly in cases where that is not what's shown to the user by format_type (compare "_int4" and "integer[]"). Make this code match what \dT does, i.e. match the pattern against either typname or format_type() output. Also fix its broken handling of schema-name restrictions. (IOW, make these processSQLNamePattern calls match \dT's.) While here, adjust whitespace to make the query a little prettier in -E output, too. Also improve some inaccuracies and shaky grammar in the related documentation. Noted while working on a patch for intarray's opclasses; I wondered why I couldn't get a match to "integer*" for the input type name.
* Use int64 instead of long in incremental sort codeDavid Rowley2020-08-02
| | | | | | | | | | Windows 64bit has 4-byte long values which is not suitable for tracking disk space usage in the incremental sort code. Let's just make all these fields int64s. Author: James Coleman Discussion: https://postgr.es/m/CAApHDvpky%2BUhof8mryPf5i%3D6e6fib2dxHqBrhp0Qhu0NeBhLJw%40mail.gmail.com Backpatch-through: 13, where the incremental sort code was added
* Restore lost amcheck TOAST test coverage.Peter Geoghegan2020-07-31
| | | | | | | | | | | | | | | | | | | | Commit eba77534 fixed an amcheck false positive bug involving inconsistencies in TOAST input state between table and index. A test case was added that verified that such an inconsistency didn't result in a spurious corruption related error. Test coverage from the test was accidentally lost by commit 501e41dd, which propagated ALTER TABLE ... SET STORAGE attstorage state to indexes. This broke the test because the test specifically relied on attstorage not being propagated. This artificially forced there to be index tuples whose datums were equivalent to the datums in the heap without the datums actually being bitwise equal. Fix this by updating pg_attribute directly instead. Commit 501e41dd made similar changes to a test_decoding TOAST-related test case which made the same assumption, but overlooked the amcheck test case. Backpatch: 11-, just like commit eba77534 (and commit 501e41dd).
* Fix oversight in ALTER TYPE: typmodin/typmodout must propagate to arrays.Tom Lane2020-07-31
| | | | | | | | | | | If a base type supports typmods, its array type does too, with the same interpretation. Hence changes in pg_type.typmodin/typmodout must be propagated to the array type. While here, improve AlterTypeRecurse to not recurse to domains if there is nothing we'd need to change. Oversight in fe30e7ebf. Back-patch to v13 where that came in.
* Fix recently-introduced performance problem in ts_headline().Tom Lane2020-07-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The new hlCover() algorithm that I introduced in commit c9b0c678d turns out to potentially take O(N^2) or worse time on long documents, if there are many occurrences of individual query words but few or no substrings that actually satisfy the query. (One way to hit this behavior is with a "common_word & rare_word" type of query.) This seems unavoidable given the original goal of checking every substring of the document, so we have to back off that idea. Fortunately, it seems unlikely that anyone would really want headlines spanning all of a long document, so we can avoid the worse-than-linear behavior by imposing a maximum length of substring that we'll consider. For now, just hard-wire that maximum length as a multiple of max_words times max_fragments. Perhaps at some point somebody will argue for exposing it as a ts_headline parameter, but I'm hesitant to make such a feature addition in a back-patched bug fix. I also noted that the hlFirstIndex() function I'd added in that commit was unnecessarily stupid: it really only needs to check whether a HeadlineWordEntry's item pointer is null or not. This wouldn't make all that much difference in typical cases with queries having just a few terms, but a cycle shaved is a cycle earned. In addition, add a CHECK_FOR_INTERRUPTS call in TS_execute_recurse. This ensures that hlCover's loop is cancellable if it manages to take a long time, and it may protect some other TS_execute callers as well. Back-patch to 9.6 as the previous commit was. I also chose to add the CHECK_FOR_INTERRUPTS call to 9.5. The old hlCover() algorithm seems to avoid the O(N^2) behavior, at least on the test case I tried, but nonetheless it's not very quick on a long document. Per report from Stephen Frost. Discussion: https://postgr.es/m/20200724160535.GW12375@tamriel.snowman.net
* Doc: fix high availability solutions comparison.Tatsuo Ishii2020-07-31
| | | | | | | | | | | | In "High Availability, Load Balancing, and Replication" chapter, certain descriptions of Pgpool-II were not correct at this point. It does not need conflict resolution. Also "Multiple-Server Parallel Query Execution" is not supported anymore. Discussion: https://postgr.es/m/20200726.230128.53842489850344110.t-ishii%40sraoss.co.jp Author: Tatsuo Ishii Reviewed-by: Bruce Momjian Backpatch-through: 9.5
* Use pg_bitutils for HyperLogLog.Jeff Davis2020-07-30
| | | | | | | | | | | Using pg_leftmost_one_post32() yields substantial performance benefits. Backpatching to version 13 because HLL is used for HashAgg improvements in 9878b643, which was also backpatched to 13. Reviewed-by: Peter Geoghegan Discussion: https://postgr.es/m/CAH2-WzkGvDKVDo+0YvfvZ+1CE=iCi88DCOGFF3i1hTGGaxcKPw@mail.gmail.com Backpatch-through: 13
* doc: Mention index references in pg_inheritsMichael Paquier2020-07-30
| | | | | | | | | Partitioned indexes are also registered in pg_inherits, but the description of this catalog did not reflect that. Author: Dagfinn Ilmari Mannsåker Discussion: https://postgr.es/m/87k0ynj35y.fsf@wibble.ilmari.org Backpatch-through: 11
* Add hash_mem_multiplier GUC.Peter Geoghegan2020-07-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a GUC that acts as a multiplier on work_mem. It gets applied when sizing executor node hash tables that were previously size constrained using work_mem alone. The new GUC can be used to preferentially give hash-based nodes more memory than the generic work_mem limit. It is intended to enable admin tuning of the executor's memory usage. Overall system throughput and system responsiveness can be improved by giving hash-based executor nodes more memory (especially over sort-based alternatives, which are often much less sensitive to being memory constrained). The default value for hash_mem_multiplier is 1.0, which is also the minimum valid value. This means that hash-based nodes continue to apply work_mem in the traditional way by default. hash_mem_multiplier is generally useful. However, it is being added now due to concerns about hash aggregate performance stability for users that upgrade to Postgres 13 (which added disk-based hash aggregation in commit 1f39bce0). While the old hash aggregate behavior risked out-of-memory errors, it is nevertheless likely that many users actually benefited. Hash agg's previous indifference to work_mem during query execution was not just faster; it also accidentally made aggregation resilient to grouping estimate problems (at least in cases where this didn't create destabilizing memory pressure). hash_mem_multiplier can provide a certain kind of continuity with the behavior of Postgres 12 hash aggregates in cases where the planner incorrectly estimates that all groups (plus related allocations) will fit in work_mem/hash_mem. This seems necessary because hash-based aggregation is usually much slower when only a small fraction of all groups can fit. Even when it isn't possible to totally avoid hash aggregates that spill, giving hash aggregation more memory will reliably improve performance (the same cannot be said for external sort operations, which appear to be almost unaffected by memory availability provided it's at least possible to get a single merge pass). The PostgreSQL 13 release notes should advise users that increasing hash_mem_multiplier can help with performance regressions associated with hash aggregation. That can be taken care of by a later commit. Author: Peter Geoghegan Reviewed-By: Álvaro Herrera, Jeff Davis Discussion: https://postgr.es/m/20200625203629.7m6yvut7eqblgmfo@alap3.anarazel.de Discussion: https://postgr.es/m/CAH2-WzmD%2Bi1pG6rc1%2BCjc4V6EaFJ_qSuKCCHVnH%3DoruqD-zqow%40mail.gmail.com Backpatch: 13-, where disk-based hash aggregation was introduced.
* HashAgg: use better cardinality estimate for recursive spilling.Jeff Davis2020-07-28
| | | | | | | | | | | | | | | | Use HyperLogLog to estimate the group cardinality in a spilled partition. This estimate is used to choose the number of partitions if we recurse. The previous behavior was to use the number of tuples in a spilled partition as the estimate for the number of groups, which lead to overpartitioning. That could cause the number of batches to be much higher than expected (with each batch being very small), which made it harder to interpret EXPLAIN ANALYZE results. Reviewed-by: Peter Geoghegan Discussion: https://postgr.es/m/a856635f9284bc36f7a77d02f47bbb6aaf7b59b3.camel@j-davis.com Backpatch-through: 13
* Rename another "hash_mem" local variable.Peter Geoghegan2020-07-28
| | | | | | Missed by my commit 564ce621. Backpatch: 13-, where disk-based hash aggregation was introduced.
* Correct obsolete UNION hash aggs comment.Peter Geoghegan2020-07-28
| | | | | | Oversight in commit 1f39bce0, which added disk-based hash aggregation. Backpatch: 13-, where disk-based hash aggregation was introduced.
* Doc: Remove obsolete CREATE AGGREGATE note.Peter Geoghegan2020-07-28
| | | | | | | | | | | | | | | The planner is in fact willing to use hash aggregation when work_mem is not set high enough for everything to fit in memory. This has been the case since commit 1f39bce0, which added disk-based hash aggregation. There are a few remaining cases in which hash aggregation is avoided as a matter of policy when the planner surmises that spilling will be necessary. For example, callers of choose_hashed_setop() still conservatively avoid hash aggregation when spilling is anticipated. That doesn't seem like a good enough reason to mention hash aggregation in this context. Backpatch: 13-, where disk-based hash aggregation was introduced.
* Make EXPLAIN ANALYZE of HashAgg more similar to Hash JoinDavid Rowley2020-07-29
| | | | | | | | | | | | | | | | | | | | | | | | | There were various unnecessary differences between Hash Agg's EXPLAIN ANALYZE output and Hash Join's. Here we modify the Hash Agg output so that it's better aligned to Hash Join's. The following changes have been made: 1. Start batches counter at 1 instead of 0. 2. Always display the "Batches" property, even when we didn't spill to disk. 3. Use the text "Batches" instead of "HashAgg Batches" for text format. 4. Use the text "Memory Usage" instead of "Peak Memory Usage" for text format. 5. Include "Batches" before "Memory Usage" in both text and non-text formats. In passing also modify the "Planned Partitions" property so that we show it regardless of if the value is 0 or not for non-text EXPLAIN formats. This was pointed out by Justin Pryzby and probably should have been part of 40efbf870. Reviewed-by: Justin Pryzby, Jeff Davis Discussion: https://postgr.es/m/CAApHDvrshRnA6C0VFnu7Fb9TVvgGo80PUMm5+2DiaS1gEkPvtw@mail.gmail.com Backpatch-through: 13, where HashAgg batching was introduced
* Doc: Improve documentation for pg_jit_available()David Rowley2020-07-28
| | | | | | | Per complaint from Scott Ribe. Based on wording suggestion from Tom Lane. Discussion: https://postgr.es/m/1956E806-1468-4417-9A9D-235AE1D5FE1A@elevated-dev.com Backpatch-through: 11, where pg_jit_available() was added
* doc: Mention the rename of wal_keep_segments GUC in release note.Fujii Masao2020-07-28
| | | | | | | | | Commit f5dff45962 renamed wal_keep_segments to wal_keep_size. This commit adds the mention to this change in the release note. Author: Fujii Masao Reviewed-by: David Steele Discussion: https://postgr.es/m/dc4768f2-1eff-d2fc-35ba-6b2985b7cb6c@oss.nttdata.com
* Fix some issues with step generation in partition pruning.Etsuro Fujita2020-07-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the case of range partitioning, get_steps_using_prefix() assumes that the passed-in prefix list contains at least one clause for each of the partition keys earlier than one specified in the passed-in step_lastkeyno, but the caller (ie, gen_prune_steps_from_opexps()) didn't take it into account, which led to a server crash or incorrect results when the list contained no clauses for such partition keys, as reported in bug #16500 and #16501 from Kobayashi Hisanori. Update the caller to call that function only when the list created there contains at least one clause for each of the earlier partition keys in the case of range partitioning. While at it, fix some other issues: * The list to pass to get_steps_using_prefix() is allowed to contain multiple clauses for the same partition key, as described in the comment for that function, but that function actually assumed that the list contained just a single clause for each of middle partition keys, which led to an assertion failure when the list contained multiple clauses for such partition keys. Update that function to match the comment. * In the case of hash partitioning, partition keys are allowed to be NULL, in which case the list to pass to get_steps_using_prefix() contains no clauses for NULL partition keys, but that function treats that case as like the case of range partitioning, which led to the assertion failure. Update the assertion test to take into account NULL partition keys in the case of hash partitioning. * Fix a typo in a comment in get_steps_using_prefix_recurse(). * gen_partprune_steps() failed to detect self-contradiction from strict-qual clauses and an IS NULL clause for the same partition key in some cases, producing incorrect partition-pruning steps, which led to incorrect results of partition pruning, but didn't cause any user-visible problems fortunately, as the self-contradiction is detected later in the query planning. Update that function to detect the self-contradiction. Per bug #16500 and #16501 from Kobayashi Hisanori. Patch by me, initial diagnosis for the reported issue and review by Dmitry Dolgov. Back-patch to v11, where partition pruning was introduced. Discussion: https://postgr.es/m/16500-d1613f2a78e1e090%40postgresql.org Discussion: https://postgr.es/m/16501-5234a9a0394f6754%40postgresql.org
* Remove hashagg_avoid_disk_plan GUC.Peter Geoghegan2020-07-27
| | | | | | | | | | | Note: This GUC was originally named enable_hashagg_disk when it appeared in commit 1f39bce0, which added disk-based hash aggregation. It was subsequently renamed in commit 92c58fd9. Author: Peter Geoghegan Reviewed-By: Jeff Davis, Álvaro Herrera Discussion: https://postgr.es/m/9d9d1e1252a52ea1bad84ea40dbebfd54e672a0f.camel%40j-davis.com Backpatch: 13-, where disk-based hash aggregation was introduced.
* Fix corner case with 16kB-long decompression in pgcrypto, take 2Michael Paquier2020-07-27
| | | | | | | | | | | | | | | | | | | | | | | A compressed stream may end with an empty packet. In this case decompression finishes before reading the empty packet and the remaining stream packet causes a failure in reading the following data. This commit makes sure to consume such extra data, avoiding a failure when decompression the data. This corner case was reproducible easily with a data length of 16kB, and existed since e94dd6a. A cheap regression test is added to cover this case based on a random, incompressible string. The first attempt of this patch has allowed to find an older failure within the compression logic of pgcrypto, fixed by b9b6105. This involved SLES 15 with z390 where a custom flavor of libz gets used. Bonus thanks to Mark Wong for providing access to the specific environment. Reported-by: Frank Gagnepain Author: Kyotaro Horiguchi, Michael Paquier Reviewed-by: Tom Lane Discussion: https://postgr.es/m/16476-692ef7b84e5fb893@postgresql.org Backpatch-through: 9.5
* Fix handling of structure for bytea data type in ECPGMichael Paquier2020-07-27
| | | | | | | | | | | | | | Some code paths dedicated to bytea used the structure for varchar. This did not lead to any actual bugs, as bytea and varchar have the same definition, but it could become a trap if one of these definitions changes for a new feature or a bug fix. Issue introduced by 050710b. Author: Shenhao Wang Reviewed-by: Vignesh C, Michael Paquier Discussion: https://postgr.es/m/07ac7dee1efc44f99d7f53a074420177@G08CNEXMBPEKD06.g08.fujitsu.local Backpatch-through: 12
* Fix LookupTupleHashEntryHash() pipeline-stall issue.Jeff Davis2020-07-26
| | | | | | | | Refactor hash lookups in nodeAgg.c to improve performance. Author: Andres Freund and Jeff Davis Discussion: https://postgr.es/m/20200612213715.op4ye4q7gktqvpuo%40alap3.anarazel.de Backpatch-through: 13
* Tweak behavior of pg_stat_activity.leader_pidMichael Paquier2020-07-26
| | | | | | | | | | | | | | | | | | | | | | | The initial implementation of leader_pid in pg_stat_activity added by b025f32 took the approach to strictly print what a PGPROC entry includes. In short, if a backend has been involved in parallel query at least once, leader_pid would remain set as long as the backend is alive. For a parallel group leader, this means that the field would always be set after it participated at least once in parallel query, and after more discussions this could be confusing if using for example a connection pooler. This commit changes the data printed so as leader_pid becomes always NULL for a parallel group leader, showing up a non-NULL value only for the parallel workers, and actually as long as a parallel query is running as workers are shut down once the query has completed. This does not change the definition of any catalog, so no catalog bump is needed. Per discussion with Justin Pryzby, Álvaro Herrera, Julien Rouhaud and me. Discussion: https://postgr.es/m/20200721035145.GB17300@paquier.xyz Backpatch-through: 13
* Fix buffer usage stats for nodes above Gather Merge.Amit Kapila2020-07-25
| | | | | | | | | | | | | | | Commit 85c9d347 addressed a similar problem for Gather and Gather Merge nodes but forgot to account for nodes above parallel nodes. This still works for nodes above Gather node because we shut down the workers for Gather node as soon as there are no more tuples. We can do a similar thing for Gather Merge as well but it seems better to account for stats during nodes shutdown after completing the execution. Reported-by: Stéphane Lorek, Jehan-Guillaume de Rorthais Author: Jehan-Guillaume de Rorthais <jgdr@dalibo.com> Reviewed-by: Amit Kapila Backpatch-through: 10, where it was introduced Discussion: https://postgr.es/m/20200718160206.584532a2@firost