aboutsummaryrefslogtreecommitdiff
path: root/src/backend
Commit message (Collapse)AuthorAge
* Fix misuse of foreach_delete_current().Tom Lane2021-03-18
| | | | | | | | | Our coding convention requires this macro's result to be assigned back to the original List variable. In this usage, since the List could not become empty, there was no actual bug --- but some compilers warned about it. Oversight in be45be9c3. Discussion: https://postgr.es/m/35077b31-2d62-1e31-0e2e-ddb52d590b73@enterprisedb.com
* Implement GROUP BY DISTINCTTomas Vondra2021-03-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With grouping sets, it's possible that some of the grouping sets are duplicate. This is especially common with CUBE and ROLLUP clauses. For example GROUP BY CUBE (a,b), CUBE (b,c) is equivalent to GROUP BY GROUPING SETS ( (a, b, c), (a, b, c), (a, b, c), (a, b), (a, b), (a, b), (a), (a), (a), (c, a), (c, a), (c, a), (c), (b, c), (b), () ) Some of the grouping sets are calculated multiple times, which is mostly unnecessary. This commit implements a new GROUP BY DISTINCT feature, as defined in the SQL standard, which eliminates the duplicate sets. Author: Vik Fearing Reviewed-by: Erik Rijkers, Georgios Kokolatos, Tomas Vondra Discussion: https://postgr.es/m/bf3805a8-d7d1-ae61-fece-761b7ff41ecc@postgresfriends.org
* Remove temporary files after backend crashTomas Vondra2021-03-18
| | | | | | | | | | | | | | | | | | After a crash of a backend using temporary files, the files used to be left behind, on the basis that it might be useful for debugging. But we don't have any reports of anyone actually doing that, and it means the disk usage may grow over time due to repeated backend failures (possibly even hitting ENOSPC). So this behavior is a bit unfortunate, and fixing it required either manual cleanup (deleting files, which is error-prone) or restart of the instance (i.e. service disruption). This implements automatic cleanup of temporary files, controled by a new GUC remove_temp_files_after_crash. By default the files are removed, but it can be disabled to restore the old behavior if needed. Author: Euler Taveira Reviewed-by: Tomas Vondra, Michael Paquier, Anastasia Lubennikova, Thomas Munro Discussion: https://postgr.es/m/CAH503wDKdYzyq7U-QJqGn%3DGm6XmoK%2B6_6xTJ-Yn5WSvoHLY1Ww%40mail.gmail.com
* Fix function name in error hintMagnus Hagander2021-03-18
| | | | | | | | | | | pg_read_file() is the function that's in core, pg_file_read() is in adminpack. But when using pg_file_read() in adminpack it calls the *C* level function pg_read_file() in core, which probably threw the original author off. But the error hint should be about the SQL function. Reported-By: Sergei Kornilov Backpatch-through: 11 Discussion: https://postgr.es/m/373021616060475@mail.yandex.ru
* Add a new GUC and a reloption to enable inserts in parallel-mode.Amit Kapila2021-03-18
| | | | | | | | | | | | | | | | | | | | | Commit 05c8482f7f added the implementation of parallel SELECT for "INSERT INTO ... SELECT ..." which may incur non-negligible overhead in the additional parallel-safety checks that it performs, even when, in the end, those checks determine that parallelism can't be used. This is normally only ever a problem in the case of when the target table has a large number of partitions. A new GUC option "enable_parallel_insert" is added, to allow insert in parallel-mode. The default is on. In addition to the GUC option, the user may want a mechanism to allow inserts in parallel-mode with finer granularity at table level. The new table option "parallel_insert_enabled" allows this. The default is true. Author: "Hou, Zhijie" Reviewed-by: Greg Nancarrow, Amit Langote, Takayuki Tsunakawa, Amit Kapila Discussion: https://postgr.es/m/CAA4eK1K-cW7svLC2D7DHoGHxdAdg3P37BLgebqBOC2ZLc9a6QQ%40mail.gmail.com Discussion: https://postgr.es/m/CAJcOf-cXnB5cnMKqWEp2E2z7Mvcd04iLVmV=qpFJrR3AcrTS3g@mail.gmail.com
* Fix memory lifetime issues of replication slot stats.Andres Freund2021-03-17
| | | | | | | | | | | | | | | | When accessing replication slot stats, introduced in 98681675002d, pgstat_read_statsfiles() reads the data into newly allocated memory. Unfortunately the current memory context at that point is the callers, leading to leaks and use-after-free dangers. The fix is trivial, explicitly use pgStatLocalContext. There's some potential for further improvements, but that's outside of the scope of this bugfix. No backpatch necessary, feature is only in HEAD. Author: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/20210317230447.c7uc4g3vbs4wi32i@alap3.anarazel.de
* Code review for server's handling of "tablespace map" files.Tom Lane2021-03-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While looking at Robert Foggia's report, I noticed a passel of other issues in the same area: * The scheme for backslash-quoting newlines in pathnames is just wrong; it will misbehave if the last ordinary character in a pathname is a backslash. I'm not sure why we're bothering to allow newlines in tablespace paths, but if we're going to do it we should do it without introducing other problems. Hence, backslashes themselves have to be backslashed too. * The author hadn't read the sscanf man page very carefully, because this code would drop any leading whitespace from the path. (I doubt that a tablespace path with leading whitespace could happen in practice; but if we're bothering to allow newlines in the path, it sure seems like leading whitespace is little less implausible.) Using sscanf for the task of finding the first space is overkill anyway. * While I'm not 100% sure what the rationale for escaping both \r and \n is, if the idea is to allow Windows newlines in the file then this code failed, because it'd throw an error if it saw \r followed by \n. * There's no cross-check for an incomplete final line in the map file, which would be a likely apparent symptom of the improper-escaping bug. On the generation end, aside from the escaping issue we have: * If needtblspcmapfile is true then do_pg_start_backup will pass back escaped strings in tablespaceinfo->path values, which no caller wants or is prepared to deal with. I'm not sure if there's a live bug from that, but it looks like there might be (given the dubious assumption that anyone actually has newlines in their tablespace paths). * It's not being very paranoid about the possibility of random stuff in the pg_tblspc directory. IMO we should ignore anything without an OID-like name. The escaping rule change doesn't seem back-patchable: it'll require doubling of backslashes in the tablespace_map file, which is basically a basebackup format change. The odds of that causing trouble are considerably more than the odds of the existing bug causing trouble. The rest of this seems somewhat unlikely to cause problems too, so no back-patch.
* Prevent buffer overrun in read_tablespace_map().Tom Lane2021-03-17
| | | | | | | | | | | | | | | | Robert Foggia of Trustwave reported that read_tablespace_map() fails to prevent an overrun of its on-stack input buffer. Since the tablespace map file is presumed trustworthy, this does not seem like an interesting security vulnerability, but still we should fix it just in the name of robustness. While here, document that pg_basebackup's --tablespace-mapping option doesn't work with tar-format output, because it doesn't. To make it work, we'd have to modify the tablespace_map file within the tarball sent by the server, which might be possible but I'm not volunteering. (Less-painful solutions would require changing the basebackup protocol so that the source server could adjust the map. That's not very appetizing either.)
* Revert "Fix race in Parallel Hash Join batch cleanup."Thomas Munro2021-03-18
| | | | | | | This reverts commit 378802e3713c6c0fce31d2390c134cd5d7c30157. This reverts commit 3b8981b6e1a2aea0f18384c803e21e9391de669a. Discussion: https://postgr.es/m/CA%2BhUKGJmcqAE3MZeDCLLXa62cWM0AJbKmp2JrJYaJ86bz36LFA%40mail.gmail.com
* Fix comment in indexing.cMichael Paquier2021-03-17
| | | | | | | | | 578b229, that removed support for WITH OIDS, has changed CatalogTupleInsert() to not return an Oid, but one comment was still mentioning that. Author: Vik Fearing Discussion: https://postgr.es/m/fef01975-ed10-3601-7b9e-80ecef72d00b@postgresfriends.org
* Small error message improvementPeter Eisentraut2021-03-17
|
* Update the names of Parallel Hash Join phases.Thomas Munro2021-03-17
| | | | | | | | | | | | | | | | Commit 3048898e dropped -ING from some wait event names that correspond to barrier phases. Update the phases' names to match. While we're here making cosmetic changes, also rename "DONE" to "FREE". That pairs better with "ALLOCATE", and describes the activity that actually happens in that phase (as we do for the other phases) rather than describing a state. The distinction is clearer after bugfix commit 3b8981b6 split the phase into two. As for the growth barriers, rename their "ALLOCATE" phase to "REALLOCATE", which is probably a better description of what happens then. Also improve the comments about the phases a bit. Discussion: https://postgr.es/m/CA%2BhUKG%2BMDpwF2Eo2LAvzd%3DpOh81wUTsrwU1uAwR-v6OGBB6%2B7g%40mail.gmail.com
* Fix race in Parallel Hash Join batch cleanup.Thomas Munro2021-03-17
| | | | | | | | | | | | | | | | | | | | | | | | | With very unlucky timing and parallel_leader_participation off, PHJ could attempt to access per-batch state just as it was being freed. There was code intended to prevent that by checking for a cleared pointer, but it was buggy. Fix, by introducing an extra barrier phase. The new phase PHJ_BUILD_RUNNING means that it's safe to access the per-batch state to find a batch to help with, and PHJ_BUILD_DONE means that it is too late. The last to detach will free the array of per-batch state as before, but now it will also atomically advance the phase at the same time, so that late attachers can avoid the hazard, without the data race. This mirrors the way per-batch hash tables are freed (see phases PHJ_BATCH_PROBING and PHJ_BATCH_DONE). Revealed by a one-off build farm failure, where BarrierAttach() failed a sanity check assertion, because the memory had been clobbered by dsa_free(). Back-patch to 11, where the code arrived. Reported-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/20200929061142.GA29096%40paquier.xyz
* Fix race condition in drop subscription's handling of tablesync slots.Amit Kapila2021-03-17
| | | | | | | | | | | | | | Commit ce0fdbfe97 made tablesync slots permanent and allow Drop Subscription to drop such slots. However, it is possible that before tablesync worker could get the acknowledgment of slot creation, drop subscription stops it and that can lead to a dangling slot on the publisher. Prevent cancel/die interrupts while creating a slot in the tablesync worker. Reported-by: Thomas Munro as per buildfarm Author: Amit Kapila Reviewed-by: Vignesh C, Takamichi Osumi Discussion: https://postgr.es/m/CA+hUKGJG9dWpw1cOQ2nzWU8PHjm=PTraB+KgE5648K9nTfwvxg@mail.gmail.com
* Enable parallelism in REFRESH MATERIALIZED VIEW.Thomas Munro2021-03-17
| | | | | | | | | | | | | | | | | Pass CURSOR_OPT_PARALLEL_OK to pg_plan_query() so that parallel plans are considered when running the underlying SELECT query. This wasn't done in commit e9baa5e9, which did this for CREATE MATERIALIZED VIEW, because it wasn't yet known to be safe. Since REFRESH always inserts into a freshly created table before later merging or swapping the data into place with separate operations, we can enable such plans here too. Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Reviewed-by: Hou, Zhijie <houzj.fnst@cn.fujitsu.com> Reviewed-by: Luc Vlaming <luc@swarm64.com> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://postgr.es/m/CALj2ACXg-4hNKJC6nFnepRHYT4t5jJVstYvri%2BtKQHy7ydcr8A%40mail.gmail.com
* Fix comment about promising tuples.Peter Geoghegan2021-03-16
| | | | Oversight in commit d168b666823, which added bottom-up index deletion.
* Avoid corner-case memory leak in SSL parameter processing.Tom Lane2021-03-16
| | | | | | | | | | | | | | | | | | | | | | | After reading the root cert list from the ssl_ca_file, immediately install it as client CA list of the new SSL context. That gives the SSL context ownership of the list, so that SSL_CTX_free will free it. This avoids a permanent memory leak if we fail further down in be_tls_init(), which could happen if bogus CRL data is offered. The leak could only amount to something if the CRL parameters get broken after server start (else we'd just quit) and then the server is SIGHUP'd many times without fixing the CRL data. That's rather unlikely perhaps, but it seems worth fixing, if only because the code is clearer this way. While we're here, add some comments about the memory management aspects of this logic. Noted by Jelte Fennema and independently by Andres Freund. Back-patch to v10; before commit de41869b6 it doesn't matter, since we'd not re-execute this code during SIGHUP. Discussion: https://postgr.es/m/16160-18367e56e9a28264@postgresql.org
* Use pre-fetching for ANALYZEStephen Frost2021-03-16
| | | | | | | | | | | | When we have posix_fadvise() available, we can improve the performance of an ANALYZE by quite a bit by using it to inform the kernel of the blocks that we're going to be asking for. Similar to bitmap index scans, the number of buffers pre-fetched is based off of the maintenance_io_concurrency setting (for the particular tablespace or, if not set, globally, via get_tablespace_maintenance_io_concurrency()). Reviewed-By: Heikki Linnakangas, Tomas Vondra Discussion: https://www.postgresql.org/message-id/VI1PR0701MB69603A433348EDCF783C6ECBF6EF0%40VI1PR0701MB6960.eurprd07.prod.outlook.com
* Improve logging of auto-vacuum and auto-analyzeStephen Frost2021-03-16
| | | | | | | | | | | | When logging auto-vacuum and auto-analyze activity, include the I/O timing if track_io_timing is enabled. Also, for auto-analyze, add the read rate and the dirty rate, similar to how that information has historically been logged for auto-vacuum. Stephen Frost and Jakub Wartak Reviewed-By: Heikki Linnakangas, Tomas Vondra Discussion: https://www.postgresql.org/message-id/VI1PR0701MB69603A433348EDCF783C6ECBF6EF0%40VI1PR0701MB6960.eurprd07.prod.outlook.com
* Improve logging of bad parameter values in BIND messages.Tom Lane2021-03-16
| | | | | | | | | | | | | | | Since commit ba79cb5dc, values of bind parameters have been logged during errors in extended query mode. However, we only did that after we'd collected and converted all the parameter values, thus failing to offer any useful localization of invalid-parameter problems. Add a separate callback that's used during parameter collection, and have it print the parameter number, along with the input string if text input format is used. Justin Pryzby and Tom Lane Discussion: https://postgr.es/m/20210104170939.GH9712@telsasoft.com Discussion: https://postgr.es/m/CANfkH5k-6nNt-4cSv1vPB80nq2BZCzhFVR5O4VznYbsX0wZmow@mail.gmail.com
* Implement pipeline mode in libpqAlvaro Herrera2021-03-15
| | | | | | | | | | | | | | | | | | | | | | | | | Pipeline mode in libpq lets an application avoid the Sync messages in the FE/BE protocol that are implicit in the old libpq API after each query. The application can then insert Sync at its leisure with a new libpq function PQpipelineSync. This can lead to substantial reductions in query latency. Co-authored-by: Craig Ringer <craig.ringer@enterprisedb.com> Co-authored-by: Matthieu Garrigues <matthieu.garrigues@gmail.com> Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Aya Iwata <iwata.aya@jp.fujitsu.com> Reviewed-by: Daniel Vérité <daniel@manitou-mail.org> Reviewed-by: David G. Johnston <david.g.johnston@gmail.com> Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> Reviewed-by: Kirk Jamison <k.jamison@fujitsu.com> Reviewed-by: Michael Paquier <michael.paquier@gmail.com> Reviewed-by: Nikhil Sontakke <nikhils@2ndquadrant.com> Reviewed-by: Vaishnavi Prabakaran <VaishnaviP@fast.au.fujitsu.com> Reviewed-by: Zhihong Yu <zyu@yugabyte.com> Discussion: https://postgr.es/m/CAMsr+YFUjJytRyV4J-16bEoiZyH=4nj+sQ7JP9ajwz=B4dMMZw@mail.gmail.com Discussion: https://postgr.es/m/CAJkzx4T5E-2cQe3dtv2R78dYFvz+in8PY7A8MArvLhs_pg75gg@mail.gmail.com
* Make archiver process an auxiliary process.Fujii Masao2021-03-15
| | | | | | | | | | | | | | | | | | | | | | | This commit changes WAL archiver process so that it's treated as an auxiliary process and can use shared memory. This is an infrastructure patch required for upcoming shared-memory based stats collector patch series. These patch series basically need any processes including archiver that can report the statistics to access to shared memory. Since this patch itself is useful to simplify the code and when users monitor the status of archiver, it's committed separately in advance. This commit simplifies the code for WAL archiving. For example, previously backends need to signal to archiver via postmaster when they notify archiver that there are some WAL files to archive. On the other hand, this commit removes that signal to postmaster and enables backends to notify archier directly using shared latch. Also, as the side of this change, the information about archiver process becomes viewable at pg_stat_activity view. Author: Kyotaro Horiguchi Reviewed-by: Andres Freund, Álvaro Herrera, Julien Rouhaud, Tomas Vondra, Arthur Zakirov, Fujii Masao Discussion: https://postgr.es/m/20180629.173418.190173462.horiguchi.kyotaro@lab.ntt.co.jp
* Notice that heap page has dead items during VACUUM.Peter Geoghegan2021-03-14
| | | | | | | | | | | | | | | | | | | | | Consistently set a flag variable that tracks whether the current heap page has a dead item during lazy vacuum's heap scan. We missed the common case where there is an preexisting (or even a new) LP_DEAD heap line pointer. Also make it clear that the variable might be affected by an existing line pointer, say from an earlier opportunistic pruning operation. This distinction is important because it's the main reason why we can't just use the nearby tups_vacuumed variable instead. No backpatch. In theory failing to set the page level flag variable had no consequences. Currently it is only used to defensively check if a page marked all visible has dead items, which should never happen anyway (if it does then the table must be corrupt). Author: Masahiko Sawada <sawada.mshk@gmail.com> Diagnosed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAD21AoAtZb4+HJT_8RoOXvu4HM-Zd4HKS3YSMCH6+-W=bDyh-w@mail.gmail.com
* Improve FK trigger parallel-safety check added by 05c8482f7f.Amit Kapila2021-03-13
| | | | | | | | | | | | Commit 05c8482f7f added special logic related to parallel-safety of FK triggers. This is a bit of a hack and should have instead been done by simply setting appropriate proparallel values on those trigger functions themselves. Suggested-by: Tom Lane Author: Greg Nancarrow Reviewed-by: Amit Kapila Discussion: https://postgr.es/m/2309260.1615485644@sss.pgh.pa.us
* Consolidate nbtree VACUUM metapage routines.Peter Geoghegan2021-03-12
| | | | | | | | | | | Simplify _bt_vacuum_needs_cleanup() functions's signature (it only needs a single 'rel' argument now), and move it next to its sibling function in nbtpage.c. I believe that _bt_vacuum_needs_cleanup() was originally located in nbtree.c due to an include dependency issue. That's no longer an issue. Follow-up to commit 9f3665fb.
* Forbid marking an identity column as nullable.Tom Lane2021-03-12
| | | | | | | | | | | | | | | GENERATED ALWAYS AS IDENTITY implies NOT NULL, but the code failed to complain if you overrode that with "GENERATED ALWAYS AS IDENTITY NULL". One might think the old behavior was a feature, but it was inconsistent because the outcome varied depending on the order of the clauses, so it seems to have been just an oversight. Per bug #16913 from Pavel Boev. Back-patch to v10 where identity columns were introduced. Vik Fearing (minor tweaks by me) Discussion: https://postgr.es/m/16913-3b5198410f67d8c6@postgresql.org
* Specialize checkpointer sort functions.Thomas Munro2021-03-12
| | | | | | | | | When sorting a potentially large number of dirty buffers, the checkpointer can benefit from a faster sort routine. One reported improvement on a large buffer pool system was 1.4s -> 0.6s. Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CA%2BhUKGJ2-eaDqAum5bxhpMNhvuJmRDZxB_Tow0n-gse%2BHG0Yig%40mail.gmail.com
* Fix size overflow in calculation introduced by commits d6ad34f3 and bea449c6.Amit Kapila2021-03-12
| | | | | | | Reported-by: Thomas Munro Author: Takayuki Tsunakawa Reviewed-by: Kyotaro Horiguchi Discussion: https://postgr.es/m/CA+hUKG+oPoFizjABt=GXZWTEHx3oev5rAe2scjW2r6F1rguo5w@mail.gmail.com
* Fix use of relcache TriggerDesc field introduced by commit 05c8482f7f.Amit Kapila2021-03-12
| | | | | | | | | | | The commit added code which used a relcache TriggerDesc field across another cache access, which it shouldn't because the relcache doesn't guarantee it won't get moved. Diagnosed-by: Tom Lane Author: Greg Nancarrow Reviewed-by: Hou Zhijie, Amit Kapila Discussion: https://postgr.es/m/2309260.1615485644@sss.pgh.pa.us
* Poll postmaster less frequently in recovery.Thomas Munro2021-03-12
| | | | | | | | | | | | | | | | Since commits 9f095299 and f98b8476 we don't poll the postmaster pipe at all during crash recovery on Linux and FreeBSD, but on other operating systems we were still doing it for every WAL record. Do it less frequently on operating systems where system calls are required, at the cost of delaying exit a bit after postmaster death. This avoids expensive system calls reported to slow down CPU-bound recovery by as much as 10-30%. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CA%2BhUKGK1607VmtrDUHQXrsooU%3Dap4g4R2yaoByWOOA3m8xevUQ%40mail.gmail.com Discussion: https://postgr.es/m/7261eb39-0369-f2f4-1bb5-62f3b6083b5e@iki.fi
* Add condition variable for walreceiver shutdown.Thomas Munro2021-03-12
| | | | | | | Use this new CV to wait for walreceiver shutdown without a sleep/poll loop, while also benefiting from standard postmaster death handling. Discussion: https://postgr.es/m/CA%2BhUKGK1607VmtrDUHQXrsooU%3Dap4g4R2yaoByWOOA3m8xevUQ%40mail.gmail.com
* Add condition variable for recovery resume.Thomas Munro2021-03-12
| | | | | | | | | Replace a sleep loop with a CV, to get a fast reaction time when recovery is resumed or the postmaster exits via standard infrastructure. Unfortunately we still need to wake up every second to perform extra polling during the recovery pause loop. Discussion: https://postgr.es/m/CA%2BhUKGK1607VmtrDUHQXrsooU%3Dap4g4R2yaoByWOOA3m8xevUQ%40mail.gmail.com
* Send statistics collected during shutdown checkpoint to the stats collector.Fujii Masao2021-03-12
| | | | | | | | | | | | | | When shutdown is requested, checkpointer performs checkpoint or restartpoint, and updates the statistics, before it exits. But previously checkpointer didn't send those statistics to the stats collector. Shutdown checkpoint and restartpoint are treated as requested ones instead of scheduled ones, so the number of them are counted in pg_stat_bgwriter.checkpoints_req column. Author: Masahiro Ikeda Reviewed-by: Fujii Masao Discussion: https://postgr.es/m/0509ad67b585a5b86a83d445dfa75392@oss.nttdata.com
* Force to send remaining WAL stats to the stats collector at walwriter exit.Fujii Masao2021-03-12
| | | | | | | | | | | | | | | | | | | In walwriter's main loop, WAL stats message is only sent if enough time has passed since last one was sent to reach PGSTAT_STAT_INTERVAL msecs. This is necessary to avoid overloading to the stats collector. But this can cause recent WAL stats to be unsent when walwriter exits. To ensure that all the WAL stats are sent, this commit makes walwriter force to send remaining WAL stats to the collector when it exits because of shutdown request. Note that those remaining WAL stats can still be unsent when walwriter exits with non-zero exit code (e.g., FATAL error). This is OK because that walwriter exit leads to server crash and subsequent recovery discards all the stats. So there is no need to send remaining stats in that case. Author: Masahiro Ikeda Reviewed-by: Fujii Masao Discussion: https://postgr.es/m/0509ad67b585a5b86a83d445dfa75392@oss.nttdata.com
* Minor modernization for README.barrier.Thomas Munro2021-03-12
| | | | | Itanium is very uncommon and being discontinued. ARM is everywhere. Prefer ARM as an example of an architecture with weak memory ordering.
* Save a few cycles during nbtree VACUUM.Peter Geoghegan2021-03-11
| | | | | | | | | | | Avoid calling RelationGetNumberOfBlocks() unnecessarily in the common case where there are no deleted but not yet recycled pages to recycle during a cleanup-only nbtree VACUUM operation. Follow-up to commit e5d8a999, which (among other things) taught the "skip full scan" nbtree VACUUM mechanism to only trigger a full index scan when the absolute number of deleted pages in the index is considered excessive.
* Add back vacuum_cleanup_index_scale_factor parameter.Peter Geoghegan2021-03-11
| | | | | | | | | | | | | | | | | Commit 9f3665fb removed the vacuum_cleanup_index_scale_factor storage parameter. However, that creates dump/reload hazards when moving across major versions. Add back the vacuum_cleanup_index_scale_factor parameter (though not the GUC of the same name) purely to avoid problems when using tools like pg_upgrade. The parameter remains disabled and undocumented. No backpatch to Postgres 13, since vacuum_cleanup_index_scale_factor was only disabled by REL_13_STABLE's version of master branch commit 9f3665fb in the first place -- the parameter already looks like this on REL_13_STABLE. Discussion: https://postgr.es/m/YEm/a3Ko3nKnBuVq@paquier.xyz
* Be clear about whether a recovery pause has taken effect.Robert Haas2021-03-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, the code and documentation seem to have essentially assumed than a call to pg_wal_replay_pause() would take place immediately, but that's not the case, because we only check for a pause in certain places. This means that a tool that uses this function and then wants to do something else afterward that is dependent on the pause having taken effect doesn't know how long it needs to wait to be sure that no more WAL is going to be replayed. To avoid that, add a new function pg_get_wal_replay_pause_state() which returns either 'not paused', 'paused requested', or 'paused'. After calling pg_wal_replay_pause() the status will immediate change from 'not paused' to 'pause requested'; when the startup process has noticed this, the status will change to 'pause'. For backward compatibility, pg_is_wal_replay_paused() still exists and returns the same thing as before: true if a pause has been requested, whether or not it has taken effect yet; and false if not. The documentation is updated to clarify. To improve the changes that a pause request is quickly confirmed effective, adjust things so that WaitForWALToBecomeAvailable will swiftly reach a call to recoveryPausesHere() when a pause request is made. Dilip Kumar, reviewed by Simon Riggs, Kyotaro Horiguchi, Yugo Nagata, Masahiko Sawada, and Bharath Rupireddy. Discussion: http://postgr.es/m/CAFiTN-vcLLWEm8Zr%3DYK83rgYrT9pbC8VJCfa1kY9vL3AUPfu6g%40mail.gmail.com
* VACUUM ANALYZE: Always update pg_class.reltuples.Peter Geoghegan2021-03-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | vacuumlazy.c sometimes fails to update pg_class entries for each index (to ensure that pg_class.reltuples is current), even though analyze.c assumed that that must have happened during VACUUM ANALYZE. There are at least a couple of reasons for this. For example, vacuumlazy.c could fail to update pg_class when the index AM indicated that its statistics are merely an estimate, per the contract for amvacuumcleanup() routines established by commit e57345975cf back in 2006. Stop assuming that pg_class must have been updated with accurate statistics within VACUUM ANALYZE -- update pg_class for indexes at the same time as the table relation in all cases. That way VACUUM ANALYZE will never fail to keep pg_class.reltuples reasonably accurate. The only downside of this approach (compared to the old approach) is that it might inaccurately set pg_class.reltuples for indexes whose heap relation ends up with the same inaccurate value anyway. This doesn't seem too bad. We already consistently called vac_update_relstats() (to update pg_class) for the heap/table relation twice during any VACUUM ANALYZE -- once in vacuumlazy.c, and once in analyze.c. We now make sure that we call vac_update_relstats() at least once (though often twice) for each index. This is follow up work to commit 9f3665fb, which dealt with issues in btvacuumcleanup(). Technically this fixes an unrelated issue, though. btvacuumcleanup() no longer provides an accurate num_index_tuples value following commit 9f3665fb (when there was no btbulkdelete() call during the VACUUM operation in question), but hashvacuumcleanup() has worked in the same way for many years now. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzknxdComjhqo4SUxVFk_Q1171GJO2ZgHZ1Y6pion6u8rA@mail.gmail.com Backpatch: 13-, just like commit 9f3665fb.
* Don't consider newly inserted tuples in nbtree VACUUM.Peter Geoghegan2021-03-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove the entire idea of "stale stats" within nbtree VACUUM (stop caring about stats involving the number of inserted tuples). Also remove the vacuum_cleanup_index_scale_factor GUC/param on the master branch (though just disable them on postgres 13). The vacuum_cleanup_index_scale_factor/stats interface made the nbtree AM partially responsible for deciding when pg_class.reltuples stats needed to be updated. This seems contrary to the spirit of the index AM API, though -- it is not actually necessary for an index AM's bulk delete and cleanup callbacks to provide accurate stats when it happens to be inconvenient. The core code owns that. (Index AMs have the authority to perform or not perform certain kinds of deferred cleanup based on their own considerations, such as page deletion and recycling, but that has little to do with pg_class.reltuples/num_index_tuples.) This issue was fairly harmless until the introduction of the autovacuum_vacuum_insert_threshold feature by commit b07642db, which had an undesirable interaction with the vacuum_cleanup_index_scale_factor mechanism: it made insert-driven autovacuums perform full index scans, even though there is no real benefit to doing so. This has been tied to a regression with an append-only insert benchmark [1]. Also have remaining cases that perform a full scan of an index during a cleanup-only nbtree VACUUM indicate that the final tuple count is only an estimate. This prevents vacuumlazy.c from setting the index's pg_class.reltuples in those cases (it will now only update pg_class when vacuumlazy.c had TIDs for nbtree to bulk delete). This arguably fixes an oversight in deduplication-related bugfix commit 48e12913. [1] https://smalldatum.blogspot.com/2021/01/insert-benchmark-postgres-is-still.html Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAD21AoA4WHthN5uU6+WScZ7+J_RcEjmcuH94qcoUPuB42ShXzg@mail.gmail.com Backpatch: 13-, where autovacuum_vacuum_insert_threshold was added.
* C comments: improve description of GiST NSN and GistBuildLSNBruce Momjian2021-03-10
| | | | | | | GiST indexes are complex, so adding more details in the code might help someone. Discussion: https://postgr.es/m/20210302164021.GA364@momjian.us
* Replace buffer I/O locks with condition variables.Thomas Munro2021-03-11
| | | | | | | | | | | | | | | | | | | 1. Backends waiting for buffer I/O are now interruptible. 2. If something goes wrong in a backend that is currently performing I/O, waiting backends no longer wake up until that backend reaches AbortBufferIO() and broadcasts on the CV. Previously, any waiters would wake up (because the I/O lock was automatically released) and then busy-loop until AbortBufferIO() cleared BM_IO_IN_PROGRESS. 3. LWLockMinimallyPadded is removed, as it would now be unused. Author: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Reviewed-by: Julien Rouhaud <rjuju123@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (earlier version, 2016) Discussion: https://postgr.es/m/CA%2BhUKGJ8nBFrjLuCTuqKN0pd2PQOwj9b_jnsiGFFMDvUxahj_A%40mail.gmail.com Discussion: https://postgr.es/m/CA+Tgmoaj2aPti0yho7FeEf2qt-JgQPRWb0gci_o1Hfr=C56Xng@mail.gmail.com
* Avoid creating duplicate cached plans for inherited FK constraints.Tom Lane2021-03-10
| | | | | | | | | | | | | | | | | When a foreign key constraint is applied to a partitioned table, each leaf partition inherits a similar FK constraint. We were processing all of those constraints independently, meaning that in large partitioning trees we'd build up large collections of cached FK-checking query plans. However, in all cases but one, the generated queries are actually identical for all members of the inheritance tree (because, in most cases, the query only mentions the topmost table of the other side of the FK relationship). So we can share a single cached plan among all the partitions, saving memory, not to mention time to build and maintain the cached plans. Keisuke Kuroda and Amit Langote Discussion: https://postgr.es/m/cab4b85d-9292-967d-adf2-be0d803c3e23@nttcom.co.jp_1
* Add bound check before bsearch() for performancePeter Eisentraut2021-03-10
| | | | | | | | | | | | | | | In the current lazy vacuum implementation, some index AMs such as btree indexes call lazy_tid_reaped() for each index tuple during ambulkdelete to check if the index tuple points to the (collected) garbage tuple. In that function, we simply call bsearch(), but we should be able to know the result without bsearch() if the index tuple points to the heap tuple that is out of range of the collected garbage tuples. Therefore, add a simple bound check before resorting to bsearch(). Testing has shown that this can give significant performance benefits. Author: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> Discussion: https://www.postgresql.org/message-id/flat/CA+fd4k76j8jKzJzcx8UqEugvayaMSnQz0iLUt_XgBp-_-bd22A@mail.gmail.com
* Small debug message tweakPeter Eisentraut2021-03-10
| | | | This makes the wording of the delete case match the update case.
* Fix valgrind issue in commit 05c8482f7f.Amit Kapila2021-03-10
| | | | | | | | | | Initialize other newly added variables in max_parallel_hazard_context via is_parallel_safe() because we don't check the parallel-safety of target relations in that function. Reported-by: Tom Lane as per buildfarm Author: Amit Kapila Discussion: https://postgr.es/m/2060179.1615347455@sss.pgh.pa.us
* Enable parallel SELECT for "INSERT INTO ... SELECT ...".Amit Kapila2021-03-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Parallel SELECT can't be utilized for INSERT in the following cases: - INSERT statement uses the ON CONFLICT DO UPDATE clause - Target table has a parallel-unsafe: trigger, index expression or predicate, column default expression or check constraint - Target table has a parallel-unsafe domain constraint on any column - Target table is a partitioned table with a parallel-unsafe partition key expression or support function The planner is updated to perform additional parallel-safety checks for the cases listed above, for determining whether it is safe to run INSERT in parallel-mode with an underlying parallel SELECT. The planner will consider using parallel SELECT for "INSERT INTO ... SELECT ...", provided nothing unsafe is found from the additional parallel-safety checks, or from the existing parallel-safety checks for SELECT. While checking parallel-safety, we need to check it for all the partitions on the table which can be costly especially when we decide not to use a parallel plan. So, in a separate patch, we will introduce a GUC and or a reloption to enable/disable parallelism for Insert statements. Prior to entering parallel-mode for the execution of INSERT with parallel SELECT, a TransactionId is acquired and assigned to the current transaction state. This is necessary to prevent the INSERT from attempting to assign the TransactionId whilst in parallel-mode, which is not allowed. This approach has a disadvantage in that if the underlying SELECT does not return any rows, then the TransactionId is not used, however that shouldn't happen in practice in many cases. Author: Greg Nancarrow, Amit Langote, Amit Kapila Reviewed-by: Amit Langote, Hou Zhijie, Takayuki Tsunakawa, Antonin Houska, Bharath Rupireddy, Dilip Kumar, Vignesh C, Zhihong Yu, Amit Kapila Tested-by: Tang, Haiying Discussion: https://postgr.es/m/CAJcOf-cXnB5cnMKqWEp2E2z7Mvcd04iLVmV=qpFJrR3AcrTS3g@mail.gmail.com Discussion: https://postgr.es/m/CAJcOf-fAdj=nDKMsRhQzndm-O13NY4dL6xGcEvdX5Xvbbi0V7g@mail.gmail.com
* Track total amounts of times spent writing and syncing WAL data to disk.Fujii Masao2021-03-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit adds new GUC track_wal_io_timing. When this is enabled, the total amounts of time XLogWrite writes and issue_xlog_fsync syncs WAL data to disk are counted in pg_stat_wal. This information would be useful to check how much WAL write and sync affect the performance. Enabling track_wal_io_timing will make the server query the operating system for the current time every time WAL is written or synced, which may cause significant overhead on some platforms. To avoid such additional overhead in the server with track_io_timing enabled, this commit introduces track_wal_io_timing as a separate parameter from track_io_timing. Note that WAL write and sync activity by walreceiver has not been tracked yet. This commit makes the server also track the numbers of times XLogWrite writes and issue_xlog_fsync syncs WAL data to disk, in pg_stat_wal, regardless of the setting of track_wal_io_timing. This counters can be used to calculate the WAL write and sync time per request, for example. Bump PGSTAT_FILE_FORMAT_ID. Bump catalog version. Author: Masahiro Ikeda Reviewed-By: Japin Li, Hayato Kuroda, Masahiko Sawada, David Johnston, Fujii Masao Discussion: https://postgr.es/m/0509ad67b585a5b86a83d445dfa75392@oss.nttdata.com
* Add support for more progress reporting in COPYMichael Paquier2021-03-09
| | | | | | | | | | | | | | | | | The command (TO or FROM), its type (file, pipe, program or callback), and the number of tuples excluded by a WHERE clause in COPY FROM are added to the progress reporting already available. The column "lines_processed" is renamed to "tuples_processed" to disambiguate the meaning of this column in the cases of CSV and BINARY COPY and to be more consistent with the other catalog progress views. Bump catalog version, again. Author: Matthias van de Meent Reviewed-by: Michael Paquier, Justin Pryzby, Bharath Rupireddy, Josef Šimánek, Tomas Vondra Discussion: https://postgr.es/m/CAEze2WiOcgdH4aQA8NtZq-4dgvnJzp8PohdeKchPkhMY-jWZXA@mail.gmail.com
* Remove support for SSL compressionMichael Paquier2021-03-09
| | | | | | | | | | | | | | | | | | | | | | | | | | PostgreSQL disabled compression as of e3bdb2d and the documentation recommends against using it since. Additionally, SSL compression has been disabled in OpenSSL since version 1.1.0, and was disabled in many distributions long before that. The most recent TLS version, TLSv1.3, disallows compression at the protocol level. This commit removes the feature itself, removing support for the libpq parameter sslcompression (parameter still listed for compatibility reasons with existing connection strings, just ignored), and removes the equivalent field in pg_stat_ssl and de facto PgBackendSSLStatus. Note that, on top of removing the ability to activate compression by configuration, compression is actively disabled in both frontend and backend to avoid overrides from local configurations. A TAP test is added for deprecated SSL parameters to check after backwards compatibility. Bump catalog version. Author: Daniel Gustafsson Reviewed-by: Peter Eisentraut, Magnus Hagander, Michael Paquier Discussion: https://postgr.es/m/7E384D48-11C5-441B-9EC3-F7DB1F8518F6@yesql.se