aboutsummaryrefslogtreecommitdiff
path: root/src/backend
Commit message (Collapse)AuthorAge
...
* Add the database name to the ps display of logical WAL sendersMichael Paquier2022-11-24
| | | | | | | | | | | | | | | | Logical WAL senders display now as follows, gaining a database name: postgres: walsender USER DATABASE HOST(PORT) STATE Physical WAL senders show up the same, as of: postgres: walsender USER HOST(PORT) STATE This information was missing, hence it was not possible to know from ps if a WAL sender was a logical or a physical one, and on which database it is connected when it is logical. Author: Tatsuhiro Nakamori Reviewed-by: Fujii Masao, Bharath Rupireddy Discussion: https://postgr.es/m/36a3b137e82e0ea9fe7e4234f03b64a1@oss.nttdata.com
* Add support for file inclusions in HBA and ident configuration filesMichael Paquier2022-11-24
| | | | | | | | | | | | | | | | | | | | | | | | | pg_hba.conf and pg_ident.conf gain support for three record keywords: - "include", to include a file. - "include_if_exists", to include a file, ignoring it if missing. - "include_dir", to include a directory of files. These are classified by name (C locale, mostly) and need to be prefixed by ".conf", hence following the same rules as GUCs. This commit relies on the refactoring pieces done in efc9816, ad6c528, 783e8c6 and 1b73d0b, adding a small wrapper to build a list of TokenizedAuthLines (tokenize_include_file), and the code is shaped to offer some symmetry with what is done for GUCs with the same options. pg_hba_file_rules and pg_ident_file_mappings gain a new field called file_name, to track from which file a record is located, taking advantage of the addition of rule_number in c591300 to offer an organized view of the HBA or ident records loaded. Bump catalog version. Author: Julien Rouhaud Reviewed-by: Michael Paquier Discussion: https://postgr.es/m/20220223045959.35ipdsvbxcstrhya@jrouhaud
* Speedup hash index builds by skipping needless binary searchesDavid Rowley2022-11-24
| | | | | | | | | | | | | | | | | When building hash indexes using the spool method, tuples are added to the index page in hashkey order. Because of this, we can safely skip performing the binary search on the existing tuples on the page to find the location to insert the tuple based on its hashkey value. For this case, we can just always put the tuple at the end of the item array as the tuples will always arrive in hashkey order. Testing has shown that this can improve hash index build speeds by 5-15% with a unique set of integer values. Author: Simon Riggs Reviewed-by: David Rowley Tested-by: David Zhang, Tomas Vondra Discussion: https://postgr.es/m/CANbhV-GBc5JoG0AneUGPZZW3o4OK5LjBGeKe_icpC3R1McrZWQ@mail.gmail.com
* Create memory context for tokenization after opening top-level file in hba.cMichael Paquier2022-11-24
| | | | | | | | | | | | | | The memory context was created before attempting to open the first HBA or ident file, which would cause it to leak. This had no consequences for the system views for HBA and ident files, but this would cause memory leaks in the postmaster on reload if the initial HBA and/or ident files are missing, which is a valid behavior while the backend is running. Oversight in efc9816. Author: Ted Yu Discussion: https://postgr.es/m/CALte62xH6ivgiKKzPRJgfekPZC6FKLB3xbnf3=tZmc_gKj78dw@mail.gmail.com
* Add missing initialization in tokenize_expand_file() for output listMichael Paquier2022-11-24
| | | | | | | | This should have been added in efc9816, but it looks like I have found a way to mess up a bit a patch split. This should have no consequence in practice, but let's be clean. Discussion: https://postgr.es/m/Y324HvGKiWxW2yxe@paquier.xyz
* Rework memory contexts in charge of HBA/ident tokenizationMichael Paquier2022-11-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The list of TokenizedAuthLines generated at parsing for the HBA and ident files is now stored in a static context called tokenize_context, where only all the parsed tokens are stored. This context is created when opening the first authentication file of a HBA/ident set (hba_file or ident_file), and is cleaned up once we are done all the work around it through a new routine called free_auth_file(). One call of open_auth_file() should have one matching call of free_auth_file(), the creation and deletion of the tokenization context is controlled by the recursion depth of the tokenization. Rather than having tokenize_auth_file() return a memory context that includes all the records, the tokenization logic now creates and deletes one memory context each time this function is called. This will simplify recursive calls to this routine for the upcoming inclusion record logic. While on it, rename tokenize_inc_file() to tokenize_expand_file() as this would conflict with the upcoming patch that will add inclusion records for HBA/ident files. An '@' file has its tokens added to an existing list. Reloading HBA/indent configuration in a tight loop shows no leaks, as of one type of test done (with and without -DEXEC_BACKEND). Author: Michael Paquier Reviewed-by: Julien Rouhaud Discussion: https://postgr.es/m/Y324HvGKiWxW2yxe@paquier.xyz
* Support for custom slots in the custom executor nodesAlexander Korotkov2022-11-24
| | | | | | | | | | | Some custom table access method may have their tuple format and use custom executor nodes for their custom scan types. The ability to set a custom slot would save them from tuple format conversion. Other users of custom executor nodes may also benefit. Discussion: https://postgr.es/m/CAPpHfduJUU6ToecvTyRE_yjxTS80FyPpct4OHaLFk3OEheMTNA@mail.gmail.com Author: Alexander Korotkov Reviewed-by: Pavel Borisov
* Simplify WARNING messages from skipped vacuum/analyze on a tableAndrew Dunstan2022-11-23
| | | | | | | | | This will more easily accomodate adding new permissions for vacuum and analyze. Nathan Bossart following a suggestion from Kyotaro Horiguchi Discussion: https://postgr.es/m/20220726.104712.912995710251150228.horikyota.ntt@gmail.com
* Expand AclMode to 64 bitsAndrew Dunstan2022-11-23
| | | | | | | | | | | | | | We're running out of bits for new permissions. This change doubles the number of permissions we can accomodate from 16 to 32, so the forthcoming new ones for vacuum/analyze don't exhaust the pool. Nathan Bossart Reviewed by: Bharath Rupireddy, Kyotaro Horiguchi, Stephen Frost, Robert Haas, Mark Dilger, Tom Lane, Corey Huinker, David G. Johnston, Michael Paquier. Discussion: https://postgr.es/m/20220722203735.GB3996698@nathanxps13
* Simplify vacuum_set_xid_limits() signature.Peter Geoghegan2022-11-23
| | | | | | | | | | | | Pass VACUUM parameters (VacuumParams state) to vacuum_set_xid_limits() directly, rather than passing most individual VacuumParams fields as separate arguments. Also make vacuum_set_xid_limits() output parameter symbol names match those used by its vacuumlazy.c caller. Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-Wz=TE7gW5DgSahDkf0UEZigFGAoHNNN6EvSrdzC=Kn+hrA@mail.gmail.com
* Don't test HEAP_XMAX_INVALID when freezing xmax.Peter Geoghegan2022-11-23
| | | | | | | | | | | | | | | | | | | | We shouldn't ever need to rely on whether HEAP_XMAX_INVALID is set in t_infomask when considering whether or not an xmax should be deemed already frozen, since that status flag is just a hint. The only acceptable representation for an "xmax_already_frozen" raw xmax field is the transaction ID value zero (also known as InvalidTransactionId). Adjust code that superficially appeared to rely on HEAP_XMAX_INVALID to make the rule about xmax_already_frozen clear. Also avoid needlessly rereading the tuple's raw xmax. Oversight in bugfix commit d2599ecf. There is no evidence that this ever led to incorrect behavior, so no backpatch. The worst consequence of this bug was that VACUUM could hypothetically fail to notice and report on certain kinds of corruption, which seems fairly benign. Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-Wzkh3DMCDRPfhZxj9xCq9v3WmzvmbiCpf1dNKUBPadhCbQ@mail.gmail.com
* Give better hints for ambiguous or unreferenceable columns.Tom Lane2022-11-22
| | | | | | | | | | | | | | | | | | | | | | | | | Examine ParseNamespaceItem flags to detect whether a column name is unreferenceable for lack of LATERAL, or could be referenced if a qualified name were used, and give better hints for such cases. Also, don't phrase the message to imply that there's only one matching column when there is really more than one. Many of the regression test output changes are not very interesting, but just reflect reclassifying the "There is a column ... but it cannot be referenced from this part of the query" messages as DETAIL rather than HINT. They are details per our style guide, in the sense of being factual rather than offering advice; and this change provides room to offer actual HINTs about what to do. While here, adjust the fuzzy-name-matching code to be a shade less impenetrable. It was overloading the meanings of FuzzyAttrMatchState fields way too much IMO, so splitting them into multiple fields seems to make it clearer. It's not like we need to shave bytes in that struct. Per discussion of bug #17233 from Alexander Korolev. Discussion: https://postgr.es/m/17233-afb9d806aaa64b17@postgresql.org
* YA attempt at taming worst-case behavior of get_actual_variable_range.Tom Lane2022-11-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | We've made multiple attempts at preventing get_actual_variable_range from taking an unreasonable amount of time (3ca930fc3, fccebe421). But there's still an issue for the very first planning attempt after deletion of a large number of extremal-valued tuples. While that planning attempt will set "killed" bits on the tuples it visits and thereby reduce effort for next time, there's still a lot of work it has to do to visit the heap and then set those bits. It's (usually?) not worth it to do that much work at plan time to have a slightly better estimate, especially in a context like this where the table contents are known to be mutating rapidly. Therefore, let's bound the amount of work to be done by giving up after we've visited 100 heap pages. Giving up just means we'll fall back on the extremal value recorded in pg_statistic, so it shouldn't mean that planner estimates suddenly become worthless. Note that this means we'll still gradually whittle down the problem by setting a few more index "killed" bits in each planning attempt; so eventually we'll reach a good state (barring further deletions), even in the absence of VACUUM. Simon Riggs, per a complaint from Jakub Wartak (with cosmetic adjustments by me). Back-patch to all supported branches. Discussion: https://postgr.es/m/CAKZiRmznOwi0oaV=4PHOCM4ygcH4MgSvt8=5cu_vNCfc8FSUug@mail.gmail.com
* Improve comments atop pg_get_replication_slots.Amit Kapila2022-11-22
| | | | | | | | Update comments atop pg_get_replication_slots to make it clear that it shows all replication slots that currently exist on the database cluster. Author: sirisha chamarthi Discussion: https://postgr.es/m/CAKrAKeXRuFpeiWS+STGFm-RFfW19sUDxju66JkyRi13kdQf94Q@mail.gmail.com
* Ignore invalidated slots while computing oldest catalog XminAlvaro Herrera2022-11-22
| | | | | | | | | | | | | | | | | Once a logical slot has acquired a catalog_xmin, it doesn't let go of it, even when invalidated by exceeding the max_slot_wal_keep_size, which means that dead catalog tuples are not removed by vacuum anymore since the point is invalidated, until the slot is dropped. This could be catastrophic if catalog churn is high. Change the computation of Xmin to ignore invalidated slots, to prevent dead rows from accumulating. Backpatch to 13, where slot invalidation appeared. Author: Sirisha Chamarthi <sirichamarthi22@gmail.com> Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Discussion: https://postgr.es/m/CAKrAKeUEDeqquN9vwzNeG-CN8wuVsfRYbeOUV9qKO_RHok=j+g@mail.gmail.com
* Add wait event for pg_usleep() in perform_spin_delay()Andres Freund2022-11-21
| | | | | | | | | | | | | | | | | | | | The lwlock wait queue scalability issue fixed in a4adc31f690 was quite hard to find because of the exponential backoff and because we adjust spins_per_delay over time within a backend. To make it easier to find similar issues in the future, add a wait event for the pg_usleep() in perform_spin_delay(). Showing a wait event while spinning without sleeping would increase the overhead of spinlocks, which we do not want. We may at some later point want to have more granular wait events, but that'd be a substantial amount of work. This provides at least some insights into something currently hard to observe. Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com> https://postgr.es/m/20221120204310.xywrhyxyytsajuuq@awork3.anarazel.de
* Add comments and a missing CHECK_FOR_INTERRUPTS in ts_headline.Tom Lane2022-11-21
| | | | | | | | | | | | | | | | I just spent an annoying amount of time reverse-engineering the 100%-undocumented API between ts_headline and the text search parser's prsheadline function. Add some commentary about that while it's fresh in mind. Also remove some unused macros in wparser_def.c. While at it, I noticed that when commit 78e73e875 added a CHECK_FOR_INTERRUPTS call in TS_execute_recurse, it missed doing so in the parallel function TS_phrase_execute, which surely needs one just as much. Back-patch because of the missing CHECK_FOR_INTERRUPTS. Might as well back-patch the rest of this too.
* Add workaround to make ubsan and ps_status.c compatibleAndres Freund2022-11-21
| | | | | | | | | | | | | | | | | | | | At least on linux, set_ps_display() breaks /proc/$pid/environ. The sanitizer's helper library uses /proc/$pid/environ to implement getenv(), as it wants to work independent of libc. When just using undefined and alignment sanitizers, the sanitizer library is only initialized when the first error occurs, by which time we've often already called set_ps_display(), preventing the sanitizer libraries from seeing the options. We can work around that by defining __ubsan_default_options, a weak symbol libsanitizer uses to get defaults from the application, and return getenv("UBSAN_OPTIONS"). But only if main already was reached, so that we don't end up relying on a not-yet-working getenv(). As it's just a function that won't get called when not running a sanitizer, it doesn't seem necessary to make compilation of the function conditional. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/20220323173537.ll7klrglnp4gn2um@alap3.anarazel.de
* Provide options for postmaster to kill child processes with SIGABRT.Tom Lane2022-11-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The postmaster normally sends SIGQUIT to force-terminate its child processes after a child crash or immediate-stop request. If that doesn't result in child exit within a few seconds, we follow it up with SIGKILL. This patch provides GUC flags that allow either of these signals to be replaced with SIGABRT. On typically-configured Unix systems, that will result in a core dump being produced for each such child. This can be useful for debugging problems, although it's not something you'd want to have on in production due to the risk of disk space bloat from lots of core files. The old postmaster -T switch, which sent SIGSTOP in place of SIGQUIT, is changed to be the same as send_abort_for_crash. As far as I can tell from the code comments, the intent of that switch was just to block things for long enough to force core dumps manually, which seems like an unnecessary extra step. (Maybe at the time, there was no way to get most kernels to produce core files with per-PID names, requiring manual core file renaming after each one. But now it's surely the hard way.) I also took the opportunity to remove the old postmaster -n (skip shmem reinit) switch, which hasn't actually done anything in decades, though the documentation still claimed it did. Discussion: https://postgr.es/m/2251016.1668797294@sss.pgh.pa.us
* Replace SQLValueFunction by COERCE_SQL_SYNTAXMichael Paquier2022-11-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This switch impacts 9 patterns related to a SQL-mandated special syntax for function calls: - LOCALTIME [ ( typmod ) ] - LOCALTIMESTAMP [ ( typmod ) ] - CURRENT_TIME [ ( typmod ) ] - CURRENT_TIMESTAMP [ ( typmod ) ] - CURRENT_DATE Five new entries are added to pg_proc to compensate the removal of SQLValueFunction to provide backward-compatibility and making this change transparent for the end-user (for example for the attribute generated when a keyword is specified in a SELECT or in a FROM clause without an alias, or when specifying something else than an Iconst to the parser). The parser included a set of checks coming from the files in charge of holding the C functions used for the SQLValueFunction calls (as of transformSQLValueFunction()), which are now moved within each function's execution path, so this reduces the dependencies between the execution and the parsing steps. As of this change, all the SQL keywords use the same paths for their work, relying only on COERCE_SQL_SYNTAX. Like fb32748, no performance difference has been noticed, while the perf profiles get reduced with ExecEvalSQLValueFunction() gone. Bump catalog version. Reviewed-by: Corey Huinker, Ted Yu Discussion: https://postgr.es/m/YzaG3MoryCguUOym@paquier.xyz
* Add additional checks while creating the initial decoding snapshot.Amit Kapila2022-11-21
| | | | | | | | | | | | As per one of the CI reports, there is an assertion failure which indicates that we were trying to use an unenforced xmin horizon for decoding snapshots. Though, we couldn't figure out the reason for assertion failure these checks would help us in finding the reason if the problem happens again in the future. Author: Amit Kapila based on suggestions by Andres Freund Reviewd by: Andres Freund Discussion: https://postgr.es/m/CAA4eK1L8wYcyTPxNzPGkhuO52WBGoOZbT0A73Le=ZUWYAYmdfw@mail.gmail.com
* lwlock: Fix quadratic behavior with very long wait listsAndres Freund2022-11-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Until now LWLockDequeueSelf() sequentially searched the list of waiters to see if the current proc is still is on the list of waiters, or has already been removed. In extreme workloads, where the wait lists are very long, this leads to a quadratic behavior. #backends iterating over a list #backends long. Additionally, the likelihood of needing to call LWLockDequeueSelf() in the first place also increases with the increased length of the wait queue, as it becomes more likely that a lock is released while waiting for the wait list lock, which is held for longer during lock release. Due to the exponential back-off in perform_spin_delay() this is surprisingly hard to detect. We should make that easier, e.g. by adding a wait event around the pg_usleep() - but that's a separate patch. The fix is simple - track whether a proc is currently waiting in the wait list or already removed but waiting to be woken up in PGPROC->lwWaiting. In some workloads with a lot of clients contending for a small number of lwlocks (e.g. WALWriteLock), the fix can substantially increase throughput. As the quadratic behavior arguably is a bug, we might want to decide to backpatch this fix in the future. Author: Andres Freund <andres@anarazel.de> Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Discussion: https://postgr.es/m/20221027165914.2hofzp4cvutj6gin@awork3.anarazel.de Discussion: https://postgr.es/m/CALj2ACXktNbG=K8Xi7PSqbofTZozavhaxjatVc14iYaLu4Maag@mail.gmail.com
* pgstat: replace double lookup with IsSharedRelation()Andres Freund2022-11-20
| | | | | | | | | | | As the list of shared relations is fixed, we can just dispatch based IsSharedRelation(), instead of first trying to look up stats for a non-shared rel and falling back to shared stats. Author: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de Discussion: https://postgr.es/m/8c1851a2-a98e-e1bc-7729-37b0b95f66ec@gmail.com
* Fix long-obsolete comment.Tom Lane2022-11-20
| | | | | | Commit c94959d41 fixed DROP OPERATOR to reset oprcom/oprnegate links to the dropped operator; but it missed updating this old comment that claimed we allow such links to dangle.
* Switch SQLValueFunction on "name" to use COERCE_SQL_SYNTAXMichael Paquier2022-11-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit changes six SQL keywords to use COERCE_SQL_SYNTAX rather than relying on SQLValueFunction: - CURRENT_ROLE - CURRENT_USER - USER - SESSION_USER - CURRENT_CATALOG - CURRENT_SCHEMA Among the six, "user", "current_role" and "current_catalog" require specific SQL functions to allow ruleutils.c to map them to the SQL keywords these require when using COERCE_SQL_SYNTAX. Having pg_proc.proname match with the keyword ensures that the compatibility remains the same when projecting any of these keywords in a FROM clause to an attribute name when an alias is not specified. This is covered by the tests added in 2e0d80c, making sure that a correct mapping happens with each SQL keyword. The three others (current_schema, session_user and current_user) already have pg_proc entries for this job, so this brings more consistency between the way such keywords are treated in the parser, the executor and ruleutils.c. SQLValueFunction is reduced to half its contents after this change, simplifying its logic a bit as there is no need to enforce a C collation anymore for the entries returning a name as a result. I have made a few performance tests, with a million-ish calls to these keywords without seeing a difference in run-time or in perf profiles (ExecEvalSQLValueFunction() is removed from the profiles). The remaining SQLValueFunctions are now related to timestamps and dates. Bump catalog version. Reviewed-by: Corey Huinker Discussion: https://postgr.es/m/YzaG3MoryCguUOym@paquier.xyz
* Fix mislabeling of PROC_QUEUE->links as PGPROC, fixing UBSan on 32bitAndres Freund2022-11-19
| | | | | | | | | | | | | | | | | | | | | | | ProcSleep() used a PGPROC* variable to point to PROC_QUEUE->links.next, because that does "the right thing" with SHMQueueInsertBefore(). While that largely works, it's certainly not correct and unnecessary - we can just use SHM_QUEUE* to point to the insertion point. Noticed when testing a 32bit of postgres with undefined behavior sanitizer. UBSan noticed that sometimes the supposed PGPROC wasn't sufficiently aligned (required since 46d6e5f5679, ensured indirectly, via ShmemAllocRaw() guaranteeing cacheline alignment). For now fix this by using a SHM_QUEUE* for the insertion point. Subsequently we should replace all the use of PROC_QUEUE and SHM_QUEUE with ilist.h, but that's a larger change that we don't want to backpatch. Backpatch to all supported versions - it's useful to be able to run postgres under UBSan. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/20221117014230.op5kmgypdv2dtqsf@awork3.anarazel.de Backpatch: 11-
* Add a SET option to the GRANT command.Robert Haas2022-11-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Similar to how the INHERIT option controls whether or not the permissions of the granted role are automatically available to the grantee, the new SET permission controls whether or not the grantee may use the SET ROLE command to assume the privileges of the granted role. In addition, the new SET permission controls whether or not it is possible to transfer ownership of objects to the target role or to create new objects owned by the target role using commands such as CREATE DATABASE .. OWNER. We could alternatively have made this controlled by the INHERIT option, or allow it when either option is given. An advantage of this approach is that if you are granted a predefined role with INHERIT TRUE, SET FALSE, you can't go and create objects owned by that role. The underlying theory here is that the ability to create objects as a target role is not a privilege per se, and thus does not depend on whether you inherit the target role's privileges. However, it's surely something you could do anyway if you could SET ROLE to the target role, and thus making it contingent on whether you have that ability is reasonable. Design review by Nathan Bossat, Wolfgang Walther, Jeff Davis, Peter Eisentraut, and Stephen Frost. Discussion: http://postgr.es/m/CA+Tgmob+zDSRS6JXYrgq0NWdzCXuTNzT5eK54Dn2hhgt17nm8A@mail.gmail.com
* Don't read MCV stats needlessly in eqjoinsel().Tom Lane2022-11-18
| | | | | | | | | | | | | | | | | eqjoinsel() currently makes use of MCV stats only when we have such stats for both sides of the clause. As coded, though, it would fetch those stats even when they're present for just one side. This can be a bit expensive with high statistics targets, leading to wasted effort in common cases such as joining a unique column to a non-unique column. So it seems worth the trouble to do a quick pre-check to confirm that both sides have MCVs before fetching either. Also, tweak the API spec for get_attstatsslot() to document the method we're using here. David Geier, Tomas Vondra, Tom Lane Discussion: https://postgr.es/m/b9846ca0-5f1c-9b26-5881-aad3f42b07f0@gmail.com
* Standardize rmgrdesc recovery conflict XID output.Peter Geoghegan2022-11-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Standardize on the name snapshotConflictHorizon for all XID fields from WAL records that generate recovery conflicts when in hot standby mode. This supersedes the previous latestRemovedXid naming convention. The new naming convention places emphasis on how the values are actually used by REDO routines. How the values are generated during original execution (details of which vary by record type) is deemphasized. Users of tools like pg_waldump can now grep for snapshotConflictHorizon to see all potential sources of recovery conflicts in a standardized way, without necessarily having to consider which specific record types might be involved. Also bring a couple of WAL record types that didn't follow any kind of naming convention into line. These are heapam's VISIBLE record type and SP-GiST's VACUUM_REDIRECT record type. Now every WAL record whose REDO routine calls ResolveRecoveryConflictWithSnapshot() passes through the snapshotConflictHorizon field from its WAL record. This is follow-up work to the refactoring from commit 9e540599 that made FREEZE_PAGE WAL records use a standard snapshotConflictHorizon style XID cutoff. No bump in XLOG_PAGE_MAGIC, since the underlying format of affected WAL records doesn't change. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CAH2-Wzm2CQUmViUq7Opgk=McVREHSOorYaAjR1ZpLYkRN7_dPw@mail.gmail.com
* Fix MERGE tuple count with DO NOTHINGAlvaro Herrera2022-11-17
| | | | | | | | | | Reporting tuples for which nothing is done is useless and goes against the documented behavior, so don't do it. Backpatch to 15. Reported by: Luca Ferrari Discussion: https://postgr.es/m/CAKoxK+42MmACUh6s8XzASQKizbzrtOGA6G1UjzCP75NcXHsiNw@mail.gmail.com
* Use correct type name in comments about freezing.Peter Geoghegan2022-11-17
| | | | Oversight in commit 9e540599, which added freeze plan deduplication.
* Fix outdated comment in ExecDeleteAlvaro Herrera2022-11-17
| | | | | | | | This commend references a struct that disappeared before MERGE was merged ... and ExecDelete is not called by the committed MERGE anyway. Revert to the original wording. Backpatch to 15
* Allow initdb to complete on systems without "locale" commandPeter Eisentraut2022-11-17
| | | | | | | | | | This partially reverts 2fe3bdbd691a5d11626308e7d660440be6c210c8, which added an error check on the "locale -a" execution. This is removed again, adding a comment explaining why. We already had code that shows a warning if no system locales could be found, which should be sufficient for feedback to the user. Discussion: https://www.postgresql.org/message-id/flat/b2b491d1-3b36-15b9-6910-5b5540b27f5c%40enterprisedb.com
* doc: Fix wording of MERGE actions in READMEDaniel Gustafsson2022-11-17
| | | | | | | | | | | | UPDATE was listed twice and DELETE was omitted, replace one UPDATE with DELETE instead. Backpatch through v15 where MERGE was added. Author: Myo Wai Thant <myo.waithant@fujitsu.com> Reviewed-by: Richard Guo <guofenglinux@gmail.com> Discussion: https://postgr.es/m/OSAPR01MB43247E46931E9E9CFC4AA0F29A079@OSAPR01MB4324.jpnprd01.prod.outlook.com Backpatch-through: 15
* Fix typos in commentsDaniel Gustafsson2022-11-17
| | | | | | | Fix various misspellings of xl_running_xacts. Author: Japin Li <japinli@hotmail.com> Discussion: https://postgr.es/m/MEYP282MB1669CA2A39ACF0172774ED27B6069@MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM
* Update some more ObjectType switch statements to not have defaultPeter Eisentraut2022-11-17
| | | | | | | | | This allows the compiler to complain if a case has been missed. In these instances, the tradeoff of having to list a few unneeded cases to silence the compiler seems better than the risk of actually missing one. Discussion: https://www.postgresql.org/message-id/flat/fce5c98a-45da-19e7-dad0-21096bccd66e%40enterprisedb.com
* Improve ruleutils' printout of LATERAL references within subplans.Tom Lane2022-11-16
| | | | | | | | | | | | | | | | | | | Commit 1cc29fe7c, which taught EXPLAIN to print PARAM_EXEC Params as the referenced expressions, included some checks to prevent matching Params found in SubPlans or InitPlans to NestLoopParams of upper query levels. At the time, this seemed possibly necessary to avoid false matches because of the planner's habit of re-using the same PARAM_EXEC slot in multiple places in a plan. Furthermore, in the absence of LATERAL no such reference could be valid anyway. But it's possible now that we have LATERAL, and in the wake of 46c508fbc and 1db5667ba I believe the false-match hazard is gone. Hence, remove the in_same_plan_level checks. As shown in the regression test changes, this provides a useful improvement in readability for EXPLAIN of LATERAL-using subplans. Richard Guo, reviewed by Greg Stark and myself Discussion: https://postgr.es/m/CAMbWs4-YSOcQXAagJetP95cAeZPqzOy5kM5yijG0PVW5ztRb4w@mail.gmail.com
* Fix slowdown in TAP tests due to recent walreceiver change.Thomas Munro2022-11-17
| | | | | | | | | | | | | | Commit 05a7be93 changed the timing of the first reply sent by a walreceiver, which caused a few TAP tests that call wait_for_catchup() when they haven't actually streamed anything yet to wait ~10 seconds (wal_receiver_status_interval). Before commit 05a7be93 the initial reply was sent after 100ms, but there's no reason not to send it immediately as a slight improvement. Do the same for HS feedback for consistency. Author: Nathan Bossart <nathandbossart@gmail.com> Discussion: https://postgr.es/m/742545.1668377284%40sss.pgh.pa.us
* Invent "multibitmapsets", and use them to speed up antijoin detection.Tom Lane2022-11-16
| | | | | | | | | | | | | | | | | | | | | | | | | | Implement a data structure that is a List of Bitmapsets, which is essentially a 2-D boolean array except that the rows need not all be the same width. Operations such as union and intersection are meaningful for these, just as they are for Bitmapsets. Eventually we might build many of the same operations that we have written for Bitmapsets, but for the first use-case we just need a few. That first use-case is for antijoin detection: reduce_outer_joins needs to find the set of Vars that are certain to be non-null in a successfully joined (not null-extended) left join row, and also find the set of Vars subject to higher-level IS NULL constraints, and intersect them. We had been doing this by making Lists of the Var nodes and then using list_intersect, which works but is pretty inefficient compared to a bitmapset-like intersection. Potentially it's O(N^2) if there are a lot of Vars involved, which fortunately there generally aren't; still it's not great. Moreover, that method requires the Vars of interest to be exactly equal() in the join condition and the upper IS NULL condition, which is problematic for my WIP patch that labels Vars according to which outer joins have possibly nulled them. Discussion: https://postgr.es/m/892228.1668437838@sss.pgh.pa.us Discussion: https://postgr.es/m/CAMbWs4-mvPPCJ1W6iK6dD5HiNwoJdi6mZp=-7mE8N9Sh+cd0tQ@mail.gmail.com
* Variable renaming in preparation for refactoringPeter Eisentraut2022-11-16
| | | | | | | | Rename page -> block and dp -> page where appropriate. The old naming mixed up block and page in confusing ways. Author: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAAKRu_YSOnhKsDyFcqJsKtBSrd32DP-jjXmv7hL0BPD-z0TGXQ@mail.gmail.com
* Remove useless castsPeter Eisentraut2022-11-16
| | | | | Maybe these are left from when PageGetItem() was a macro, but now they are clearly useless.
* Turn HeapKeyTest macro into inline functionPeter Eisentraut2022-11-16
| | | | | | | | It is easier to read as a function. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://www.postgresql.org/message-id/flat/CAAKRu_YSOnhKsDyFcqJsKtBSrd32DP-jjXmv7hL0BPD-z0TGXQ@mail.gmail.com
* Remove unused includePeter Eisentraut2022-11-16
| | | | | Author: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAAKRu_YSOnhKsDyFcqJsKtBSrd32DP-jjXmv7hL0BPD-z0TGXQ@mail.gmail.com
* Use multi-inserts for pg_ts_config_mapMichael Paquier2022-11-16
| | | | | | | | | | | | | | | | | | | Two locations working on pg_ts_config_map are switched from CatalogTupleInsert() to a multi-insert approach with tuple slots: - ALTER TEXT SEARCH CONFIGURATION ADD/ALTER MAPPING when inserting new entries. The number of entries to insert is known in advance, so is the number of slots needed. Note that CatalogTupleInsertWithInfo() is now used for the entry updates. - CREATE TEXT SEARCH CONFIGURATION, where up to ~20-ish records could be inserted at once. The number of slots is not known in advance, hence a slot initialization is delayed until a tuple is stored in it. Like all the changes of this kind (1ff4161, 63110c6 or e3931d01), an insert batch is capped at 64kB. Author: Michael Paquier, Ranier Vilela Reviewed-by: Kyotaro Horiguchi Discussion: https://postgr.es/m/Y3M5bovrkTQbAO4W@paquier.xyz
* Use multi-inserts for pg_enumMichael Paquier2022-11-16
| | | | | | | | | | | | | | | | | This allows to insert at once all the enum values defined with a given type into pg_enum, reducing the WAL produced by roughly 10%~. pg_enum's indexes are opened and closed now once rather than N times. The number of items to insert is known in advance, making this change straight-forward, and would happen on a CREATE TYPE .. AS ENUM. The amount of data inserted is capped at 64kB for each insert batch. This is similar to commits 63110c6 and e3931d01, that worked on different catalogs. Reported-by: Ranier Vilela Author: Michael Paquier Reviewed-by: Kyotaro Horiguchi, Ranier Vilela Discussion: https://postgr.es/m/Y3M5bovrkTQbAO4W@paquier.xyz
* Avoid some overhead with open and close of catalog indexesMichael Paquier2022-11-16
| | | | | | | | | | | | | | | | | | | | | | | | | | This commit improves two code paths to open and close indexes a minimum amount of times when doing a series of catalog updates or inserts. CatalogTupleInsert() is costly when using it for multiple inserts or updates compared to CatalogTupleInsertWithInfo(), as it would need to open and close the indexes of the catalog worked each time an operation is done. This commit updates the following places: - REINDEX CONCURRENTLY when copying statistics from one index relation to the other. Multi-INSERTs are avoided here, as this would begin to show benefits only for indexes with multiple expressions, for example, which may not be the most common pattern. This change is noticeable in profiles with indexes having many expressions, for example, and it would improve any callers of CopyStatistics(). - Update of statistics on ANALYZE, that mixes inserts and updates. In each case, the catalog indexes are opened only if at least one insertion and/or update is required, to minimize the cost of the operation. Like the previous coding, no indexes are opened as long as at least one insert or update of pg_statistic has happened. Author: Ranier Vilela Reviewed-by: Kyotaro Horiguchi, Michael Paquier Discussion: https://postgr.es/m/CAEudQAqh0F9y6Di_Wc8xW4zkWm_5SDd-nRfVsCn=h0Nm1C_mrg@mail.gmail.com
* Mark argument of RegisterCustomRmgr() as const.Jeff Davis2022-11-15
|
* Deduplicate freeze plans in freeze WAL records.Peter Geoghegan2022-11-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make heapam WAL records that describe freezing performed by VACUUM more space efficient by storing each distinct "freeze plan" once, alongside an array of associated page offset numbers (one per freeze plan). The freeze plans required for most heap pages tend to naturally have a great deal of redundancy, so this technique is very effective in practice. It often leads to freeze WAL records that are less than 20% of the size of equivalent WAL records generated using the previous approach. The freeze plan concept was introduced by commit 3b97e6823b, which fixed bugs in VACUUM's handling of MultiXacts. We retain the concept of freeze plans, but go back to using page offset number arrays. There is no loss of generality here because deduplication is an additive process that gets applied mechanically when FREEZE_PAGE WAL records are built. More than anything else, freeze plan deduplication is an optimization that reduces the marginal cost of freezing additional tuples on pages that will need to have at least one or two tuples frozen in any case. Ongoing work that adds page-level freezing to VACUUM will take full advantage of the improved cost profile through batching. Also refactor some of the details surrounding recovery conflicts needed to REDO freeze records in passing: make original execution responsible for generating a standard latestRemovedXid cutoff, rather than working backwards to get the same cutoff in the REDO routine. Bugfix commit 66fbcb0d2e did it the other way around, which is equivalent but obscures what's going on. Also rename the cutoff field from the WAL record/struct (rename the field cutoff_xid to latestRemovedXid to match similar WAL records). Processing of conflicts by REDO routines is already completely uniform, so tools like pg_waldump should present the information driving the process uniformly. There are two remaining WAL record types that still don't quite follow this convention (heapam's VISIBLE record type and SP-GiST's VACUUM_REDIRECT record type). They can be brought into line by later work that totally standardizes how the cutoffs are presented. Bump XLOG_PAGE_MAGIC. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Nathan Bossart <nathandbossart@gmail.com> Reviewed-By: Justin Pryzby <pryzby@telsasoft.com> Discussion: https://postgr.es/m/CAH2-Wz=XytErMnb8FAyFd+OQEbiipB0Q2FmFdXrggPL4VBnRYQ@mail.gmail.com
* Check return value of pclose() correctlyPeter Eisentraut2022-11-15
| | | | | | | | | | | | | | | Some callers didn't check the return value of pclose() or ClosePipeStream() correctly. Either they didn't check it at all or they treated it like the return of fclose(). The correct way is to first check whether the return value is -1, and then report errno, and then check the return value like a result from system(), for which we already have wait_result_to_str() to make it simpler. To make this more compact, expand wait_result_to_str() to also handle -1 explicitly. Reviewed-by: Ankit Kumar Pandey <itsankitkp@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/8cd9fb02-bc26-65f1-a809-b1cb360eef73@enterprisedb.com
* Disallow setting archive_library and archive_command at the same timePeter Eisentraut2022-11-15
| | | | | | | | | | | Setting archive_library and archive_command at the same time is now an error. Before, archive_library would take precedence over archive_command. Author: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com> Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Discussion: https://www.postgresql.org/message-id/20220914222736.GA3042279%40nathanxps13