path: root/src
...
* Fix corruption when relation truncation fails. (Thomas Munro, 2024-12-20)

  RelationTruncate() does three things, while holding an AccessExclusiveLock and preventing checkpoints:

  1. Logs the truncation.
  2. Drops buffers, even if they're dirty.
  3. Truncates some number of files.

  Step 2 could previously be canceled if it had to wait for I/O, and step 3 could and still can fail in file APIs. All orderings of these operations have data corruption hazards if interrupted, so we can't give up until the whole operation is done. When dirty pages were discarded but the corresponding blocks were left on disk due to ERROR, old page versions could come back from disk, reviving deleted data (see pgsql-bugs #18146 and several like it). When primary and standby were allowed to disagree on relation size, standbys could panic (see pgsql-bugs #18426) or revive data unknown to visibility management on the primary (theorized).

  Changes:

  * WAL is now unconditionally flushed first
  * smgrtruncate() is now called in a critical section, preventing interrupts and causing PANIC on file API failure
  * smgrtruncate() has a new parameter for existing fork sizes, because it can't call smgrnblocks() itself inside a critical section

  The changes apply to RelationTruncate(), smgr_redo() and pg_truncate_visibility_map(). That last is also brought up to date with other evolutions of the truncation protocol.

  The VACUUM FileTruncate() failure mode had been discussed in older reports than the ones referenced below, with independent analysis from many people, but earlier theories on how to fix it were too complicated to back-patch. The more recently invented cancellation bug was diagnosed by Alexander Lakhin. Other corruption scenarios were spotted by me while iterating on this patch and earlier commit 75818b3a.

  Back-patch to all supported releases.

  Reviewed-by: Michael Paquier <michael@paquier.xyz>
  Reviewed-by: Robert Haas <robertmhaas@gmail.com>
  Reported-by: rootcause000@gmail.com
  Reported-by: Alexander Lakhin <exclusion@gmail.com>
  Discussion: https://postgr.es/m/18146-04e908c662113ad5%40postgresql.org
  Discussion: https://postgr.es/m/18426-2d18da6586f152d6%40postgresql.org

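  A minimal sketch of the new ordering, assuming the smgrtruncate() argument shape implied by the description above (variable names are illustrative, not the verbatim patch):

      /* 1. Make the WAL record describing the truncation durable first. */
      XLogFlush(lsn);

      /*
       * 2 & 3. Discard buffers and truncate the files inside a critical
       * section: interrupts are held off, and any file API failure now
       * escalates to PANIC instead of leaving buffers and on-disk state
       * in disagreement.
       */
      START_CRIT_SECTION();
      smgrtruncate(RelationGetSmgr(rel), forks, nforks,
                   old_nblocks,   /* existing fork sizes, passed in because
                                   * smgrnblocks() can't be called in here */
                   new_nblocks);
      END_CRIT_SECTION();
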
* Remove pg_attribute.attcacheoff column (David Rowley, 2024-12-20)

  The column is no longer needed as the offset is now cached in the CompactAttribute struct per commit 5983a4cff.

  Author: David Rowley
  Reviewed-by: Andres Freund, Victor Yegorov
  Discussion: https://postgr.es/m/CAApHDvrBztXP3yx=NKNmo3xwFAFhEdyPnvrDg3=M0RhDs+4vYw@mail.gmail.com

* Relax regression test for fsync check of backend-level stats (Michael Paquier, 2024-12-20)

  One test added in 9aea73fc61d4 did not take into account that the backend may still report some fsync activity even after a checkpoint. Relax the test to be more flexible.

  Per report from buildfarm member grassquit, via Alexander Lakhin.

  Author: Bertrand Drouvot
  Discussion: https://postgr.es/m/6143ab0a-9e88-4790-8d9d-50ba45657761@gmail.com

* Introduce CompactAttribute array in TupleDesc, take 2 (David Rowley, 2024-12-20)

  The new compact_attrs array stores a few select fields from FormData_pg_attribute in a more compact way, using only 16 bytes per column instead of the 104 bytes that FormData_pg_attribute uses.

  Using CompactAttribute allows performance-critical operations such as tuple deformation to be performed without looking at the FormData_pg_attribute element in TupleDesc, which means fewer cacheline accesses. For some workloads, tuple deformation can be the most CPU-intensive part of processing the query.

  Some testing with 16 columns on a table where the first column is variable length showed around a 10% increase in transactions per second for an OLAP-type query performing aggregation on the 16th column. However, in certain cases, the increases were much higher, up to ~25% on one AMD Zen4 machine.

  This also makes pg_attribute.attcacheoff redundant. A follow-on commit will remove it, thus shrinking the FormData_pg_attribute struct by 4 bytes.

  Author: David Rowley
  Reviewed-by: Andres Freund, Victor Yegorov
  Discussion: https://postgr.es/m/CAApHDvrBztXP3yx=NKNmo3xwFAFhEdyPnvrDg3=M0RhDs+4vYw@mail.gmail.com

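  To make the cacheline argument concrete, here is a standalone illustration of the idea; the field list is an approximation of the committed struct, not a copy of it:

      #include <stdio.h>
      #include <stdint.h>
      #include <stdbool.h>

      /* Approximate shape: a few hot fields, padded to 16 bytes, versus
       * the ~104-byte FormData_pg_attribute row. */
      typedef struct CompactAttributeSketch
      {
          int32_t attcacheoff;    /* cached byte offset within the tuple */
          int16_t attlen;         /* type length, or -1/-2 for varlena/cstring */
          bool    attbyval;       /* pass-by-value type? */
          bool    attispackable;  /* can the value be packed? */
          bool    atthasmissing;  /* has a missing default value? */
          bool    attisdropped;   /* column has been dropped? */
          bool    attgenerated;   /* generated column? */
          bool    attnotnull;     /* NOT NULL constraint? */
          uint8_t attalignby;     /* alignment requirement in bytes */
      } CompactAttributeSketch;

      int main(void)
      {
          /* Four of these fit in one 64-byte cache line; not even one
           * full pg_attribute row does. */
          printf("sizeof = %zu bytes\n", sizeof(CompactAttributeSketch));
          return 0;
      }
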
* Remove final mention of FREEZE_PAGE from comments (Melanie Plageman, 2024-12-19)

  b7493e1ab35 removed leftover mentions of XLOG_HEAP2_FREEZE_PAGE records from comments but neglected to remove one mention of FREEZE_PAGE.

  Reported off-list by Alexander Lakhin.

* Get rid of old version of BuildTupleHashTable(). (Tom Lane, 2024-12-19)

  It was reasonable to preserve the old API of BuildTupleHashTable() in the back branches, but in HEAD we should actively discourage use of that version. There are no remaining callers in core, so just get rid of it. Then rename BuildTupleHashTableExt() back to BuildTupleHashTable().

  While at it, fix up the miserably-poorly-maintained header comment for BuildTupleHashTable[Ext]. It looks like more than one patch in this area has had the opinion that updating comments is beneath them.

  Discussion: https://postgr.es/m/538343.1734646986@sss.pgh.pa.us

* Use ExecGetCommonSlotOps infrastructure in more places. (Tom Lane, 2024-12-19)

  Append, MergeAppend, and RecursiveUnion can all use the support functions added in commit 276279295. The first two can report a fixed result slot type if all their children return the same fixed slot type. That does nothing for the append step itself, but might allow optimizations in the parent plan node. RecursiveUnion can optimize tuple hash table operations in the same way as SetOp now does.

  Patch by me; thanks to Richard Guo and David Rowley for review.

  Discussion: https://postgr.es/m/1850138.1731549611@sss.pgh.pa.us

* Improve planner's handling of SetOp plans. (Tom Lane, 2024-12-19)

  Remove the code for inserting flag columns in the inputs of a SetOp. That was the only reason why there would be resjunk columns in a set-operations plan tree, so we can get rid of some code that supported that, too.

  Get rid of choose_hashed_setop() in favor of building Paths for the hashed and sorted alternatives, and letting them fight it out within add_path().

  Remove set_operation_ordered_results_useful(), which was giving wrong answers due to examining the wrong ancestor node: we need to examine the immediate SetOperationStmt parent, not the topmost node. Instead make each caller of recurse_set_operations() pass down the relevant parent node. (This thinko seems to have led only to wasted planning cycles and possibly-inferior plans, not wrong query answers. Perhaps we should back-patch it, but I'm not doing so right now.)

  Teach generate_nonunion_paths() to consider pre-sorted inputs for sorted SetOps, rather than always generating a Sort node.

  Patch by me; thanks to Richard Guo and David Rowley for review.

  Discussion: https://postgr.es/m/1850138.1731549611@sss.pgh.pa.us

* Convert SetOp to read its inputs as outerPlan and innerPlan. (Tom Lane, 2024-12-19)

  The original design for set operations involved appending the two input relations into one and adding a flag column that allows distinguishing which side each row came from. Then the SetOp node pries them apart again based on the flag. This is bizarre. The only apparent reason to do it is that when sorting, we'd only need one Sort node not two. But since sorting is at least O(N log N), sorting all the data is actually worse than sorting each side separately --- plus, we have no chance of taking advantage of presorted input. On top of that, adding the flag column frequently requires an additional projection step that adds cycles, and then the Append node isn't free either. Let's get rid of all of that and make the SetOp node have two separate children, using the existing outerPlan/innerPlan infrastructure.

  This initial patch re-implements nodeSetop.c and does a bare minimum of work on the planner side to generate correctly-shaped plans. In particular, I've tried not to change the cost estimates here, so that the visible changes in the regression test results will only involve removal of useless projection steps and not any changes in whether to use sorted vs hashed mode.

  For SORTED mode, we combine successive identical tuples from each input into groups, and then merge-join the groups. The tuple comparisons now use SortSupport instead of simple equality, but the group-formation part should involve roughly the same number of tuple comparisons as before. The cross-comparisons between left and right groups probably add to that, but I'm not sure how to quantify how many more comparisons we might need.

  For HASHED mode, nodeSetop's logic is almost the same as before, just refactored into two separate loops instead of one loop that has an assumption that it will see all the left-hand inputs first.

  In both modes, I added early-exit logic to not bother reading the right-hand relation if the left-hand input is empty, since neither INTERSECT nor EXCEPT modes can produce any output if the left input is empty. This could have been done before in the hashed mode, but not in sorted mode. Sorted mode can also stop as soon as it exhausts the left input; any remaining right-hand tuples cannot have matches.

  Also, this patch adds some infrastructure for detecting whether child plan nodes all output the same type of tuple table slot. If they do, the hash table logic can use slightly more efficient code based on assuming that that's the input slot type it will see. We'll make use of that infrastructure in other plan node types later.

  Patch by me; thanks to Richard Guo and David Rowley for review.

  Discussion: https://postgr.es/m/1850138.1731549611@sss.pgh.pa.us

* Remove extra prefetch iterator setup for Bitmap Table Scan (Melanie Plageman, 2024-12-19)

  1a0da347a7ac98db replaced Bitmap Table Scan's separate private and shared bitmap iterators with a unified iterator. It accidentally set up the prefetch iterator twice for non-parallel bitmap table scans. Remove the extra setup call to tbm_begin_iterate().

* Fix bitmap table scan crash on iterator release (Melanie Plageman, 2024-12-19)

  1a0da347a7ac98db replaced Bitmap Table Scan's individual private and shared iterators with a unified iterator. It neglected, however, to check if the iterator had already been cleaned up before doing so on rescan.

  Add this check both on rescan and end scan to be safe.

  Reported-by: Richard Guo
  Author: Richard Guo
  Discussion: https://postgr.es/m/CAMbWs48nrhcLY1kcd-u9oD%2B6yiS631F_8Fx8ZGsO-BYDwH%2Bbyw%40mail.gmail.com

* Avoid nbtree index scan SAOP scanBehind confusion. (Peter Geoghegan, 2024-12-19)

  Consistently reset so->scanBehind at the beginning of nbtree array advancement, even during sktrig_required=false calls (calls where array advancement is triggered by an unsatisfied non-required array scan key).

  Otherwise, it's possible for queries to fail to return all relevant tuples to the scan given a low-order required scan key that was previously deemed "satisfied" by a truncated high key attribute value. This only happened at the point where a later non-required array scan key needed to be "advanced" once on the next leaf page (that is, once the right sibling of the truncated high key page was reached).

  The underlying issue was that later code within _bt_advance_array_keys assumed that the so->scanBehind flag must have been set using the current page's high key (not the previous page's high key). Any later successful recheck call to _bt_check_compare would therefore spuriously be prevented from making _bt_advance_array_keys return true, based on the faulty belief that the truncated attribute must be from the scan's current tuple (i.e. the non-pivot tuple at the start of the next page). _bt_advance_array_keys would return false for the tuple, ultimately resulting in _bt_checkkeys failing to return a matching tuple.

  Oversight in commit 5bf748b8, which enhanced nbtree ScalarArrayOp execution.

  Author: Peter Geoghegan <pg@bowt.ie>
  Discussion: https://postgr.es/m/CAH2-WzkJKncfqyAUTeuB5GgRhT1vhsWO2q11dbZNqKmvjopP_g@mail.gmail.com
  Backpatch: 17-, where commit 5bf748b8 first appears.

* bootstrap: pure parser and reentrant scanner (Peter Eisentraut, 2024-12-19)

  Use the flex %option reentrant and the bison option %pure-parser to make the generated scanner and parser pure, reentrant, and thread-safe.

  Make the generated scanner use palloc() etc. instead of malloc() etc.

  For the bootstrap scanner and parser, reentrancy and memory management aren't that important, but we make this change here anyway so that all the scanners and parsers in the backend use a similar set of options and APIs.

  Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
  Reviewed-by: Andreas Karlsson <andreas@proxel.se>
  Discussion: https://www.postgresql.org/message-id/flat/eb6faeac-2a8a-4b69-9189-c33c520e5b7b@eisentraut.org

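  In outline, the options named above land in the scanner and grammar files like this (a sketch; the real files also wire up extra lexer/parser parameters, omitted here):

      /* in the scanner (.l) file: */
      %option reentrant

      /* in the grammar (.y) file: */
      %pure-parser
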
* Small whitespace improvement (Peter Eisentraut, 2024-12-19)

  Author: Andreas Karlsson <andreas@proxel.se>
  Discussion: https://www.postgresql.org/message-id/flat/eb6faeac-2a8a-4b69-9189-c33c520e5b7b@eisentraut.org

* Add backend-level statistics to pgstats (Michael Paquier, 2024-12-19)

  This adds a new variable-numbered statistics kind in pgstats, where the object ID key of the stats entries is based on the proc number of the backends. This acts as an upper bound for the number of stats entries that can exist at once. The entries are created when a backend starts after authentication succeeds, and are removed when the backend exits, making the stats entry exist for as long as their backend is up and running. These are not written to the pgstats file at shutdown (note that write_to_file is disabled, as a safety measure).

  Currently, these stats include only information about the I/O generated by a backend, using the same layer as pg_stat_io, except that it is now possible to know how much activity is happening in each backend rather than an overall aggregate of all the activity. A function called pg_stat_get_backend_io() is added to access this data depending on the PID of a backend. The existing structure could be expanded in the future to add more information about other statistics related to backends, depending on requirements or ideas.

  Auxiliary processes are not included in this set of statistics. These are less interesting to have than normal backends, as they have dedicated entries in pg_stat_io and stats kinds of their own.

  This commit also includes pg_stat_reset_backend_stats(), a function able to reset all the stats associated with a single backend.

  Bump catalog version and PGSTAT_FILE_FORMAT_ID.

  Author: Bertrand Drouvot
  Reviewed-by: Álvaro Herrera, Kyotaro Horiguchi, Michael Paquier, Nazir Bilal Yavuz
  Discussion: https://postgr.es/m/ZtXR+CtkEVVE/LHF@ip-10-97-1-34.eu-west-3.compute.internal

* Extract logic filling pg_stat_get_io()'s tuplestore into its own routine (Michael Paquier, 2024-12-19)

  This commit adds pg_stat_io_build_tuples(), a helper routine for pg_stat_get_io(), that fills its result tuplestore based on the contents of PgStat_BktypeIO. This will be used in a follow-up commit that uses the same structures as pg_stat_io for reporting, including the same object types and contexts, but for a different statistics kind.

  Author: Bertrand Drouvot, Michael Paquier
  Discussion: https://postgr.es/m/ZtXR+CtkEVVE/LHF@ip-10-97-1-34.eu-west-3.compute.internal

* Optimize grouping equality checks with virtual slots (David Rowley, 2024-12-19)

  8f4ee9626 fixed an old Assert failure that could happen when the slot type used to look up the hash table for BuildTupleHashTableExt() users wasn't a TTSOpsMinimalTuple slot. The fix for that in the back branches had to be to pass the TupleTableSlotOps as NULL; however, in master, since we have the inputOps parameter added by d96d1d515, we can pass that down instead.

  At least one caller uses a fixed slot that's always TTSOpsVirtual, so passing down inputOps for these cases allows ExecBuildGroupingEqual() to skip adding the EEOP_INNER_FETCHSOME ExprEvalStep.

  This should increase the performance of hashed subplans very slightly.

  Author: Tom Lane, David Rowley
  Discussion: https://postgr.es/m/2543667.1734483723@sss.pgh.pa.us

* Fix Assert failure in WITH RECURSIVE UNION queries (David Rowley, 2024-12-19)

  If the non-recursive part of a recursive CTE ended up using TTSOpsBufferHeapTuple as the table slot type, then a duplicate value could cause an Assert failure in CheckOpSlotCompatibility() when checking the hash table for the duplicate value. The expected slot type for the deform step was TTSOpsMinimalTuple, so the Assert failed when the TTSOpsBufferHeapTuple slot was used.

  This is a long-standing bug which we likely didn't notice because it seems much more likely that the non-recursive term would have required projection and used a TTSOpsVirtual slot, which CheckOpSlotCompatibility is ok with.

  There doesn't seem to be any harm done here other than the Assert failure. Both TTSOpsMinimalTuple and TTSOpsBufferHeapTuple slot types require tuple deformation, so the EEOP_*_FETCHSOME ExprState step would have properly existed in the ExprState.

  The solution is to pass NULL for ExecBuildGroupingEqual's 'lops' parameter. This means the ExprState's EEOP_*_FETCHSOME step won't expect a fixed slot type. This makes CheckOpSlotCompatibility() happy as no checking is performed when the ExprEvalStep is not expecting a fixed slot type.

  Reported-by: Richard Guo
  Reviewed-by: Tom Lane
  Discussion: https://postgr.es/m/CAMbWs4-8U9q2LAtf8+ghV11zeUReA3AmrYkxzBEv0vKnDxwkKA@mail.gmail.com
  Backpatch-through: 13, all supported versions

* Remove leftover mentions of XLOG_HEAP2_FREEZE_PAGE records (Melanie Plageman, 2024-12-18)

  f83d709760d merged the separate XLOG_HEAP2_FREEZE_PAGE records into a new combined prune, freeze, and vacuum record with opcode XLOG_HEAP2_PRUNE_VACUUM_SCAN. Remove the last few references to XLOG_HEAP2_FREEZE_PAGE records which were accidentally left behind.

  Reported-by: Tomas Vondra
  Reviewed-by: Robert Haas
  Discussion: https://postgr.es/m/CA%2BTgmoY1tYff-1CEn8kYt5FsOrynTbtr%3DUZw%3D7mTC1Hv1HpeBQ%40mail.gmail.com

* Bitmap Table Scans use unified TBMIterator (Melanie Plageman, 2024-12-18)

  With the repurposing of TBMIterator as an interface for both parallel and serial iteration through TIDBitmaps in commit 7f9d4187e7bab10329cc, bitmap table scans may now use it.

  Modify bitmap table scan code to use the TBMIterator. This requires moving around a bit of code, so a few variables are initialized elsewhere.

  Author: Melanie Plageman
  Reviewed-by: Tomas Vondra
  Discussion: https://postgr.es/m/c736f6aa-8b35-4e20-9621-62c7c82e2168%40vondra.me

* Add common interface for TBMIterators (Melanie Plageman, 2024-12-18)

  Add and use TBMPrivateIterator, which replaces the current TBMIterator for serial use cases, and repurpose TBMIterator to be a unified interface for both the serial ("private") and parallel ("shared") TID Bitmap iterator interfaces. This encapsulation simplifies call sites for callers supporting both parallel and serial TID Bitmap access. TBMIterator is not yet used in this commit.

  Author: Melanie Plageman
  Reviewed-by: Tomas Vondra, Heikki Linnakangas
  Discussion: https://postgr.es/m/063e4eb4-32d9-439e-a0b1-75565a9835a8%40iki.fi

* Fix overflow danger in SampleHeapTupleVisible() (Melanie Plageman, 2024-12-18)

  68d9662be1c4b70 made HeapScanDesc->rs_ntuples unsigned but neglected to change how it was being used in SampleHeapTupleVisible(). Return early if rs_ntuples is 0 to avoid overflowing and incorrectly executing the loop code in SampleHeapTupleVisible().

  Reported-by: Ranier Vilela
  Discussion: https://postgr.es/m/CAEudQAot_xQoZyPZjpj1aBUPrPykY5mOPHGyvfe%3Djz%2BWowdA3A%40mail.gmail.com

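  The hazard in miniature, as a standalone illustration (not the PostgreSQL code itself): with an unsigned tuple count, index arithmetic near zero wraps around instead of going negative, so the guard has to come first.

      #include <stdio.h>
      #include <stdint.h>

      static int
      any_visible(uint32_t ntuples)
      {
          /* Guard first: with unsigned ntuples, "ntuples - 1" would wrap
           * to UINT32_MAX when ntuples == 0, and a scan over the tuple
           * offsets would run far out of bounds. */
          if (ntuples == 0)
              return 0;

          for (uint32_t i = ntuples; i-- > 0; )
          {
              /* ...per-tuple visibility check would go here... */
          }
          return 1;
      }

      int main(void)
      {
          printf("%d\n", any_visible(0));     /* safe: prints 0 */
          return 0;
      }
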
* Make rs_cindex and rs_ntuples unsigned (Melanie Plageman, 2024-12-18)

  HeapScanDescData.rs_cindex and rs_ntuples can't be less than 0. All scan types using the heap scan descriptor expect these values to be >= 0. Make that expectation clear by making rs_cindex and rs_ntuples unsigned.

  Also remove the test in heapam_scan_bitmap_next_tuple() that checks if rs_cindex < 0. This was never true, but now that rs_cindex is unsigned, it makes even less sense.

  While we are at it, initialize both rs_cindex and rs_ntuples to 0 in initscan().

  Author: Melanie Plageman
  Reviewed-by: Dilip Kumar
  Discussion: https://postgr.es/m/CAAKRu_ZxF8cDCM_BFi_L-t%3DRjdCZYP1usd1Gd45mjHfZxm0nZw%40mail.gmail.com

* psql: Add more information about service name (Michael Paquier, 2024-12-18)

  This commit adds support for the following items in psql, each able to show the service name when one is available:
  - Variable SERVICE.
  - Substitution %s in PROMPT{1,2,3}.

  This relies on 4b99fed7541e, which made the service name available in PGconn for libpq.

  Author: Michael Banck
  Reviewed-by: Greg Sabino Mullane
  Discussion: https://postgr.es/m/6723c612.050a0220.1567f4.b94a@mx.google.com

* libpq: Add service name to PGconn and PQservice() (Michael Paquier, 2024-12-18)

  This commit adds one field to PGconn for the database service name (if any), with PQservice() as the routine to retrieve it. Like the other routines of this area, NULL is returned as the result if the connection is NULL.

  A follow-up patch will make use of this feature to be able to display the service name in the psql prompt.

  Author: Michael Banck
  Reviewed-by: Greg Sabino Mullane
  Discussion: https://postgr.es/m/6723c612.050a0220.1567f4.b94a@mx.google.com

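  A minimal libpq client using the new call might look like this (the "service=mydb" connection string is illustrative; it resolves against pg_service.conf):

      #include <stdio.h>
      #include <libpq-fe.h>

      int main(void)
      {
          PGconn *conn = PQconnectdb("service=mydb");

          if (PQstatus(conn) == CONNECTION_OK)
              printf("connected via service: %s\n", PQservice(conn));
          else
              fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));

          PQfinish(conn);
          return 0;
      }
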
* Fix memory leak in pg_restore with zstd-compressed data. (Tom Lane, 2024-12-17)

  EndCompressorZstd() neglected to free everything. This was most visible with a lot of large objects in the dump.

  Per report from Tomasz Szypowski. Back-patch to v16 where this code came in.

  Discussion: https://postgr.es/m/DU0PR04MB94193D038A128EF989F922D199042@DU0PR04MB9419.eurprd04.prod.outlook.com

* Fix incorrect slot type in BuildTupleHashTableExt (David Rowley, 2024-12-18)

  0f5738202 adjusted the execGrouping.c code so it made use of ExprStates to generate hash values. That commit made a wrong assumption that the slot type to pass to ExecBuildHash32FromAttrs() is always &TTSOpsMinimalTuple. That's not the case, as the slot type depends on the slot type passed to LookupTupleHashEntry(), which for nodeRecursiveunion.c could be any of the current slot types.

  Here we fix this by adding a new parameter to BuildTupleHashTableExt() to allow the slot type to be passed in. In the case of nodeSubplan.c and nodeAgg.c the slot type is always &TTSOpsVirtual, so for both of those cases, it's beneficial to pass the known slot type, as that allows ExecBuildHash32FromAttrs() to skip adding the tuple deform step to the resulting ExprState.

  Another possible fix would have been to have ExecBuildHash32FromAttrs() set "fetch.kind" to NULL so that ExecComputeSlotInfo() always determines the EEOP_INNER_FETCHSOME is required; however, that option isn't favorable as it slows down aggregation and hashed subplan evaluation due to the extra (needless) deform step.

  Thanks to Nathan Bossart for bisecting to find the offending commit based on Paul's report.

  Reported-by: Paul Ramsey <pramsey@cleverelephant.ca>
  Discussion: https://postgr.es/m/99F064C1-B3EB-4BE7-97D2-D2A0AA487A71@cleverelephant.ca

* Accommodate very large dshash tables. (Nathan Bossart, 2024-12-17)

  If a dshash table grows very large (e.g., the dshash table for cumulative statistics when there are millions of tables), resizing it may fail with an error like:

      ERROR:  invalid DSA memory alloc request size 1073741824

  To fix, permit dshash resizing to allocate more than 1 GB by providing the DSA_ALLOC_HUGE flag.

  Reported-by: Andreas Scherbaum
  Author: Matthias van de Meent
  Reviewed-by: Cédric Villemain, Michael Paquier, Andres Freund
  Discussion: https://postgr.es/m/80a12d59-0d5e-4c54-866c-e69cd6536471%40pgug.de
  Backpatch-through: 13

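  The gist of the fix, sketched with illustrative variable names (the dshash resize internals are simplified away):

      /* dsa_allocate() rejects requests of 1 GB or more; passing the
       * DSA_ALLOC_HUGE flag lets a doubled bucket array exceed that
       * limit (DSA_ALLOC_ZERO additionally zero-fills the new array). */
      new_buckets = dsa_allocate_extended(area,
                                          sizeof(dsa_pointer) * new_nbuckets,
                                          DSA_ALLOC_HUGE | DSA_ALLOC_ZERO);
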
* Skip useless calculation of join RTE column names during EXPLAIN. (Tom Lane, 2024-12-17)

  There's no need for set_simple_column_names() to compute unique column names for join RTEs, because a finished plan tree will not contain any join alias Vars that we could need names for. Its other, internal callers will not pass it any join RTEs anyway, so the upshot is we can just skip join RTEs here.

  Aside from getting rid of a klugy against-its-documentation use of set_relation_column_names, this can speed up EXPLAIN substantially when considering many-join queries, because the upper join RTEs tend to have a lot of columns.

  Sami Imseih, with cosmetic changes by me

  Discussion: https://postgr.es/m/CAA5RZ0th3q-0p1pri58z9grG8r8azmEBa8o1rtkwhLmJg_cH+g@mail.gmail.com

* Count pages set all-visible and all-frozen in VM during vacuum (Melanie Plageman, 2024-12-17)

  Heap vacuum already counts and logs pages with newly frozen tuples. Now count and log the number of pages newly set all-visible and all-frozen in the visibility map.

  Pages that are all-visible but not all-frozen are debt for future aggressive vacuums. The counts of newly all-visible and all-frozen pages give us insight into the rate at which this debt is being accrued and paid down.

  Author: Melanie Plageman
  Reviewed-by: Masahiko Sawada, Alastair Turner, Nitin Jadhav, Andres Freund, Bilal Yavuz, Tomas Vondra
  Discussion: https://postgr.es/m/flat/CAAKRu_ZQe26xdvAqo4weHLR%3DivQ8J4xrSfDDD8uXnh-O-6P6Lg%40mail.gmail.com#6d8d2b4219394f774889509bf3bdc13d
  Discussion: https://postgr.es/m/ctdjzroezaxmiyah3gwbwm67defsrwj2b5fpfs4ku6msfpxeia%40mwjyqlhwr2wu

* Make visibilitymap_set() return previous state of vmbits (Melanie Plageman, 2024-12-17)

  It can be useful to know the state of a relation page's VM bits before visibilitymap_set(). visibilitymap_set() has the old value on hand, so returning it is simple. This commit does not use visibilitymap_set()'s new return value.

  Author: Melanie Plageman
  Reviewed-by: Masahiko Sawada, Andres Freund, Nitin Jadhav, Bilal Yavuz
  Discussion: https://postgr.es/m/flat/CAAKRu_ZQe26xdvAqo4weHLR%3DivQ8J4xrSfDDD8uXnh-O-6P6Lg%40mail.gmail.com#6d8d2b4219394f774889509bf3bdc13d
  Discussion: https://postgr.es/m/ctdjzroezaxmiyah3gwbwm67defsrwj2b5fpfs4ku6msfpxeia%40mwjyqlhwr2wu

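  A sketch of what a future caller could do with the return value (the argument list is abbreviated from memory and may not match the header exactly; the counter is hypothetical):

      /* The return value is the page's previous VM bits, so a caller
       * can now tell whether the page was newly made all-visible. */
      uint8 old_vmbits = visibilitymap_set(rel, heapBlk, heapBuf, recptr,
                                           vmbuffer, cutoff_xid,
                                           VISIBILITYMAP_ALL_VISIBLE);

      if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
          vacrel->vm_new_visible_pages++;   /* hypothetical counter */
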
* Rename LVRelState->frozen_pages (Melanie Plageman, 2024-12-17)

  Rename frozen_pages to new_frozen_tuple_pages in LVRelState, the struct used for tracking state during vacuuming of a heap relation. frozen_pages sounds like it tracks pages set all-frozen. That is a misnomer. It only includes pages with at least one newly frozen tuple. It also includes pages that are not all-frozen.

  Author: Melanie Plageman
  Reviewed-by: Andres Freund, Masahiko Sawada, Nitin Jadhav, Bilal Yavuz
  Discussion: https://postgr.es/m/ctdjzroezaxmiyah3gwbwm67defsrwj2b5fpfs4ku6msfpxeia%40mwjyqlhwr2wu

* Set max_safe_fds whenever we create shared memory and semaphores. (Tom Lane, 2024-12-17)

  Formerly we skipped this in bootstrap/check mode and in single-user mode. That's bad in check mode because it may allow accepting a value of max_connections that doesn't actually work: on platforms where semaphores consume file descriptors, there may not be enough free FDs left over to satisfy fd.c, causing postmaster start to fail. It's also not great in single-user mode, because fd.c will operate with just the minimum allowable value of max_safe_fds, resulting in excess file open/close overhead if anything moderately complicated is done in single-user mode. (There may be some penalty for bootstrap mode too, though probably not much.)

  Discussion: https://postgr.es/m/2081982.1734393311@sss.pgh.pa.us

* Set the stack_base_ptr in main(), not in random other places. (Tom Lane, 2024-12-17)

  Previously we did this in PostmasterMain() and InitPostmasterChild(), which meant that stack depth checking was disabled in non-postmaster server processes, for instance in single-user mode. That seems like a fairly bad idea, since there's no a-priori restriction on the complexity of queries we will run in single-user mode. Moreover, this led to not having quite the same stack depth limit in all processes, which likely has no real-world effect but it offends my inner neatnik. Setting the depth in main() guarantees that check_stack_depth() is armed and has a consistent interpretation of stack depth in all forms of server processes.

  While at it, move the code associated with checking the stack depth out of tcop/postgres.c (which was never a great home for it) into a new file src/backend/utils/misc/stack_depth.c.

  Discussion: https://postgr.es/m/2081982.1734393311@sss.pgh.pa.us

* Update comments about index parallel builds (Tomas Vondra, 2024-12-17)

  Commit b43757171470 allowed parallel builds for BRIN, but left behind two comments claiming only btree indexes support parallel builds.

  Reported by Egor Rogov, along with similar issues in SGML docs. Backpatch to 17, where parallel builds for BRIN were introduced.

  Reported-by: Egor Rogov
  Backpatch-through: 17
  Discussion: https://postgr.es/m/114e2d5d-125e-07d8-94aa-5ad175fb7443@postgrespro.ru

* Remove ts_locale.c's lowerstr() (Peter Eisentraut, 2024-12-17)

  lowerstr() and lowerstr_with_len() in ts_locale.c do the same thing as str_tolower() that the rest of the system uses, except that the former don't use the common locale provider framework but instead use the global libc locale settings.

  This patch replaces uses of lowerstr*() with str_tolower(..., DEFAULT_COLLATION_OID). For instances that use a libc locale globally, this will result in exactly the same behavior. For instances that use other locale providers, you now get consistent behavior and are no longer dependent on the libc locale settings (for this case; there are others).

  Most uses of these functions are for processing dictionary and configuration files. In those cases, using the default collation seems appropriate. At least we don't have a more specific collation available. But the code in contrib/pg_trgm should really depend on the collation of the columns being processed. This is not done here; it can be done in a separate patch.

  (You can probably construct some edge cases where this change would create some locale-related upgrade incompatibility, for example if before you used a combination of ICU and a differently-behaving libc locale. We can document this in the release notes, but I don't think there is anything more we can do about this.)

  Reviewed-by: Jeff Davis <pgsql@j-davis.com>
  Discussion: https://www.postgresql.org/message-id/flat/653f3b84-fc87-45a7-9a0c-bfb4fcab3e7d%40eisentraut.org

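  The substitution pattern, roughly (backend code; buffer and memory-context handling elided):

      /* before, libc-locale-dependent:  lower = lowerstr(str); */
      /* after, via the common locale provider framework: */
      char *lower = str_tolower(str, strlen(str), DEFAULT_COLLATION_OID);
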
* Remove ts_locale.c's t_isdigit(), t_isspace(), t_isprint() (Peter Eisentraut, 2024-12-17)

  These do the same thing as the standard isdigit(), isspace(), and isprint() but with multibyte and encoding support. But all the callers are only interested in analyzing single-byte ASCII characters. So this extra layer is overkill and we can replace the uses with the standard functions.

  All the t_is*() functions in ts_locale.c are under scrutiny because they don't use the common locale provider framework but instead use the global libc locale settings. For the functions being touched by this patch, we don't need all that anyway, as mentioned above, so the simplest solution is to just remove them. The few remaining t_is*() functions will need a different treatment in a separate patch.

  pg_trgm has some compile-time options with macros such as KEEPONLYALNUM. These are not documented, and the non-default variant is not supported by any test cases. As part of this undertaking, I'm removing the non-default variant, as it is in the way of cleanup. So in this case, the not-KEEPONLYALNUM code path is gone.

  Reviewed-by: Jeff Davis <pgsql@j-davis.com>
  Discussion: https://www.postgresql.org/message-id/flat/653f3b84-fc87-45a7-9a0c-bfb4fcab3e7d%40eisentraut.org

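  The replacement is just the standard C idiom; the cast matters because passing a plain char with a negative value to isdigit() is undefined behavior:

      #include <ctype.h>
      #include <stdbool.h>

      /* before: t_isdigit(ptr); after, for single-byte ASCII checks: */
      static bool
      is_ascii_digit(const char *ptr)
      {
          return isdigit((unsigned char) *ptr) != 0;
      }
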
* Avoid unnecessary wrapping for more complex expressions (Richard Guo, 2024-12-17)

  When pulling up a subquery that is under an outer join, if the subquery's target list contains a strict expression that uses a subquery variable, it's okay to pull up the expression without wrapping it in a PlaceHolderVar: if the subquery variable is forced to NULL by the outer join, the expression result will come out as NULL too.

  If the strict expression does not contain any subquery variables, the current code always wraps it in a PlaceHolderVar. While this is not incorrect, the analysis could be tighter: if the strict expression contains any variables of rels that are under the same lowest nulling outer join as the subquery, we can also avoid wrapping it. This is safe because if the subquery variable is forced to NULL by the outer join, the variables of rels that are under the same lowest nulling outer join will also be forced to NULL, resulting in the expression evaluating to NULL as well. Therefore, it's not necessary to force the expression to be evaluated below the outer join. It could be beneficial to get rid of such PHVs because they could imply lateral dependencies, which force us to resort to nestloop joins.

  This patch checks if the lateral references in the strict expression contain any variables of rels under the same lowest nulling outer join as the subquery, and avoids wrapping the expression in that case.

  This is fundamentally a generalization of the optimizations for bare Vars and PHVs introduced in commit f64ec81a8.

  No backpatch as this could result in plan changes.

  Author: Richard Guo
  Discussion: https://postgr.es/m/CAMbWs4_ENtfRdLaM_bXAxiKRYO7DmwDBDG4_2=VTDi0mJP-jAw@mail.gmail.com

* Tweak some comments related to variable-numbered stats in pgstat.c (Michael Paquier, 2024-12-17)

  These comments referred to database objects, but depending on the stats kind being dealt with, this may not be true.

  Issues found while reviewing a different patch in this area.

  Discussion: https://postgr.es/m/ZtXR+CtkEVVE/LHF@ip-10-97-1-34.eu-west-3.compute.internal

* Print out error position for some more DDLs (Michael Paquier, 2024-12-17)

  The following commands now report the error position in the query should they fail when looking at the type used:
  - CREATE TYPE (LIKE)
  - CREATE TABLE OF

  Both are related to typenameType(), where the type name lookup is done. These calls gain the ParseState that already exists in these paths.

  Author: Kirill Reshke, Jian He
  Reviewed-by: Álvaro Herrera, Michael Paquier
  Discussion: https://postgr.es/m/CALdSSPhqfvKbDwqJaY=yEePi_aq61GmMpW88i6ZH7CMG_2Z4Cg@mail.gmail.com

* pg_combinebackup: Fix PITR comparison test in 002_compare_backups (Michael Paquier, 2024-12-17)

  The test was creating both the dumps to compare from the same database on the same node, so it would never detect any mismatches when comparing the logical dumps of the two servers.

  Fixing this issue has revealed that there is a difference in the dumps: the tablespace paths are different. This commit uses compare_text() with a custom comparison function to erase the difference (slightly tweaked to be able to work with WIN32 and non-WIN32 paths). This way, the non-relevant parts of the tablespace path are ignored from the check with the basic structure of the query string still compared.

  Author: Dagfinn Ilmari Mannsåker
  Discussion: https://postgr.es/m/87h67653ns.fsf@wibble.ilmari.org
  Backpatch-through: 17

* psql: Tab completion for JOIN ... USING column list (Tomas Vondra, 2024-12-16)

  For JOIN ... USING, offer attribute names for the first member of the column list.

  Author: Andreas Karlsson
  Reviewed-By: Tomas Vondra
  Discussion: https://postgr.es/m/3a7e27bc-d6ed-4cb0-9b21-f21143fc1b37@proxel.se

* psql: Tab completion for JOIN ... ON/USING (Tomas Vondra, 2024-12-16)

  Offer ON/USING clauses for join types that require join conditions (i.e. anything except for NATURAL/CROSS joins).

  Author: Andreas Karlsson
  Reviewed-By: Tomas Vondra
  Discussion: https://postgr.es/m/3a7e27bc-d6ed-4cb0-9b21-f21143fc1b37@proxel.se

* psql: Tab completion for LATERAL joins (Tomas Vondra, 2024-12-16)

  When listing selectable objects after a JOIN, also offer LATERAL.

  Author: Andreas Karlsson
  Reviewed-By: Tomas Vondra
  Discussion: https://postgr.es/m/3a7e27bc-d6ed-4cb0-9b21-f21143fc1b37@proxel.se

* Refactor string case conversion into provider-specific files. (Jeff Davis, 2024-12-16)

  Create API entry points pg_strlower(), etc., that work with any provider and give the caller control over the destination buffer. Then, move provider-specific logic into pg_locale_builtin.c, pg_locale_icu.c, and pg_locale_libc.c as appropriate.

  Discussion: https://postgr.es/m/7aa46d77b377428058403723440862d12a8a129a.camel@j-davis.com

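  The shape of the new entry points, approximately (the exact signature is an assumption based on the description above; the point is that the caller owns the destination buffer and provider dispatch hides behind pg_locale_t):

      size_t pg_strlower(char *dst, size_t dstsize,
                         const char *src, ssize_t srclen,
                         pg_locale_t locale);
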
* psql: Tab completion for CREATE MATERIALIZED VIEW ... USING (Tomas Vondra, 2024-12-16)

  The tab completion didn't offer USING for CREATE MATERIALIZED VIEW, so add it, and offer a list of access methods, followed by SELECT.

  Author: Kirill Reshke
  Reviewed-By: Karina Litskevich
  Discussion: https://postgr.es/m/CALdSSPhVELkvutquqrDB=Ujfq_Pjz=6jn-kzh+291KPNViLTfw@mail.gmail.com

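  In tab-complete.c terms, the addition is along these lines (the match arm and query name here are illustrative, not the committed hunk):

      else if (Matches("CREATE", "MATERIALIZED", "VIEW", MatchAny, "USING"))
          COMPLETE_WITH_QUERY(Query_for_list_of_table_access_methods);
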
* psql: Tab completion for CREATE TEMP TABLE ... USING (Tomas Vondra, 2024-12-16)

  The USING keyword was offered only for persistent tables, not for temporary ones. So improve that.

  Author: Kirill Reshke
  Reviewed-By: Karina Litskevich
  Discussion: https://postgr.es/m/CALdSSPhVELkvutquqrDB=Ujfq_Pjz=6jn-kzh+291KPNViLTfw@mail.gmail.com

* psql: Tab completion for ALTER TYPE ... CASCADE/RESTRICT (Tomas Vondra, 2024-12-16)

  Updates tab completion for ALTER TYPE to offer CASCADE/RESTRICT for a number of actions on attributes:

  ALTER TYPE ... ADD/DROP/RENAME ATTRIBUTE ... [CASCADE|RESTRICT]
  ALTER TYPE ... TYPE ... [CASCADE|RESTRICT]

  Author: Kirill Reshke
  Reviewed-By: Karina Litskevich
  Discussion: https://postgr.es/m/CALdSSPhVELkvutquqrDB=Ujfq_Pjz=6jn-kzh+291KPNViLTfw@mail.gmail.com

* psql: Tab completion for ALTER TYPE ... ADD ATTRIBUTE (Tomas Vondra, 2024-12-16)

  Improve psql tab completion for ALTER TYPE ... ADD ATTRIBUTE to offer a list of existing data types (until now no options were offered).

  Author: Kirill Reshke
  Reviewed-By: Karina Litskevich
  Discussion: https://postgr.es/m/CALdSSPhVELkvutquqrDB=Ujfq_Pjz=6jn-kzh+291KPNViLTfw@mail.gmail.com

* Make 009_twophase.pl test pass with recovery_min_apply_delay set (Heikki Linnakangas, 2024-12-16)

  The test failed if you ran the regression tests with TEMP_CONFIG with recovery_min_apply_delay = '500ms'. Fix the race condition by waiting for the transaction to be applied on the replica, like in a few other tests.

  The failing test was introduced in commit cbfbda7841. Backpatch to all supported versions like that commit (except v12, which is no longer supported).

  Reported-by: Alexander Lakhin
  Discussion: https://www.postgresql.org/message-id/09e2a70a-a6c2-4b5c-aeae-040a7449c9f2@gmail.com