path: root/src
Commit message (author, date)
...
* Fix incorrect copy-pasto in error message of pg_waldump.c (Michael Paquier, 2022-12-27)
| The error message used on fclose() failure was incorrect, so fix it.
|
| Oversight in d497093, that I have somehow managed to miss.
* pg_waldump: Add --save-fullpage=PATH to save full page images from WAL records (Michael Paquier, 2022-12-27)
| This option extracts (potentially decompressing) full-page images included in WAL records into a given target directory. These images are subject to the same filtering rules as the normal display of the WAL records, hence with --relation one can for example extract only the FPIs issued on the relation defined. By default, the records are printed or their stats computed (--stats); using --quiet would only save the images without any output generated.
|
| This is a tool aimed mostly at very experienced users, useful for fixing page-level corruption or just analyzing the past state of a page, and there was no easy way to do that with the in-core tools up to now when looking at WAL.
|
| Each block is saved in a separate file, to ease their manipulation, with the file name following the format <lsn>.<ts>.<db>.<rel>.<blk>_<fork>. For instance, 00000000-010000C0.1663.1.6117.123_main refers to:
| - WAL record LSN in hexa format (00000000-010000C0).
| - Tablespace OID (1663).
| - Database OID (1).
| - Relfilenode (6117).
| - Block number (123).
| - Fork name of the file this block came from (_main).
|
| Author: David Christensen
| Reviewed-by: Sho Kato, Justin Pryzby, Bharath Rupireddy, Matthias van de Meent
| Discussion: https://postgr.es/m/CAOxo6XKjQb2bMSBRpePf3ZpzfNTwjQUc4Tafh21=jzjX6bX8CA@mail.gmail.com
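The file naming scheme described in the commit above can be sketched in a few lines. This is a minimal Python model of the <lsn>.<ts>.<db>.<rel>.<blk>_<fork> format, not code from pg_waldump; the function names `fpi_file_name` and `parse_fpi_file_name` are hypothetical, and the sketch assumes the LSN is printed as two zero-padded 8-digit hexadecimal halves, as in the example given in the message.

```python
def fpi_file_name(lsn, tablespace_oid, database_oid, relfilenode, block, fork):
    """Build '<lsn>.<ts>.<db>.<rel>.<blk>_<fork>' from a 64-bit LSN (hypothetical helper)."""
    hi, lo = lsn >> 32, lsn & 0xFFFFFFFF          # split the LSN into two 32-bit halves
    return (f"{hi:08X}-{lo:08X}.{tablespace_oid}.{database_oid}."
            f"{relfilenode}.{block}_{fork}")


def parse_fpi_file_name(name):
    """Split a saved full-page image file name back into its components."""
    lsn_part, ts, db, rel, rest = name.split(".")
    blk, fork = rest.split("_", 1)                # fork name follows the underscore
    hi, lo = (int(half, 16) for half in lsn_part.split("-"))
    return {
        "lsn": (hi << 32) | lo,
        "tablespace_oid": int(ts),
        "database_oid": int(db),
        "relfilenode": int(rel),
        "block": int(blk),
        "fork": fork,
    }
```

With the values from the commit message, `fpi_file_name(0x010000C0, 1663, 1, 6117, 123, "main")` yields the example name, and parsing it returns the same fields.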
* Add 'logical_decoding_mode' GUC. (Amit Kapila, 2022-12-26)
| This enables streaming or serializing changes immediately in logical decoding. This parameter is intended to be used to test logical decoding and replication of large transactions for which otherwise we need to generate the changes till logical_decoding_work_mem is reached.
|
| This helps in reducing the timing of existing tests related to logical replication of in-progress transactions and will help in writing tests for the upcoming feature for parallelly applying large in-progress transactions.
|
| Author: Shi yu
| Reviewed-by: Sawada Masahiko, Shveta Mallik, Amit Kapila, Dilip Kumar, Kuroda Hayato, Kyotaro Horiguchi
| Discussion: https://postgr.es/m/OSZPR01MB63104E7449DBE41932DB19F1FD1B9@OSZPR01MB6310.jpnprd01.prod.outlook.com
* Switch query fixing aclitems in ~15 from O(N^2) to O(N) in upgrade_adapt.sql (Michael Paquier, 2022-12-26)
| f4f2f2b was doing a sequential scan of pg_class before checking if a relation had attributes dependent on aclitem as data type when building the set of ALTER TABLE queries, but it would be costly on a regression database.
|
| While on it, make the query style more consistent with the rest.
|
| Reported-by: Justin Pryzby
| Discussion: https://postgr.es/m/20221223032724.GQ1153@telsasoft.com
* Convert enum_in() to report errors softly. (Tom Lane, 2022-12-25)
| I missed this in my initial survey, probably because I examined the contents of pg_type in the postgres database, which lacks any enumerated types.
|
| Discussion: https://postgr.es/m/CAAJ_b97KeDWUdpTKGOaFYPv0OicjOu6EW+QYWj-Ywrgj_aEy1g@mail.gmail.com
* Convert jsonpath's input function to report errors softly (Andrew Dunstan, 2022-12-24)
| Reviewed by Tom Lane
|
| Discussion: https://postgr.es/m/a8dc5700-c341-3ba8-0507-cc09881e6200@dunslane.net
* Make the numeric-OID cases of regprocin and friends be non-throwing. (Tom Lane, 2022-12-24)
| While at it, use a common subroutine already.
|
| This doesn't move the needle very far in terms of making these functions non-throwing; the only case we're now able to trap is numeric-OID-is-out-of-range. Still, it seems like a pretty non-controversial step in that direction.
* Fix recent accidental omission in pg_proc.dat (David Rowley, 2022-12-24)
| ed1a88dda added support functions for the ntile(), percent_rank() and cume_dist() window functions but neglected to actually add these support functions to the pg_proc entry for the corresponding window function.
|
| Also, take this opportunity to add these window functions to one of the regression tests added in ed1a88dda to give the support functions a little bit of exercise. If I'd done that in the first place then the omission would have been more obvious.
|
| Bump the catversion, again.
* Fix end LSN determination in recently added test (Alvaro Herrera, 2022-12-23)
| The test added in commit e44dae07f931 has a thinko: it wants to read info about a few WAL records, but it obtains the LSN of the final record to read by asking for the WAL insert position; however, pg_get_wal_records_info only accepts to read up to the flush position (cf. IsFutureLSN()). In normal conditions there is no difference, since the last record written by the preceding loop is known flushed and it's the one the test wants; but it's possible to have some other process insert another WAL record that isn't flushed, and that causes the whole test to explode.
|
| Fix by having pg_get_wal_records_info() read only up to the flushed position.
|
| Backpatch to 15, which is where pg_walinspect appeared.
|
| Author: Karina Litskevich <litskevichkarina@gmail.com>
| Discussion: https://postgr.es/m/a5559c95-52c3-5eea-cd63-9b4f1c70ff96@gmail.com
* Fix bug in translate_col_privs_multilevel (David Rowley, 2022-12-24)
| Fix incorrect code which was trying to convert a Bitmapset of columns at the attnums according to a parent table and transform them into the equivalent Bitmapset with same attnums according to the given child table. This code is new as of a61b1f748 and was failing to do the correct translation when there was an intermediate parent table between 'rel' and 'top_parent_rel'.
|
| Reported-by: Ranier Vilela
| Author: Richard Guo, Amit Langote
| Discussion: https://postgr.es/m/CAEudQArohfB_Gy%2BhcH2-bANUkxgjJiP%3DABq01_LgTNTbcNijag%40mail.gmail.com
* Allow parent's WaitEventSets to be freed after fork(). (Thomas Munro, 2022-12-23)
| An epoll fd belonging to the parent should be closed in the child. A kqueue fd is automatically closed by fork(), but we should still adjust our counter. For poll and Windows systems, nothing special is required. On all systems we free the memory.
|
| No caller yet, but we'll need this if we start using WaitEventSet in the postmaster as planned.
|
| Reviewed-by: Andres Freund <andres@anarazel.de>
| Discussion: https://postgr.es/m/CA%2BhUKG%2BZ-HpOj1JsO9eWUP%2Bar7npSVinsC_npxSy%2BjdOMsx%3DGg%40mail.gmail.com
* Don't leak a signalfd when using latches in the postmaster. (Thomas Munro, 2022-12-23)
| At the time of commit 6a2a70a02 we didn't use latch infrastructure in the postmaster. We're planning to start doing that, so we'd better make sure that the signalfd inherited from a postmaster is not duplicated and then leaked in the child.
|
| Reviewed-by: Andres Freund <andres@anarazel.de>
| Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
| Discussion: https://postgr.es/m/CA%2BhUKG%2BZ-HpOj1JsO9eWUP%2Bar7npSVinsC_npxSy%2BjdOMsx%3DGg%40mail.gmail.com
* Add WL_SOCKET_ACCEPT event to WaitEventSet API. (Thomas Munro, 2022-12-23)
| To be able to handle incoming connections on a server socket with the WaitEventSet API, we'll need a new kind of event to indicate that the socket is ready to accept a connection.
|
| On Unix, it's just the same as WL_SOCKET_READABLE, but on Windows there is a different underlying kernel event that we need to map our abstraction to.
|
| No user yet, but a proposed patch would use this.
|
| Reviewed-by: Andres Freund <andres@anarazel.de>
| Discussion: https://postgr.es/m/CA%2BhUKG%2BZ-HpOj1JsO9eWUP%2Bar7npSVinsC_npxSy%2BjdOMsx%3DGg%40mail.gmail.com
* Update upgrade_adapt.sql to handle tables using aclitem as data type (Michael Paquier, 2022-12-23)
| The regression test suite includes a table called "tab_core_types" that has one attribute based on the type "aclitem". Keeping this attribute as-is causes hard failures when running pg_upgrade with an origin on ~15.
|
| This commit updates upgrade_adapt.sql to automatically detect the tables with such attributes and switch them to text so that pg_upgrade is able to go through its run. This does not provide the same detection coverage as pg_upgrade, where we are able to find out aclitems used in arrays, domains or even composite types, but this is (I guess) enough for most things like an instance that had installcheck run on before the upgrade with a dump generated from it.
|
| Note that the buildfarm code has taken the simplest approach of just dropping "tab_core_types", so what we have here is more modular.
|
| Author: Anton A. Melnikov
| Discussion: https://postgr.es/m/49f389ba-95ce-8a9b-09ae-f60650c0e7c7@inbox.ru
* Fix some incorrectness in upgrade_adapt.sql on query for WITH OIDS (Michael Paquier, 2022-12-23)
| The query used to disable WITH OIDS in all the relations making use of it was checking for materialized views, but this is not a supported operation. On the contrary, this needs to be done on foreign tables.
|
| While on it, use quote_ident() in the ALTER TABLE strings built on the relation name.
|
| Author: Anton A. Melnikov, Michael Paquier
| Discussion: https://postgr.es/m/49f389ba-95ce-8a9b-09ae-f60650c0e7c7@inbox.ru
| Backpatch-through: 12
* Fix some incorrect elog() messages in aclchk.c (Michael Paquier, 2022-12-23)
| Three error strings used with cache lookup failures were referring to incorrect object types for ACL checks:
| - Schemas
| - Types
| - Foreign Servers
|
| These errors should never be triggered, but if they were, incorrect information would be reported.
|
| Author: Justin Pryzby
| Discussion: https://postgr.es/m/20221222153041.GN1153@telsasoft.com
| Backpatch-through: 11
* Rename pg_dissect_walfile_name() to pg_split_walfile_name() (Michael Paquier, 2022-12-23)
| The former name was discussed as being confusing, so use "split", as per a suggestion from Magnus Hagander.
|
| While on it, one of the output arguments is renamed from "segno" to "segment_number", as per a suggestion from Kyotaro Horiguchi.
|
| The documentation is updated to reflect all these changes.
|
| Bump catalog version.
|
| Author: Bharath Rupireddy, Michael Paquier
| Discussion: https://postgr.es/m/CABUevEytQVaOOhGdoh0D7hGwe3fuKcRF6NthsSW7ww04EmtFgQ@mail.gmail.com
* Allow window functions to adjust their frameOptions (David Rowley, 2022-12-23)
| WindowFuncs such as row_number() don't care if they're called with ROWS UNBOUNDED PRECEDING AND CURRENT ROW or with RANGE UNBOUNDED PRECEDING AND CURRENT ROW. The latter is less efficient as the RANGE option requires that the executor check for peer rows, so using the ROWS option instead would cause less overhead. Because RANGE is part of the default frame options for WindowClauses, it means WindowAgg is, by default, working much harder than it needs to for window functions where the ROWS / RANGE option has no effect on the window function's result.
|
| On a test query from the discussion thread, a performance improvement of 344% was seen by using ROWS instead of RANGE.
|
| Here we add a new support function node type to allow support functions to be called for window functions so that the most optimal version of the frame options can be set. The planner has been adjusted so that the frame options are changed only if all window functions sharing the same window clause agree on what the optimized frame options are.
|
| Here we give the ability for row_number(), rank(), dense_rank(), percent_rank(), cume_dist() and ntile() to alter their WindowClause's frameOptions.
|
| Reviewed-by: Vik Fearing, Erwin Brandstetter, Zhihong Yu
| Discussion: https://postgr.es/m/CAGHENJ7LBBszxS+SkWWFVnBmOT2oVsBhDMB1DFrgerCeYa_DyA@mail.gmail.com
| Discussion: https://postgr.es/m/CAApHDvohAKEtTXxq7Pc-ic2dKT8oZfbRKeEJP64M0B6+S88z+A@mail.gmail.com
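The "change only if all window functions agree" rule from the commit above can be illustrated with a small model. This is a Python sketch under loud assumptions, not the planner's code: frame options are reduced to plain strings, `choose_frame_options` is a hypothetical name, and a proposal of `None` stands for a window function that has no support function or no opinion.

```python
def choose_frame_options(current, proposals):
    """Return the frame options to use for one window clause.

    current:   the frame options written in (or defaulted by) the query
    proposals: one entry per window function sharing the clause; each is the
               frame option that function's support function would prefer,
               or None if the function cannot be optimized (assumption)
    """
    if not proposals or any(p is None for p in proposals):
        return current                      # someone depends on the frame: leave it alone
    first = proposals[0]
    if all(p == first for p in proposals):
        return first                        # unanimous: safe to switch
    return current                          # disagreement: keep the original options
```

For instance, two functions that both prefer ROWS allow a RANGE clause to be switched, while mixing one of them with a function that has no proposal leaves the clause untouched.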
* Improve notation of cacheinfo table in syscache.c. (Thomas Munro, 2022-12-23)
| Use C99 designated initializer syntax for the array elements, instead of writing the enumerator name and position in a comment. Replace nkeys and key with a local variadic macro, for a shorter notation.
|
| Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
| Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
| Discussion: https://postgr.es/m/CA%2BhUKGKdpDjKL2jgC-GpoL4DGZU1YPqnOFHbDqFkfRQcPaR5DQ%40mail.gmail.com
* Use scanned_pages to decide when to failsafe check. (Peter Geoghegan, 2022-12-22)
| Perform a failsafe check every time VACUUM's first heap scan scans a further FAILSAFE_EVERY_PAGES pages, rather than using an approach based on the number of physical blocks that our current blkno is from the blkno at the time of the previous failsafe check. That way VACUUM will perform a failsafe check every time it has scanned a uniform number of pages, without it mattering when or how VACUUM skipped pages using the visibility map.
|
| Sami Imseih, with changes to FAILSAFE_EVERY_PAGES comments added by me.
|
| Author: Sami Imseih <simseih@amazon.com>
| Reviewed-By: Peter Geoghegan <pg@bowt.ie>
| Discussion: https://postgr.es/m/401CE010-4049-4B94-9961-0B610A5D254D%40amazon.com
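The difference between the two triggering rules in the commit above shows up clearly when VACUUM skips many pages. The following Python sketch is a simplified model, not the code in vacuumlazy.c: `every` stands in for FAILSAFE_EVERY_PAGES (deliberately tiny here), `scanned_blknos` is a hypothetical list of the block numbers that were actually scanned, and both function names are invented for illustration.

```python
def checks_by_block_distance(scanned_blknos, every):
    """Old rule (as described): check when blkno has moved `every` blocks
    past the blkno of the previous check."""
    checks, last_check_blkno = 0, 0
    for blkno in scanned_blknos:
        if blkno - last_check_blkno >= every:
            checks += 1
            last_check_blkno = blkno
    return checks


def checks_by_scanned_pages(scanned_blknos, every):
    """New rule (as described): check each time a further `every` pages
    have actually been scanned."""
    checks, scanned_pages = 0, 0
    for _ in scanned_blknos:
        scanned_pages += 1
        if scanned_pages % every == 0:
            checks += 1
    return checks
```

If only every tenth block of a 1000-block table is scanned, the distance rule fires on nearly every scanned page, whereas the scanned-pages rule fires at the same uniform rate as it would on a fully scanned table.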
* Refactor how VACUUM passes around its XID cutoffs. (Peter Geoghegan, 2022-12-22)
| Use a dedicated struct for the XID/MXID cutoffs used by VACUUM, such as FreezeLimit and OldestXmin. This state is initialized in vacuum.c, and then passed around by code from vacuumlazy.c to heapam.c freezing related routines. The new convention is that everybody works off of the same cutoff state, which is passed around via pointers to const.
|
| Also simplify some of the logic for dealing with frozen xmin in heap_prepare_freeze_tuple: add dedicated "xmin_already_frozen" state to clearly distinguish xmin XIDs that we're going to freeze from those that were already frozen from before. That way the routine's xmin handling code is symmetrical with the existing xmax handling code. This is preparation for an upcoming commit that will add page level freezing.
|
| Also refactor the control flow within FreezeMultiXactId(), while adding stricter sanity checks. We now test OldestXmin directly, instead of using FreezeLimit as an inexact proxy for OldestXmin. This is further preparation for the page level freezing work, which will make the function's caller cede control of page level freezing to the function where appropriate (where heap_prepare_freeze_tuple sees a tuple that happens to contain a MultiXactId in its xmax).
|
| Author: Peter Geoghegan <pg@bowt.ie>
| Reviewed-By: Jeff Davis <pgsql@j-davis.com>
| Discussion: https://postgr.es/m/CAH2-WznS9TxXmz2_=SY+SyJyDFbiOftKofM9=aDo68BbXNBUMA@mail.gmail.com
* Avoid O(N^2) cost when pulling up lots of UNION ALL subqueries. (Tom Lane, 2022-12-22)
| perform_pullup_replace_vars() knows how to scan the whole parent query tree when we are replacing Vars during a subquery flattening operation. However, for the specific case of flattening a UNION ALL leaf query, that's mostly wasted work: the only place where relevant Vars could exist is in the AppendRelInfo that we just made for this leaf. Teaching perform_pullup_replace_vars() to just deal with that and exit is worthwhile because, if we have N such subqueries to pull up, we were spending O(N^2) work uselessly mutating the AppendRelInfos for all the other subqueries.
|
| While we're at it, avoid calling substitute_phv_relids if there are no PlaceHolderVars, and remove an obsolete check of parse->hasSubLinks.
|
| Andrey Lepikhov and Tom Lane
|
| Discussion: https://postgr.es/m/703c09a2-08f3-d2ec-b33d-dbecd62428b8@postgrespro.ru
* Add some recursion and looping defenses in prepjointree.c. (Tom Lane, 2022-12-22)
| Andrey Lepikhov demonstrated a case where we spend an unreasonable amount of time in pull_up_subqueries(). Not only is that recursing with no explicit check for stack overrun, but the code seems not interruptible by control-C. Let's stick a CHECK_FOR_INTERRUPTS there, along with sprinkling some stack depth checks.
|
| An actual fix for the excessive time consumption seems a bit risky to back-patch; but this isn't, so let's do so.
|
| Discussion: https://postgr.es/m/703c09a2-08f3-d2ec-b33d-dbecd62428b8@postgrespro.ru
* Remove dead code (Peter Eisentraut, 2022-12-22)
| The second appearance of NamespaceRelationId in this if-else chain is in error and can be removed.
* Add work-around for VA_ARGS_NARGS() on MSVC. (Thomas Munro, 2022-12-22)
| The previous coding of VA_ARGS_NARGS() always returned 1 on Visual Studio, because it treats __VA_ARGS__ as a single token unless you jump through extra hoops. Newer compilers have an option to fix that. Add a comment about that so that we can remember to clean this up in the future when our minimum MSVC version advances.
|
| Author: Victor Spirin <v.spirin@postgrespro.ru>
| Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
| Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
| Reviewed-by: Michael Paquier <michael@paquier.xyz>
| Discussion: https://postgr.es/m/f450fc57-a147-19d0-e50c-33571c52cc13%40postgrespro.ru
* Fix operator typo in tablecmds.c (Michael Paquier, 2022-12-22)
| A bitwise operator was getting used on two bools in ATAddCheckConstraint() to track if constraints should be merged or not with the existing ones of a relation, though obviously this should use a boolean OR operator. This led to the same result, but let's be clean.
|
| Oversight in 074c5cf.
|
| Author: Ranier Vilela
| Reviewed-by: Justin Pryzby
| Discussion: https://postgr.es/m/CAEudQAp2R2fbbi0OHHhv_n4=Ch0t1VtjObR9YMqtGKHJ+faUFQ@mail.gmail.com
* Add palloc_aligned() to allow aligned memory allocations (David Rowley, 2022-12-22)
| This introduces palloc_aligned() and MemoryContextAllocAligned() which allow callers to obtain memory which is allocated to the given size and also aligned to the specified alignment boundary. The alignment boundaries may be any power-of-2 value. Currently, the alignment is capped at 2^26, however, we don't expect values anything like that large. The primary expected use case is to align allocations to perhaps CPU cache line size or to maybe I/O page size. Certain use cases can benefit from having aligned memory by either having better performance or more predictable performance.
|
| The alignment is achieved by requesting 'alignto' additional bytes from the underlying allocator function and then aligning the address that is returned to the requested alignment. This obviously does waste some memory, so alignments should be kept as small as what is required.
|
| It's also important to note that these alignment bytes eat into the maximum allocation size. So something like:
|
|     palloc_aligned(MaxAllocSize, 64, 0);
|
| will not work as we cannot request MaxAllocSize + 64 bytes.
|
| Additionally, because we're just requesting the requested size plus the alignment requirements from the given MemoryContext, if that context is the Slab allocator, then since slab can only provide chunks of the size that's specified when the slab context is created, then this is not going to work. Slab will generate an error to indicate that the requested size is not supported.
|
| The alignment that is requested in palloc_aligned() is stored along with the allocated memory. This allows the alignment to remain intact through repalloc() calls.
|
| Author: Andres Freund, David Rowley
| Reviewed-by: Maxim Orlov, Andres Freund, John Naylor
| Discussion: https://postgr.es/m/CAApHDvpxLPUMV1mhxs6g7GNwCP6Cs6hfnYQL5ffJQTuFAuxt8A%40mail.gmail.com
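The over-allocate-then-align arithmetic described in the commit above can be worked through with plain integers. This Python sketch only models the address calculation; it is not the allocator itself and ignores the bookkeeping the real implementation stores next to the chunk. The names `align_up` and `aligned_request_size` are hypothetical, and `MAX_ALLOC_SIZE` is an assumed stand-in for MaxAllocSize.

```python
MAX_ALLOC_SIZE = 0x3FFFFFFF   # assumed stand-in for MaxAllocSize (1 GB - 1)


def align_up(addr, alignto):
    """Round addr up to the next multiple of alignto (a power of two)."""
    if alignto <= 0 or alignto & (alignto - 1):
        raise ValueError("alignment must be a power of 2")
    return (addr + alignto - 1) & ~(alignto - 1)


def aligned_request_size(size, alignto, max_alloc_size=MAX_ALLOC_SIZE):
    """Bytes to ask the underlying allocator for: size plus alignto of slack."""
    total = size + alignto
    if total > max_alloc_size:
        # the alignment padding eats into the maximum allocation size
        raise ValueError("request exceeds maximum allocation size")
    return total
```

Because `alignto` extra bytes are requested, the aligned address returned by `align_up` always leaves at least `size` usable bytes inside the raw chunk, and a request for the maximum size with any alignment is rejected, as the commit message notes.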
* Introduce float4in_internal (Andrew Dunstan, 2022-12-21)
| This is the guts of float4in, callable as a routine to input floats, which will be useful in an upcoming patch for allowing soft errors in the seg module's input function.
|
| A similar operation was performed some years ago for float8in in commit 50861cd683e.
|
| Reviewed by Tom Lane
|
| Discussion: https://postgr.es/m/cee4e426-d014-c0b7-aa22-a659f2cd9130@dunslane.net
* Fix newly introduced bug in slab.c (David Rowley, 2022-12-22)
| d21ded75f changed the way slab.c works but introduced a bug that meant we could end up with the slab's curBlocklistIndex pointing to the wrong list. The condition which was checking for this was failing to account for two things:
|
| 1. The curBlocklistIndex could be 0 as we've currently got no non-full blocks to put chunks on. In this case, the dlist_is_empty() check cannot be performed as there can be any number of completely full blocks at that index.
|
| 2. The curBlocklistIndex may be greater than the index we just moved the block onto. Since we need to ensure we fill up fuller blocks first, we must reset curBlocklistIndex when changing any blocklist element that's less than the curBlocklistIndex too.
|
| Reported-by: Takamichi Osumi
| Discussion: https://postgr.es/m/TYCPR01MB8373329C6329768D7E093D68EDEB9@TYCPR01MB8373.jpnprd01.prod.outlook.com
* Make more consistent some translated strings related to compression (Michael Paquier, 2022-12-21)
| This commit changes some of the bbstreamer files and pg_dump to use the same style as a few other places (like common/compression.c), where the name of the compression method is not part of the string, but an argument of it.
|
| This reduces a bit the translation work with fewer string patterns.
|
| Discussion: https://postgr.es/m/Y5/5tdK+4n3clvtU@paquier.xyz
* Switch some system functions to use get_call_result_type() (Michael Paquier, 2022-12-21)
| This shaves some code by replacing the combinations of CreateTemplateTupleDesc()/TupleDescInitEntry() hardcoding a mapping of the attributes listed in pg_proc.dat by get_call_result_type() to build the TupleDesc needed for the rows generated.
|
| get_call_result_type() is more expensive than the former style, but this removes some duplication with the lists of OUT parameters (pg_proc.dat and the attributes hardcoded in these code paths). This is applied to functions that are not considered as critical (aka that could be called repeatedly for monitoring purposes).
|
| Author: Bharath Rupireddy
| Reviewed-by: Robert Haas, Álvaro Herrera, Tom Lane, Michael Paquier
| Discussion: https://postgr.es/m/CALj2ACV23HW5HP5hFjd89FNS-z5X8r2jNXdMXcpN2BgTtKd87w@mail.gmail.com
* Use existing SSL certs in LDAP tests instead of generating them (Andrew Dunstan, 2022-12-20)
| The SSL test suite has a bunch of pre-existing certificates, so it's better simply to use what we already have than generate new certificates each time the LDAP tests are run.
|
| Discussion: https://postgr.es/m/bc305c7a-f390-44f2-2e82-9bcaec6108da@dunslane.net
* Add copyright notices to meson files (Andrew Dunstan, 2022-12-20)
| Discussion: https://postgr.es/m/222b43a5-2fb3-2c1b-9cd0-375d376c8246@dunslane.net
* Allow batching of inserts during cross-partition updates. (Etsuro Fujita, 2022-12-20)
| Commit 927f453a9 disallowed batching added by commit b663a4136 to be used for the inserts performed as part of cross-partition updates of partitioned tables, mainly because the previous code in nodeModifyTable.c couldn't handle pending inserts into foreign-table partitions that are also UPDATE target partitions. But we don't have such a limitation anymore (cf. commit ffbb7e65a), so let's allow for this by removing from execPartition.c the restriction added by commit 927f453a9 that batching is only allowed if the query command type is CMD_INSERT.
|
| In postgres_fdw, since commit 86dc90056 changed it to effectively disable cross-partition updates in the case where a foreign-table partition chosen to insert rows into is also an UPDATE target partition, allow batching in the case where a foreign-table partition chosen to do so is *not* also an UPDATE target partition. This is enabled by the "batch_size" option added by commit b663a4136, which is disabled by default.
|
| This patch also adjusts the test case added by commit 927f453a9 to confirm that the inserts performed as part of a cross-partition update of a partitioned table indeed uses batching.
|
| Amit Langote, reviewed and/or tested by Georgios Kokolatos, Zhihong Yu, Bharath Rupireddy, Hou Zhijie, Vignesh C, and me.
|
| Discussion: http://postgr.es/m/CA%2BHiwqH1Lz1yJmPs%3DaD-pzd_HLLynLHvq5iYeT9mB0bBV7oJ6w%40mail.gmail.com
* Add enable_presorted_aggregate GUC (David Rowley, 2022-12-20)
| 1349d279 added query planner support to allow more efficient execution of aggregate functions which have an ORDER BY or a DISTINCT clause. Prior to that commit, the planner would only request that the lower planner produce a plan with the order required for the GROUP BY clause and it would be left up to nodeAgg.c to perform the final sort of records within each group so that the aggregate transition functions were called in the correct order. Now that the planner requests the lower planner produce a plan with the GROUP BY and the ORDER BY / DISTINCT aggregates in mind, there is the possibility that the planner chooses a plan which could be less efficient than what would have been produced before 1349d279.
|
| While developing 1349d279, I had in mind that Incremental Sort would help us in cases where an index exists only on the GROUP BY column(s). Incremental Sort would just replace the implicit tuplesorts which are being performed in nodeAgg.c. However, because the planner has the flexibility to instead choose a plan which just performs a full sort on both the GROUP BY and ORDER BY / DISTINCT aggregate columns, there is potential for the planner to make a bad choice. The costing for Incremental Sort is not perfect as it assumes an even distribution of rows to sort within each sort group.
|
| Here we add an escape hatch in the form of the enable_presorted_aggregate GUC. This will allow users to get the pre-PG16 behavior in cases where they have no other means to convince the query planner to produce a plan which only sorts on the GROUP BY column(s).
|
| Discussion: https://postgr.es/m/CAApHDvr1Sm+g9hbv4REOVuvQKeDWXcKUAhmbK5K+dfun0s9CvA@mail.gmail.com
* Improve the performance of the slab memory allocator (David Rowley, 2022-12-20)
| Slab has traditionally been fairly slow when compared with the AllocSet or Generation memory allocators. Part of this slowness came from having to write out an entire block when we allocate a new block in order to populate the free list indexes within the block's memory. Additional slowness came from having to move a block onto another dlist each time we palloc or pfree a chunk from it.
|
| Here we optimize both of those cases and do a little bit extra to improve the performance of the slab allocator.
|
| Here, instead of writing out the free list indexes when allocating a new block, we introduce the concept of "unused" chunks. When a block is first allocated all chunks are unused. These chunks only make it onto the free list when they are pfree'd. When allocating new chunks on an existing block, we have the choice of consuming a chunk from the free list or an unused chunk. When both exist, we opt to use one from the free list, as these have been used already and the memory of them is more likely to be cached by the CPU.
|
| Here we also reduce the number of block lists from there being one for every possible value of free chunks on a block to just having a small fixed number of block lists. We keep the 0th block list for completely full blocks and anything else stores blocks for some range of free chunks with fuller blocks appearing on lower block list array elements. This reduces how often we must move a block to another list when we allocate or free chunks, but still allows us to prefer to put new chunks on fuller blocks and perhaps allow blocks with fewer chunks to be free'd later once all their remaining chunks have been pfree'd.
|
| Additionally, we now store a list of "emptyblocks", which are blocks that no longer contain any allocated chunks. We now keep up to 10 of these around to avoid having to thrash malloc/free when allocation patterns continually cause blocks to become free of any allocated chunks only to allocate more chunks again. Now only once we have 10 of these, we free the block. This does raise the high water mark for the total memory that a slab context can consume. It does not seem entirely unreasonable that we might one day want to make this a property of SlabContext rather than a compile-time constant. Let's wait and see if there is any evidence to support that this is required before doing it.
|
| Author: Andres Freund, David Rowley
| Tested-by: Tomas Vondra, John Naylor
| Discussion: https://postgr.es/m/20210717194333.mr5io3zup3kxahfm@alap3.anarazel.de
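The "unused" versus free-list chunk choice described in the commit above can be modelled in a few lines. This is a toy Python model of a single block, assuming chunks are identified by their index within the block; `SlabBlockModel` and its attributes are hypothetical names and do not mirror the structures in slab.c.

```python
class SlabBlockModel:
    """Toy model of one slab block with lazily initialized chunks."""

    def __init__(self, nchunks):
        self.nchunks = nchunks
        self.next_unused = 0   # chunks at or beyond this index were never handed out
        self.free_list = []    # chunks that were handed out and later freed

    def alloc(self):
        if self.free_list:
            # prefer a previously used chunk: its memory is more likely cached
            return self.free_list.pop()
        if self.next_unused < self.nchunks:
            idx = self.next_unused
            self.next_unused += 1
            return idx
        raise MemoryError("block is full")

    def free(self, idx):
        # chunks only reach the free list once they have been freed
        self.free_list.append(idx)
```

Creating the block costs nothing per chunk, since no free list has to be written out up front; a chunk that was freed is reused before any never-used chunk is touched.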
* Move variable increment to the end of the loop (John Naylor, 2022-12-20)
| This is less error prone and matches the placement of other code in the file.
|
| Justin Pryzby
|
| Reviewed by Tom Lane
|
| Discussion: https://www.postgresql.org/message-id/20221123172436.GJ11463@telsasoft.com
* Add pg_dissect_walfile_name() (Michael Paquier, 2022-12-20)
| This function takes as input a WAL segment name and returns a tuple made of the segment sequence number (dependent on the WAL segment size of the cluster) and its timeline, as a thin SQL wrapper around the existing XLogFromFileName().
|
| This function has multiple usages, like being able to compile a LSN from a file name and an offset, or finding the timeline of a segment without having to do some maths based on the first eight characters of the segment.
|
| Bump catalog version.
|
| Author: Bharath Rupireddy
| Reviewed-by: Nathan Bossart, Kyotaro Horiguchi, Maxim Orlov, Michael Paquier
| Discussion: https://postgr.es/m/CALj2ACWV=FCddsxcGbVOA=cvPyMr75YCFbSQT6g4KDj=gcJK4g@mail.gmail.com
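The arithmetic behind splitting a WAL segment name, as described in the commit above, can be sketched as follows. This is a Python model and not the SQL function or XLogFromFileName() itself; it assumes the usual 24-character name made of three 8-digit uppercase hexadecimal fields (timeline, high part, low part) and a default segment size of 16 MB. The names `split_walfile_name` and `lsn_from_segment` are hypothetical.

```python
HEX_DIGITS = "0123456789ABCDEF"


def split_walfile_name(name, wal_segment_size=16 * 1024 * 1024):
    """Return (segment_number, timeline_id) for a WAL segment file name."""
    if len(name) != 24 or any(c not in HEX_DIGITS for c in name):
        raise ValueError(f"invalid WAL file name: {name!r}")
    tli = int(name[0:8], 16)     # first eight characters: timeline
    log = int(name[8:16], 16)    # high part of the segment position
    seg = int(name[16:24], 16)   # segment within that high part
    segments_per_xlogid = 0x100000000 // wal_segment_size
    return log * segments_per_xlogid + seg, tli


def lsn_from_segment(segment_number, offset, wal_segment_size=16 * 1024 * 1024):
    """Compile an LSN (as an integer) from a segment number and a byte offset."""
    return segment_number * wal_segment_size + offset
```

With 16 MB segments there are 256 segments per high part, so `000000010000000100000000` is segment 256 on timeline 1, and offset 0 in that segment corresponds to LSN 1/0.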
* Remove hardcoded dependency to cryptohash type in the internals of SCRAM (Michael Paquier, 2022-12-20)
| SCRAM_KEY_LEN was a variable used in the internal routines of SCRAM to size a set of fixed-sized arrays used in the SHA and HMAC computations during the SASL exchange or when building a SCRAM password. This had a hard dependency on SHA-256, reducing the flexibility of SCRAM when it comes to the addition of more hash methods. A second issue was that SHA-256 is assumed as the cryptohash method to use all the time.
|
| This commit renames SCRAM_KEY_LEN to a more generic SCRAM_KEY_MAX_LEN, which is used as the size of the buffers used by the internal routines of SCRAM. This is aimed at tracking centrally the maximum size necessary for all the hash methods supported by SCRAM. A global variable has the advantage of keeping the code in its simplest form, reducing the need of more alloc/free logic for all the buffers used in the hash calculations.
|
| A second change is that the key length (SHA digest length) and hash types are now tracked by the state data in the backend and the frontend, the common portions being extended to handle these as arguments by the internal routines of SCRAM. There are a few RFC proposals floating around to extend the SCRAM protocol, including some to use stronger cryptohash algorithms, so this lifts some of the existing restrictions in the code.
|
| The code in charge of parsing and building SCRAM secrets is extended to rely on the key length and on the cryptohash type used for the exchange, assuming currently that only SHA-256 is supported for the moment. Note that the mock authentication simply enforces SHA-256.
|
| Author: Michael Paquier
| Reviewed-by: Peter Eisentraut, Jonathan Katz
| Discussion: https://postgr.es/m/Y5k3Qiweo/1g9CG6@paquier.xyz
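The idea of carrying the hash type and key length as parameters, rather than hard-coding SHA-256, can be illustrated with the SCRAM key derivation from RFC 5802. This is a Python sketch using only the standard library, not PostgreSQL's implementation: the function name `scram_keys` is hypothetical, password normalization (SASLprep) is deliberately skipped, and SHA-512 appears only to show that nothing below depends on a fixed key length.

```python
import hashlib
import hmac


def scram_keys(password: bytes, salt: bytes, iterations: int, hash_name: str = "sha256"):
    """Derive (key_len, stored_key, server_key) for the given hash type."""
    key_len = hashlib.new(hash_name).digest_size       # key length follows the hash type
    # SaltedPassword := Hi(password, salt, i), which is PBKDF2 with HMAC
    salted = hashlib.pbkdf2_hmac(hash_name, password, salt, iterations)
    client_key = hmac.new(salted, b"Client Key", hash_name).digest()
    stored_key = hashlib.new(hash_name, client_key).digest()
    server_key = hmac.new(salted, b"Server Key", hash_name).digest()
    return key_len, stored_key, server_key
```

A fixed-size buffer scheme like the one the commit describes would then only need to be as large as the biggest `key_len` among the supported hash types.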
* Fix comment that was missing a word. (Robert Haas, 2022-12-19)
| | | | | | Ted Yu Discussion: http://postgr.es/m/CALte62wkFB05=RTWf7BL_6MfWs2=DY=ai-K7LWn_+0TJUuPJ2w@mail.gmail.com
* Fix typo in comment (Peter Eisentraut, 2022-12-19)
| | | | Author: Ted Yu <yuzhihong@gmail.com>
* Expose some information about backend subxact status. (Robert Haas, 2022-12-19)
| | | | | | | | | | | | | | | A new function pg_stat_get_backend_subxact() can be used to get information about the number of subtransactions in the cache of a particular backend and whether that cache has overflowed. This can be useful for tracking down performance problems that can result from overflowed snapshots. Dilip Kumar, reviewed by Zhihong Yu, Nikolay Samokhvalov, Justin Pryzby, Nathan Bossart, Ashutosh Sharma, Julien Rouhaud. Additional design comments from Andres Freund, Tom Lane, Bruce Momjian, and David G. Johnston. Discussion: http://postgr.es/m/CAFiTN-ut0uwkRJDQJeDPXpVyTWD46m3gt3JDToE02hTfONEN=Q@mail.gmail.com
* Fix bit-rotted planner test case. (Tom Lane, 2022-12-17)
| | | | | | | | | | | | | | | | | | | | | | | | | | | While fooling with my pet outer-join-variables patch, I discovered that the test case I added in commit 11086f2f2 no longer demonstrates what it's supposed to. The idea is to tempt the planner to reverse the order of the two outer joins, which would leave noplace to correctly evaluate the WHERE clause that's inserted between them. Before the addition of the delay_upper_joins mechanism, it would have taken the bait. However, subsequent improvements broke the test in two different ways. First, we now recognize the IS NULL coding pattern as an antijoin, and we won't re-order antijoins; even if we did, the IS NULL test clauses get removed so there would be no opportunity for them to misbehave. Second, the planner now discovers that nested parameterized indexscans are a lot cheaper than the double hash join it used back in the day, and that approach doesn't want to re-order the joins anyway. Thus, in HEAD the test passes even if one dikes out delay_upper_joins. To fix, change the IS NULL tests to COALESCE clauses, which produce the same results but the planner isn't smart enough to convert them to antijoins. It'll still go for parameterized indexscans though, so drop the index enabling that (don't know why I added that in the first place), and disable nestloop joining just to be sure. This time around, add an EXPLAIN to make the choice of plan visible.
* Doc: update pg_list.h header comments to include XidLists. (Tom Lane, 2022-12-17)
| | | | | | I realize that the XidList infrastructure is rather incomplete, but failing to mention it in adjacent comments takes that a bit too far.
* Fix inability to reference CYCLE column from inside its CTE. (Tom Lane, 2022-12-16)
| | | | | | | | | | | | | | | Such references failed with "cache lookup failed for type 0" because we didn't resolve the type of the CYCLE column until after analyzing the CTE's query. We can just move that processing to before the recursive parse_sub_analyze call, though. While here, invent a couple of local variables to make this code less egregiously wider-than-80-columns. Per bug #17723 from Vik Fearing. Back-patch to v14 where the CYCLE feature was added. Discussion: https://postgr.es/m/17723-2c4985ff111e7bba@postgresql.org
* pg_upgrade: Make testing different transfer modes easier (Peter Eisentraut, 2022-12-16)
| | | | | | | | | | | The environment variable PG_TEST_PG_UPGRADE_MODE can be set to override the default transfer mode for the pg_upgrade tests. (Automatically running the pg_upgrade tests for all supported modes would be too slow.) Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/50a97009-8ff9-ca4d-a0f6-6086a6775a5b%40enterprisedb.com
* pg_upgrade: Add --copy option (Peter Eisentraut, 2022-12-16)
| | | | | | | | | | | | This option selects the default transfer mode. Having an explicit option is handy to make scripts and tests more explicit. It also makes it easier to talk about a "copy" mode rather than "the default mode" or something like that, since until now the default mode didn't have an externally visible name. Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/50a97009-8ff9-ca4d-a0f6-6086a6775a5b%40enterprisedb.com
* Clean up dubious error handling in wellformed_xml(). (Tom Lane, 2022-12-16)
| | | | | | | | | | | | | | | | | | | | | | | | | | This ancient bit of code was summarily trapping any ereport longjmp whatsoever and assuming that it must represent an invalid-XML report. It's not really appropriate to handle OOM-like situations that way: maybe the input is valid or maybe not, but we couldn't find out. And it'd be a seriously bad idea to ignore, say, a query cancel error that way. (Perhaps that can't happen because there is no CHECK_FOR_INTERRUPTS anywhere within xml_parse, but even if that's true today it's obviously a very fragile assumption.) But in the wake of the previous commit, we can drop the PG_TRY here altogether, and use the soft error mechanism to catch only the kinds of errors that are legitimate to treat as invalid-XML. (This is our first use of the soft error mechanism for something not directly related to a datatype input function. It won't be the last.) xml_is_document can be converted in the same way. That one is not actively broken, because it was checking specifically for ERRCODE_INVALID_XML_DOCUMENT rather than trapping everything; but the code is still shorter and probably faster this way. Discussion: https://postgr.es/m/3564577.1671142683@sss.pgh.pa.us
* Convert xml_in to report errors softly. (Tom Lane, 2022-12-16)
| | | | | | | | | | | | | | | | The key idea here is that xml_parse must distinguish hard errors from soft errors. We want to throw a hard error for libxml initialization failures: those might be out-of-memory, or something else, but in any case they are not the fault of the input string. If we get to the point of parsing the input, and something goes wrong, we can fairly consider that to mean bad input. One thing that arguably does mean bad input, but I didn't trouble to handle softly, is encoding conversion failure while converting the server encoding to UTF8. This might be something to improve later, but it seems like a pretty low-probability scenario. Discussion: https://postgr.es/m/3564577.1671142683@sss.pgh.pa.us
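The split between hard and soft errors described above can be illustrated with a small Python analogy. It is only a sketch of the pattern, not the server's C implementation: the `ErrorSaveContext` class here is a hypothetical stand-in for the idea of a caller-supplied error sink, and Python's standard `xml.etree.ElementTree` parser replaces libxml. Parser setup failures propagate as ordinary exceptions (hard errors), while failures caused by the input string are recorded in the context when the caller provides one (soft errors).

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass
class ErrorSaveContext:
    """Hypothetical sink where a soft error is recorded instead of raised."""
    error_occurred: bool = False
    message: Optional[str] = None


class InvalidXml(Exception):
    """Raised for bad input when the caller did not ask for soft errors."""


def xml_parse(text, escontext=None):
    # Setup failures (for example MemoryError) are not the fault of the
    # input string, so they are never caught here: they stay hard errors.
    parser = ET.XMLParser()
    try:
        parser.feed(text)
        return parser.close()
    except ET.ParseError as exc:
        # Only failures while parsing the input are treated as bad input.
        if escontext is None:
            raise InvalidXml(str(exc)) from exc
        escontext.error_occurred = True
        escontext.message = str(exc)
        return None


def wellformed_xml(text):
    """Report validity without trapping unrelated errors."""
    context = ErrorSaveContext()
    xml_parse(text, context)
    return not context.error_occurred


if __name__ == "__main__":
    print(wellformed_xml("<a><b/></a>"), wellformed_xml("<a><b></a>"))
```

The point of the design is that the validity check never uses a catch-all handler, so an out-of-memory condition or a cancellation cannot be misreported as invalid XML.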
* Fix typo in reference to __FreeBSD__. (Thomas Munro, 2022-12-16)
| | | | | | | | | | Commit a2a8acd152 introduced a platform-dependent mechanism to prevent developers from referencing errno in the argument list of elog()/ereport(), but didn't use the right macro to detect FreeBSD, so it didn't actually work there. Reported-by: Japin Li <japinli@hotmail.com> Discussion: https://postgr.es/m/MEYP282MB16693AAEEF84F47D8F7CA007B6E69%40MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM