Commit message (Author, Age)
...
* Tidy up after incremental JSON parser patch (Andrew Dunstan, 2024-04-04)
  Remove junk left over from non-vpath builds. Try to remedy gettext error on some platforms.
* Fix warnings re typedef redefinition in ea7b4e9a2a and 3311ea86ed (Andrew Dunstan, 2024-04-04)
  Per gripe from Tom Lane and the buildfarm
* Add missing initialization in transformJsonFuncExpr() (Amit Langote, 2024-04-04)
  de3600452b added some code for the new JSON_TABLE_OP to that function but failed to initialize the default_format variable.
  Reported-by: Erik Rijkers <er@xs4all.nl>
  Discussion: https://postgr.es/m/254b2fa2-2f6b-a30a-20ee-21f8a2c12a50@xs4all.nl
* Fix typo introduced in 6185c9737 (Amit Langote, 2024-04-04)
  Reported-by: Jian He <jian.universality@gmail.com>
  Discussion: https://postgr.es/m/CACJufxGHiU0p0usjh5hnR0_ByZn4tq1FC3eKAtrQgJeKU6W9kw@mail.gmail.com
* Add basic JSON_TABLE() functionality (Amit Langote, 2024-04-04)
  JSON_TABLE() allows JSON data to be converted into a relational view and thus used, for example, in a FROM clause, like other tabular data. Data to show in the view is selected from a source JSON object using a JSON path expression to get a sequence of JSON objects that's called a "row pattern", which becomes the source to compute the SQL/JSON values that populate the view's output columns. Column values themselves are computed using JSON path expressions applied to each of the JSON objects comprising the "row pattern", for which the SQL/JSON query functions added in 6185c9737cf4 are used.
  To implement JSON_TABLE() as a table function, this augments the TableFunc and TableFuncScanState nodes that are currently used to support XMLTABLE() with some JSON_TABLE()-specific fields.
  Note that the JSON_TABLE() spec includes NESTED COLUMNS and PLAN clauses, which are required to provide more flexibility to extract data out of nested JSON objects, but they are not implemented here to keep this commit of manageable size.
  Author: Nikita Glukhov <n.gluhov@postgrespro.ru>
  Author: Teodor Sigaev <teodor@sigaev.ru>
  Author: Oleg Bartunov <obartunov@gmail.com>
  Author: Alexander Korotkov <aekorotkov@gmail.com>
  Author: Andrew Dunstan <andrew@dunslane.net>
  Author: Amit Langote <amitlangote09@gmail.com>
  Author: Jian He <jian.universality@gmail.com>
  Reviewers have included (in no particular order): Andres Freund, Alexander Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zihong Yu, Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby, Álvaro Herrera, Jian He
  Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru
  Discussion: https://postgr.es/m/20220616233130.rparivafipt6doj3@alap3.anarazel.de
  Discussion: https://postgr.es/m/abd9b83b-aa66-f230-3d6d-734817f0995d%40postgresql.org
  Discussion: https://postgr.es/m/CA+HiwqE4XTdfb1nW=Ojoy_tQSRhYt-q_kb6i5d4xcKyrLC1Nbg@mail.gmail.com
* pg_upgrade: Fix typo in message (Peter Eisentraut, 2024-04-04)
* Use incremental parsing of backup manifests. (Andrew Dunstan, 2024-04-04)
  This changes the three callers to json_parse_manifest() to use json_parse_manifest_incremental_chunk() if appropriate. In the case of the backend caller, since we don't know the size of the manifest in advance we always call the incremental parser.
  Author: Andrew Dunstan
  Reviewed-By: Jacob Champion
  Discussion: https://postgr.es/m/7b0a51d6-0d9d-7366-3a1a-f74397a02f55@dunslane.net
* Add support for incrementally parsing backup manifests (Andrew Dunstan, 2024-04-04)
  This adds the infrastructure for using the new non-recursive JSON parser in processing manifests. It's important that callers make sure that the last piece of JSON handed to the incremental manifest parser contains the entire last few lines of the manifest, including the checksum.
  Author: Andrew Dunstan
  Reviewed-By: Jacob Champion
  Discussion: https://postgr.es/m/7b0a51d6-0d9d-7366-3a1a-f74397a02f55@dunslane.net
* Introduce a non-recursive JSON parser (Andrew Dunstan, 2024-04-04)
  This parser uses an explicit prediction stack, unlike the present recursive descent parser where the parser state is represented on the call stack. This difference makes the new parser suitable for use in incremental parsing of huge JSON documents that cannot be conveniently handled piece-wise by the recursive descent parser. One potential use for this will be in parsing large backup manifests associated with incremental backups.
  Because this parser is somewhat slower than the recursive descent parser, it is not replacing that parser, but is an additional parser available to callers.
  For testing purposes, if the build is done with -DFORCE_JSON_PSTACK, all JSON parsing is done with the non-recursive parser, in which case only trivial regression differences in error messages should be observed.
  Author: Andrew Dunstan
  Reviewed-By: Jacob Champion
  Discussion: https://postgr.es/m/7b0a51d6-0d9d-7366-3a1a-f74397a02f55@dunslane.net
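The idea of keeping parser state in an explicit stack rather than on the call stack can be sketched as follows. This is not the PostgreSQL parser: `NestChecker`, `nest_init`, `nest_feed`, and `nest_done` are hypothetical names, and the code only checks bracket nesting of JSON-like text. Because all state lives in the struct, input can be handed over in arbitrary chunks, which is the property the commit message relies on.

```c
#include <stdbool.h>
#include <stddef.h>

#define NEST_MAX_DEPTH 64

/* Hypothetical incremental checker: all state lives here, not on the call stack. */
typedef struct NestChecker
{
    char    stack[NEST_MAX_DEPTH];  /* closing characters we still expect */
    int     depth;                  /* number of stack entries in use */
    bool    in_string;              /* currently inside a "..." literal */
    bool    escaped;                /* previous character was a backslash */
    bool    failed;                 /* sticky error flag */
} NestChecker;

static void
nest_init(NestChecker *c)
{
    c->depth = 0;
    c->in_string = false;
    c->escaped = false;
    c->failed = false;
}

/* Consume one chunk; may be called repeatedly with consecutive pieces. */
static bool
nest_feed(NestChecker *c, const char *chunk, size_t len)
{
    for (size_t i = 0; i < len && !c->failed; i++)
    {
        char    ch = chunk[i];

        if (c->in_string)
        {
            /* Brackets inside string literals are ignored. */
            if (c->escaped)
                c->escaped = false;
            else if (ch == '\\')
                c->escaped = true;
            else if (ch == '"')
                c->in_string = false;
        }
        else if (ch == '"')
            c->in_string = true;
        else if (ch == '{' || ch == '[')
        {
            /* Push the closer we predict for this opener. */
            if (c->depth == NEST_MAX_DEPTH)
                c->failed = true;
            else
                c->stack[c->depth++] = (ch == '{') ? '}' : ']';
        }
        else if (ch == '}' || ch == ']')
        {
            /* Pop and compare against the prediction. */
            if (c->depth == 0 || c->stack[--c->depth] != ch)
                c->failed = true;
        }
    }
    return !c->failed;
}

/* True once every opener has been closed and no string is left open. */
static bool
nest_done(const NestChecker *c)
{
    return !c->failed && c->depth == 0 && !c->in_string;
}
```

A recursive descent version would have to return from every nested call before it could pause for more input; here a chunk boundary may fall anywhere, even inside a string literal.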
* Secondary refactor of heap scanning functions (David Rowley, 2024-04-04)
  Similar to 44086b097, refactor heap scanning functions to be more suitable for the read stream API.
  Author: Melanie Plageman
  Discussion: https://postgr.es/m/CAAKRu_YtXJiYKQvb5JsA2SkwrsizYLugs4sSOZh3EAjKUg=gEQ@mail.gmail.com
* Coordinate emit_log_hook and all log destinations to share the same timeval (Michael Paquier, 2024-04-04)
  Previously, the timestamp values used by emit_log_hook and all the other log destinations could differ, because the timestamps were reset before sending the logs to the server and after calling the hook. This change matters for emit_log_hook when generating log information with 'n' or 'm' in log_line_prefix through log_status_format(), or when doing direct calls to get_formatted_log_time() like in the JSON or CSV log formats.
  While on it, this commit fixes a couple of comments related to the formatted timestamps where the JSON format was not mentioned. Oversight in dc686681e079, that I have noticed while reviewing this patch.
  Author: Kambam Vinay, Michael Paquier
  Discussion: https://postgr.es/m/CANiRfmsK36A0i8mnQtzaxhSm3CUCimPwJPp4WQNq53OdSNkgWg@mail.gmail.com
* Preliminary refactor of heap scanning functions (David Rowley, 2024-04-04)
  To allow the use of the read stream API added in b5a9b18cd for sequential scans on heap tables, here we make some adjustments to make that change less invasive and perhaps make the code easier to follow in the process.
  Here heapgetpage() gets broken into two functions:
  1) The part which reads the block has now been moved into a function named heapfetchbuf().
  2) The part which performed pruning and populated the scan's rs_vistuples[] array is now moved into a new function named heap_prepare_pagescan().
  The functionality provided by heap_prepare_pagescan() was only ever required by SO_ALLOW_PAGEMODE scans, so the branching that was previously done in heapgetpage() is no longer needed as we simply just don't call heap_prepare_pagescan() from heapgettup() in the refactored code.
  Author: Melanie Plageman
  Discussion: https://postgr.es/m/CAAKRu_YtXJiYKQvb5JsA2SkwrsizYLugs4sSOZh3EAjKUg=gEQ@mail.gmail.com
* pg_regress: Save errno in emit_tap_output_v() and switch to %m (Michael Paquier, 2024-04-04)
  emit_tap_output_v() includes some fprintf() calls for some output related to the TAP protocol, that may clobber errno and break %m. This commit makes the logging of pg_regress smarter by saving errno before restoring it in vfprintf() where the input strings are used, removing the need for strerror(). All logs are switched to %m rather than strerror(), shaving some code.
  This was not a problem until now as pg_regress.c did not use %m, but the change is simple enough that we have no reason to not support this placeholder, and that will avoid future mistakes if new logs that include %m are added.
  Author: Dagfinn Ilmari Mannsåker
  Reviewed-by: Peter Eisentraut, Michael Paquier
  Discussion: https://postgr.es/m/87sf13jhuw.fsf@wibble.ilmari.org
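The save-and-restore pattern described above can be sketched in a few lines. This is an illustration only: `emit_line` is a hypothetical stand-in for emit_tap_output_v(), and the final restore of errno on exit is an extra assumption of the sketch, not something the commit message states. The sketch itself does not use %m, so it does not depend on a printf implementation that supports that placeholder.

```c
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for emit_tap_output_v(): writes a prefix, then the
 * caller's message.  The prefix write may change errno, so the value seen on
 * entry is saved and put back before the caller's format string is expanded.
 */
static void
emit_line(FILE *out, const char *fmt, ...)
{
    int         save_errno = errno; /* capture before any library call */
    va_list     ap;

    fprintf(out, "# ");             /* may clobber errno */

    errno = save_errno;             /* what a %m conversion would consult */
    va_start(ap, fmt);
    vfprintf(out, fmt, ap);
    va_end(ap);

    fputc('\n', out);
    errno = save_errno;             /* assumption: leave errno as the caller had it */
}
```

Without the first restore, a message formatted after the prefix write could report whatever error the prefix write happened to leave behind.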
* CREATE INDEX: do not update stats during binary upgrade. (Jeff Davis, 2024-04-03)
  During binary upgrade, indexes are created before the data is moved into place, so the statistics computed at that point will always be zero. This is not currently a major problem, but will be when we try to preserve statistics during upgrade.
  Author: Corey Huinker
  Discussion: https://postgr.es/m/CADkLM=daPdFB8V0tgFxK-dLowFsAEzWRWJHyxij7BG3kBjcouA@mail.gmail.com
* Invent SERIALIZE option for EXPLAIN. (Tom Lane, 2024-04-03)
  EXPLAIN (ANALYZE, SERIALIZE) allows collection of statistics about the volume of data emitted by a query, as well as the time taken to convert the data to the on-the-wire format. Previously there was no way to investigate this without actually sending the data to the client, in which case network transmission costs might swamp what you wanted to see. In particular this feature allows investigating the costs of de-TOASTing compressed or out-of-line data during formatting.
  Stepan Rutz and Matthias van de Meent, reviewed by Tomas Vondra and myself
  Discussion: https://postgr.es/m/ca0adb0e-fa4e-c37e-1cd7-91170b18cae1@gmx.de
* Fix the parameter order for TableAmRoutine.relation_copy_for_cluster() (Alexander Korotkov, 2024-04-04)
  Specify OldTable first, NewTable second as used by table_relation_copy_for_cluster() and as implemented in heapam_relation_copy_for_cluster().
  Backpatch to PostgreSQL 12, where TableAmRoutine was introduced.
  Discussion: https://postgr.es/m/ME3P282MB3166860D4911AE82F92DF7C5B63F2%40ME3P282MB3166.AUSP282.PROD.OUTLOOK.COM
  Author: Japin Li
  Reviewed-by: Pavel Borisov
  Backpatch-through: 12
* Split XLogCtl->LogwrtResult into separate struct members (Alvaro Herrera, 2024-04-03)
  After this change we have XLogCtl->logWriteResult and ->logFlushResult. There's no functional change, other than the fact that the assignment from shared memory to local is no longer done via struct assignment, but instead using a macro that copies each member separately.
  The current representation is inconvenient going forward; notably, we would like to add a new member "Copy" (to keep track of the last position copied into WAL buffers), so the symmetry between the values in shared memory vs. those in local would be lost.
  This also gives us freedom to later change the concurrency model for the values in shared memory: we can make them use atomics instead of relying on the info_lck spinlock.
  Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
  Discussion: https://postgr.es/m/202404031119.cd2kugjk2vho@alvherre.pgsql
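A member-by-member copy macro of the kind described above might look like the following sketch. The type names `Lsn`, `LocalResult`, and `SharedCtl` and the macro name `REFRESH_RESULT` are hypothetical; only the member names logWriteResult and logFlushResult come from the commit message, and locking is left out entirely.

```c
#include <stdint.h>

typedef uint64_t Lsn;           /* stand-in for a WAL position type */

typedef struct LocalResult      /* backend-local copy */
{
    Lsn     Write;
    Lsn     Flush;
} LocalResult;

typedef struct SharedCtl        /* stand-in for the shared-memory struct */
{
    Lsn     logWriteResult;
    Lsn     logFlushResult;
} SharedCtl;

/* Copy each member separately instead of assigning a whole struct. */
#define REFRESH_RESULT(target, ctl) \
    do { \
        (target).Write = (ctl)->logWriteResult; \
        (target).Flush = (ctl)->logFlushResult; \
    } while (0)
```

Since the two sides no longer need to share one struct type, a member could later be added on one side, or an individual member switched to an atomic read, by changing a single line of the macro.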
* Inline pg_popcount() for small buffers. (Nathan Bossart, 2024-04-03)
  If there aren't many bytes to process, the function call overhead of the optimized implementation isn't worth taking, so instead we inline a loop that consults pg_number_of_ones in that case. If there are many bytes to process, we accept the function call overhead because the optimized versions are likely to be faster. The threshold at which we use the optimized implementation is set to the smallest amount of data required to use special popcount instructions.
  Reviewed-by: Alvaro Herrera, Tom Lane
  Discussion: https://postgr.es/m/20240402155301.GA2750455%40nathanxps13
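The small-buffer dispatch can be sketched as below. Several things here are assumptions made for illustration: the threshold value `POPCOUNT_INLINE_LIMIT`, the function names, and the use of a 16-entry nibble table in place of the 256-entry pg_number_of_ones array (chosen only to keep the listing short). `popcount_optimized` merely stands in for the out-of-line implementation and uses plain bit clearing rather than special instructions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed threshold; the commit ties the real one to the popcount instructions. */
#define POPCOUNT_INLINE_LIMIT 8

/* Bits set in each 4-bit value; two lookups cover one byte. */
static const uint8_t nibble_ones[16] = {0, 1, 1, 2, 1, 2, 2, 3,
                                        1, 2, 2, 3, 2, 3, 3, 4};

/* Out-of-line path, standing in for the optimized implementation. */
static uint64_t
popcount_optimized(const unsigned char *buf, size_t bytes)
{
    uint64_t    total = 0;

    while (bytes >= sizeof(uint64_t))
    {
        uint64_t    word;

        memcpy(&word, buf, sizeof(word));
        for (; word != 0; word &= word - 1) /* clear lowest set bit */
            total++;
        buf += sizeof(word);
        bytes -= sizeof(word);
    }
    while (bytes-- > 0)
    {
        total += nibble_ones[*buf & 0x0F] + nibble_ones[*buf >> 4];
        buf++;
    }
    return total;
}

/* Inline entry point: table loop for short inputs, function call otherwise. */
static inline uint64_t
popcount_bytes(const unsigned char *buf, size_t bytes)
{
    if (bytes < POPCOUNT_INLINE_LIMIT)
    {
        uint64_t    total = 0;

        while (bytes-- > 0)
        {
            total += nibble_ones[*buf & 0x0F] + nibble_ones[*buf >> 4];
            buf++;
        }
        return total;
    }
    return popcount_optimized(buf, bytes);
}
```

The point of the split is that a caller counting a handful of bytes never pays for a call through to the larger routine.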
* Combine freezing and pruning steps in VACUUM (Heikki Linnakangas, 2024-04-03)
  Execute both freezing and pruning of tuples in the same heap_page_prune() function, now called heap_page_prune_and_freeze(), and emit a single WAL record containing all changes. That reduces the overall amount of WAL generated.
  This moves the freezing logic from vacuumlazy.c to the heap_page_prune_and_freeze() function. The main difference in the coding is that in vacuumlazy.c, we looked at the tuples after the pruning had already happened, but in heap_page_prune_and_freeze() we operate on the tuples before pruning. The heap_prepare_freeze_tuple() function is now invoked after we have determined that a tuple is not going to be pruned away.
  VACUUM no longer needs to loop through the items on the page after pruning. heap_page_prune_and_freeze() does all the work. It now returns the list of dead offsets, including existing LP_DEAD items, to the caller. Similarly it's now responsible for tracking 'all_visible', 'all_frozen', and 'hastup' on the caller's behalf.
  Author: Melanie Plageman <melanieplageman@gmail.com>
  Discussion: https://www.postgresql.org/message-id/20240330055710.kqg6ii2cdojsxgje@liskov
* Refactor how heap_prune_chain() updates prunable_xid (Heikki Linnakangas, 2024-04-03)
  In preparation of freezing and counting tuples which are not candidates for pruning, split heap_prune_record_unchanged() into multiple functions, depending on the kind of line pointer. That's not too interesting right now, but makes the next commit smaller.
  Recording the lowest soon-to-be prunable xid is one of the actions we take for unchanged LP_NORMAL item pointers but not for others, so move that to the new heap_prune_record_unchanged_lp_normal() function. The next commit will add more actions to these functions.
  Author: Melanie Plageman <melanieplageman@gmail.com>
  Discussion: https://www.postgresql.org/message-id/20240330055710.kqg6ii2cdojsxgje@liskov
* Fix zeroing of pg_serial page without SLRU bank lock (Alvaro Herrera, 2024-04-03)
  Bug in commit 53c2a97a9266: we failed to acquire the correct SLRU bank lock when iterating to zero-out intermediate pages in predicate.c. Rewrite the code block so that we follow the locking protocol correctly.
  Also update an outdated comment in the same file -- SerialSLRULock exists no more.
  Reported-by: Alexander Lakhin <exclusion@gmail.com>
  Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
  Discussion: https://postgr.es/m/2a25eaf4-a3a4-5fd1-6241-9d7c73142085@gmail.com
* Use the pairing heap instead of a flat array for LSN replay waiters (Alexander Korotkov, 2024-04-03)
  06c418e163 introduced the pg_wal_replay_wait() procedure, which allows waiting for a particular LSN to be replayed on a standby. The waiters were stored in a flat array. Even though scanning small arrays is fast, that might be a problem at scale (a lot of waiting processes).
  This commit replaces the flat shared memory array with a pairing heap, which holds the waiter with the lowest LSN at the top. This gives us O(log N) complexity for both inserting and removing waiters.
  Reported-by: Alvaro Herrera
  Discussion: https://postgr.es/m/202404030658.hhj3vfxeyhft%40alvherre.pgsql
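For readers unfamiliar with the data structure, a minimal pairing heap keyed by LSN can be sketched as follows. This is a generic textbook-style illustration, not the PostgreSQL pairingheap code: the `Waiter` struct and the `ph_*` functions are hypothetical, only insertion and removal of the minimum are shown, and the pairwise merge is written recursively for brevity.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical waiter entry; the key is the LSN being waited for. */
typedef struct Waiter
{
    uint64_t        lsn;
    struct Waiter  *child;      /* first child */
    struct Waiter  *sibling;    /* next sibling in the parent's child list */
} Waiter;

/* Link two heaps; the root with the smaller LSN becomes the parent. */
static Waiter *
ph_merge(Waiter *a, Waiter *b)
{
    if (a == NULL)
        return b;
    if (b == NULL)
        return a;
    if (b->lsn < a->lsn)
    {
        Waiter *tmp = a;

        a = b;
        b = tmp;
    }
    b->sibling = a->child;
    a->child = b;
    return a;
}

/* Insert a waiter; returns the new root. */
static Waiter *
ph_insert(Waiter *root, Waiter *w)
{
    w->child = NULL;
    w->sibling = NULL;
    return ph_merge(root, w);
}

/* Merge a sibling list pairwise, then fold the pairs together. */
static Waiter *
ph_merge_pairs(Waiter *first)
{
    Waiter *second;
    Waiter *rest;

    if (first == NULL || first->sibling == NULL)
        return first;
    second = first->sibling;
    rest = second->sibling;
    first->sibling = NULL;
    second->sibling = NULL;
    return ph_merge(ph_merge(first, second), ph_merge_pairs(rest));
}

/* Remove the waiter with the lowest LSN (the root); returns the new root. */
static Waiter *
ph_remove_min(Waiter *root)
{
    Waiter *children = root->child;

    root->child = NULL;
    return ph_merge_pairs(children);
}
```

During replay, the startup process only has to look at the root to decide whether any waiter can be woken, instead of scanning every entry.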
* Drop global objects after completed test (Daniel Gustafsson, 2024-04-03)
  Project policy is to not leave global objects behind after a regress test run. This was found as a result of the development of a patch to make pg_regress detect such leftovers automatically, which in the end was withdrawn due to issues with parallel runs.
  Discussion: https://postgr.es/m/E1phvk7-000VAH-7k@gemulon.postgresql.org
* Ensure that the sync slots reach a consistent state after promotion without losing data. (Amit Kapila, 2024-04-03)
  We were directly copying the LSN locations while syncing the slots on the standby. Now, it is possible that at some particular restart_lsn there are some running xacts, which means if we start reading the WAL from that location after promotion, we won't reach a consistent snapshot state at that point. However, on the primary, we would have already been in a consistent snapshot state at that restart_lsn so we would have just serialized the existing snapshot.
  To avoid this problem we will use the advance_slot functionality unless the snapshot already exists at the synced restart_lsn location. This will help us to ensure that snapbuilder/slot statuses are updated properly without generating any changes. Note that the synced slot will remain as RS_TEMPORARY till the decoding from corresponding restart_lsn can reach a consistent snapshot state after which they will be marked as RS_PERSISTENT.
  Per buildfarm
  Author: Hou Zhijie
  Reviewed-by: Bertrand Drouvot, Shveta Malik, Bharath Rupireddy, Amit Kapila
  Discussion: https://postgr.es/m/OS0PR01MB5716B3942AE49F3F725ACA92943B2@OS0PR01MB5716.jpnprd01.prod.outlook.com
* Minor improvements for waitlsn.c (Alexander Korotkov, 2024-04-03)
  * Remove extra includes
  * Fill 'cur' in addLSNWaiter() before taking the spinlock
  * Initialize 'endtime' with zero in WaitForLSN() to avoid compiler warning
  Reported-by: Alvaro Herrera, Masahiko Sawada, Daniel Gustafsson
  Discussion: https://postgr.es/m/202404030658.hhj3vfxeyhft%40alvherre.pgsql
  Discussion: https://postgr.es/m/CAD21AoAx7irptnPH1OkkkNh9E0M6X-phfX7sYZfwoMsc1qV1sQ%40mail.gmail.com
* Fix indentation from cafe1056558f (Daniel Gustafsson, 2024-04-03)
  Per buildfarm animal koel
* Add error codes to some PANIC/FATAL error reports (Daniel Gustafsson, 2024-04-03)
  This adds errcodes to a set of PANIC and FATAL errors in xlog.c and relcache.c, which previously had no errcode at all set, in order to make fleetwide analysis of errorlogs easier. There are many more ereport/elogs left which could benefit from having an errcode but this at least makes a dent in the issue.
  Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
  Discussion: https://postgr.es/m/CAN55FZ1k8LgLEqncPGmz_fWnrobV6bjABOTH4tOWta6xNcPQig@mail.gmail.com
* Add built-in ERROR handling for archive callbacks. (Nathan Bossart, 2024-04-02)
  Presently, the archiver process restarts when an archive callback ERRORs. To avoid this, archive module authors can use sigsetjmp(), manage a memory context, etc., but that requires a lot of extra code that will likely look roughly the same between modules. This commit adds basic archive callback ERROR handling to pgarch.c so that module authors won't ordinarily need to worry about this. While this built-in handler attempts to clean up anything that an archive module could conceivably have left behind, it is possible that some modules are doing unexpected things that require additional cleanup. Module authors should be sure to do any extra required cleanup in a PG_CATCH block within the archiving callback.
  The archiving callback is now called in a short-lived memory context that the archiver process resets between invocations. If a module requires longer-lived storage, it must maintain its own memory context.
  Thanks to these changes, the basic_archive module can be greatly simplified.
  Suggested-by: Andres Freund
  Reviewed-by: Andres Freund, Yong Li
  Discussion: https://postgr.es/m/20230217215624.GA3131134%40nathanxps13
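The kind of sigsetjmp()-based boilerplate the commit message refers to can be sketched generically as follows. Nothing here is PostgreSQL code: `error_jump`, `raise_error`, `call_with_recovery`, and the two example callbacks are hypothetical, and a real handler would also release resources and reset memory where the comment indicates.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <stdbool.h>
#include <stdlib.h>

typedef bool (*ArchiveCallback) (const char *file);

/* Hypothetical: where a raised error should jump to, if anywhere. */
static sigjmp_buf *error_jump = NULL;

/* Hypothetical error raise: unwind to the innermost handler, or give up. */
static void
raise_error(void)
{
    if (error_jump != NULL)
        siglongjmp(*error_jump, 1);
    abort();
}

/* Run a callback so that a raised error turns into a "false" result. */
static bool
call_with_recovery(ArchiveCallback callback, const char *file)
{
    sigjmp_buf      local;
    sigjmp_buf     *saved = error_jump;
    volatile bool   ok = false;

    if (sigsetjmp(local, 1) == 0)
    {
        error_jump = &local;
        ok = callback(file);
    }
    else
    {
        /* An error was raised: a real handler would clean up here. */
        ok = false;
    }
    error_jump = saved;         /* restore the outer handler in both cases */
    return ok;
}

/* Example callbacks used to exercise the wrapper. */
static bool
archive_ok(const char *file)
{
    return file != NULL;
}

static bool
archive_fails(const char *file)
{
    (void) file;
    raise_error();
    return true;                /* not reached */
}
```

Having the caller own this wrapper is what lets each callback stay free of its own jump buffer and cleanup code in the common case.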
* Improve eviction algorithm in ReorderBuffer using max-heap for many subtransactions. (Masahiko Sawada, 2024-04-03)
  Previously, when selecting the transaction to evict during logical decoding, we checked all transactions to find the largest transaction. This could lead to a significant replication lag especially in the case where there are many subtransactions.
  This commit improves the eviction algorithm in ReorderBuffer using the max-heap with transaction size as the key to efficiently find the largest transaction.
  The max-heap starts out empty. While the max-heap is empty, we don't do anything for the max-heap when updating the memory counter. Therefore, we get the largest transaction in O(N) time, where N is the number of transactions including top-level transactions and subtransactions.
  We build the max-heap just before selecting the largest transactions if the number of transactions being decoded is higher than the threshold, MAX_HEAP_TXN_COUNT_THRESHOLD. After building the max-heap, we also update the max-heap when updating the memory counter. The intention is to efficiently find the largest transaction in O(1) time instead of incurring the cost of memory counter updates (O(log N)). Once the number of transactions drops below the threshold, we reset the max-heap.
  The performance benchmark results showed significant speed up (more than x30 speed up on my machine) in decoding a transaction with 100k subtransactions, whereas there is no visible overhead in other cases.
  Reviewed-by: Amit Kapila, Hayato Kuroda, Vignesh C, Ajin Cherian, Tomas Vondra, Shubham Khanna, Peter Smith, Álvaro Herrera, Euler Taveira
  Discussion: https://postgr.es/m/CAD21AoAfKTgrBrLq96GcTv9d6k97zaQcDM-rxfKEt4GSe0qnaQ%40mail.gmail.com
* Don't adjust ressortgroupref in generate_setop_child_grouplist() (David Rowley, 2024-04-03)
  This is already done inside assignSortGroupRef(), therefore is redundant. Oversight from 66c0185a3.
  Reported-by: Tom Lane
  Discussion: https://postgr.es/m/3703023.1711654574@sss.pgh.pa.us
* Add functions to binaryheap for efficient key removal and update. (Masahiko Sawada, 2024-04-03)
  Previously, binaryheap didn't support updating a key and removing a node in an efficient way. For example, in order to remove a node from the binaryheap, the caller had to pass the node's position within the array that the binaryheap internally has. Removing a node from the binaryheap is done in O(log n) but searching for the key's position is done in O(n).
  This commit adds a hash table to binaryheap in order to track the position of each node in the binaryheap. That way, by using newly added functions such as binaryheap_update_up() etc., both updating a key and removing a node can be done in O(1) on average and O(log n) in the worst case. This is known as the indexed binary heap. The caller can specify to use the indexed binaryheap by passing indexed = true.
  The current code does not use the new indexing logic, but it will be used by an upcoming patch.
  Reviewed-by: Vignesh C, Peter Smith, Hayato Kuroda, Ajin Cherian, Tomas Vondra, Shubham Khanna
  Discussion: https://postgr.es/m/CAD21AoDffo37RC-eUuyHJKVEr017V2YYDLyn1xF_00ofptWbkg%40mail.gmail.com
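The position-tracking idea behind an indexed binary heap can be sketched as below. This is not the binaryheap API: the `IndexedHeap` type and `ih_*` functions are hypothetical, nodes are identified by small integer ids, and a plain array indexed by id replaces the hash table the commit describes (an assumption made to keep the sketch short). The heap is a max-heap with a fixed capacity.

```c
#include <stddef.h>

#define IH_CAPACITY 16          /* fixed size keeps the sketch short */

typedef struct IndexedHeap
{
    int     nodes[IH_CAPACITY]; /* heap array of node ids */
    int     pos[IH_CAPACITY];   /* pos[id] = index in nodes[], or -1 */
    long    key[IH_CAPACITY];   /* key[id] = current priority */
    int     size;
} IndexedHeap;

static void
ih_init(IndexedHeap *h)
{
    h->size = 0;
    for (int i = 0; i < IH_CAPACITY; i++)
        h->pos[i] = -1;
}

/* Place id at slot i and record where it now lives. */
static void
ih_set(IndexedHeap *h, int i, int id)
{
    h->nodes[i] = id;
    h->pos[id] = i;
}

static void
ih_sift_up(IndexedHeap *h, int i)
{
    int     id = h->nodes[i];

    while (i > 0)
    {
        int     parent = (i - 1) / 2;

        if (h->key[h->nodes[parent]] >= h->key[id])
            break;
        ih_set(h, i, h->nodes[parent]);
        i = parent;
    }
    ih_set(h, i, id);
}

static void
ih_sift_down(IndexedHeap *h, int i)
{
    int     id = h->nodes[i];

    for (;;)
    {
        int     child = 2 * i + 1;

        if (child >= h->size)
            break;
        if (child + 1 < h->size &&
            h->key[h->nodes[child + 1]] > h->key[h->nodes[child]])
            child++;            /* pick the larger child */
        if (h->key[h->nodes[child]] <= h->key[id])
            break;
        ih_set(h, i, h->nodes[child]);
        i = child;
    }
    ih_set(h, i, id);
}

/* Add node id (must be < IH_CAPACITY and not already present). */
static void
ih_add(IndexedHeap *h, int id, long key)
{
    h->key[id] = key;
    ih_set(h, h->size, id);
    h->size++;
    ih_sift_up(h, h->size - 1);
}

/* Change a node's key; its slot is looked up, not searched for. */
static void
ih_update(IndexedHeap *h, int id, long key)
{
    h->key[id] = key;
    ih_sift_up(h, h->pos[id]);
    ih_sift_down(h, h->pos[id]);
}

/* Remove an arbitrary node by id. */
static void
ih_remove(IndexedHeap *h, int id)
{
    int     i = h->pos[id];
    int     last = h->nodes[--h->size];

    h->pos[id] = -1;
    if (i < h->size)            /* the removed node was not in the last slot */
    {
        ih_set(h, i, last);
        ih_sift_up(h, i);
        ih_sift_down(h, h->pos[last]);
    }
}
```

The only extra bookkeeping compared with a plain binary heap is that every move of a node goes through `ih_set`, which keeps `pos[]` in step with `nodes[]`.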
* Make binaryheap enlargeable. (Masahiko Sawada, 2024-04-03)
  The node array space of the binaryheap is doubled when there is no available space.
  Reviewed-by: Vignesh C, Peter Smith, Hayato Kuroda, Ajin Cherian, Tomas Vondra, Shubham Khanna
  Discussion: https://postgr.es/m/CAD21AoDffo37RC-eUuyHJKVEr017V2YYDLyn1xF_00ofptWbkg%40mail.gmail.com
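Doubling on demand can be sketched with a plain growable array. The `GrowArray` type and `ga_*` functions are hypothetical, and the sketch uses malloc()/realloc() rather than PostgreSQL's memory-context allocators.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical growable node array. */
typedef struct GrowArray
{
    int    *nodes;
    int     size;               /* slots in use */
    int     space;              /* slots allocated */
} GrowArray;

/* Allocate an initial array; space must be at least 1. */
static bool
ga_init(GrowArray *a, int space)
{
    a->nodes = malloc(sizeof(int) * (size_t) space);
    a->size = 0;
    a->space = space;
    return a->nodes != NULL;
}

/* Append a value, doubling the allocation when it is full. */
static bool
ga_append(GrowArray *a, int value)
{
    if (a->size >= a->space)
    {
        int     new_space = a->space * 2;
        int    *grown = realloc(a->nodes, sizeof(int) * (size_t) new_space);

        if (grown == NULL)
            return false;       /* the old array is still valid */
        a->nodes = grown;
        a->space = new_space;
    }
    a->nodes[a->size++] = value;
    return true;
}
```

Doubling keeps the number of reallocations logarithmic in the final size, so appends stay cheap on average.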
* Move WaitLSNShmemInit() to CreateOrAttachShmemStructs() (Alexander Korotkov, 2024-04-03)
  Thanks to Andres Freund, Thomas Munro and David Rowley for investigating this issue.
  Discussion: https://postgr.es/m/CAPpHfdvap5mMLikt8CUjA0osAvCJHT0qnYeR3f84EJ_Kvse0mg%40mail.gmail.com
* Don't zero tuple_fraction when planning UNIONs with ORDER BYs (David Rowley, 2024-04-03)
  Since 66c0185a3, the planner is able to use Merge Append -> Unique to implement UNION queries and each subquery is prompted to produce Paths correctly sorted by the UNION's targetlist.
  Here we remove some now redundant code which was zeroing the tuple_fraction at the parent level. This will allow the planner to consider cheap startup paths when planning the UNION's subqueries.
  EXCEPT and INTERSECT set operations still have the tuple_fraction zeroed in generate_nonunion_paths(). These operations currently always read all of their subqueries' tuples.
  Reported-by: Tom Lane
  Discussion: https://postgr.es/m/3703023.1711654574@sss.pgh.pa.us
* Implement pg_wal_replay_wait() stored procedure (Alexander Korotkov, 2024-04-02)
  pg_wal_replay_wait() is to be used on standby and specifies waiting for the specific WAL location to be replayed before starting the transaction. This option is useful when the user makes some data changes on primary and needs a guarantee to see these changes on standby.
  The queue of waiters is stored in the shared memory array sorted by LSN. During replay of WAL waiters whose LSNs are already replayed are deleted from the shared memory array and woken up by setting of their latches.
  pg_wal_replay_wait() needs to wait without any snapshot held. Otherwise, the snapshot could prevent the replay of WAL records implying a kind of self-deadlock. This is why it is only possible to implement pg_wal_replay_wait() as a procedure working in a non-atomic context, not a function.
  Catversion is bumped.
  Discussion: https://postgr.es/m/eb12f9b03851bb2583adab5df9579b4b%40postgrespro.ru
  Author: Kartyshov Ivan, Alexander Korotkov
  Reviewed-by: Michael Paquier, Peter Eisentraut, Dilip Kumar, Amit Kapila
  Reviewed-by: Alexander Lakhin, Bharath Rupireddy, Euler Taveira
* Avoid deadlock during orphan temp table removal. (Tom Lane, 2024-04-02)
  If temp tables have dependencies (such as sequences) then it's possible for autovacuum's cleanup of orphan temp tables to deadlock against an incoming backend that's trying to clean out the temp namespace for its own use. That can happen because RemoveTempRelations' performDeletion call can visit objects within the namespace in an order different from the order in which a per-table deletion will visit them.
  To fix, observe that performDeletion will begin by taking an exclusive lock on the temp namespace (even though it won't actually delete it). So, if we can get a shared lock on the namespace, we can be sure we're not running concurrently with RemoveTempRelations, while also not conflicting with ordinary use of the namespace. This requires introducing a conditional version of LockDatabaseObject, but that's no big deal. (It's surprising we've got along without that this long.)
  Report and patch by Mikhail Zhilin. Back-patch to all supported branches.
  Discussion: https://postgr.es/m/c43ce028-2bc2-4865-9b89-3f706246eed5@postgrespro.ru
* Avoid function call overhead of pg_popcount() in syslogger.c. (Nathan Bossart, 2024-04-02)
  Instead of calling the pg_popcount() function for a single byte, we can look up the value in the pg_number_of_ones array.
  Discussion: https://postgr.es/m/20240401221117.GB2362108%40nathanxps13
* Refactor code for setting pg_popcount* function pointers. (Nathan Bossart, 2024-04-02)
  Presently, there are three copies of this code, and a proposed follow-up patch would add more code to each copy. This commit introduces a new inline function for this code and makes use of it in the pg_popcount*_choose functions, thereby reducing code duplication.
  Author: Paul Amonson
  Discussion: https://postgr.es/m/BL1PR11MB5304097DF7EA81D04C33F3D1DCA6A%40BL1PR11MB5304.namprd11.prod.outlook.com
* Unwind #if spaghetti in hmac_openssl.c a bit. (Tom Lane, 2024-04-02)
  Make this code a little less confusing by defining a separate macro that controls whether we'll use ResourceOwner facilities to track the existence of a pg_hmac_ctx context.
  The proximate reason to touch this is that since b8bff07da, we got "unused function" warnings if building with older OpenSSL, because the #if guards around the ResourceOwner wrapper function definitions were different from those around the calls of those functions. Pulling the ResourceOwner machinations outside of the #ifdef HAVE_xxx guards fixes that and makes the code clearer too.
  Discussion: https://postgr.es/m/1394271.1712016101@sss.pgh.pa.us
* Allow SIGINT to cancel psql database reconnections. (Robert Haas, 2024-04-02)
  After installing the SIGINT handler in psql, SIGINT can no longer cancel database reconnections. For instance, if the user starts a reconnection and then needs to do some form of interaction (i.e., psql is polling), there is no way to cancel the reconnection process currently.
  Use PQconnectStartParams() in order to insert a cancel_pressed check into the polling loop.
  Tristan Partin, reviewed by Gurjeet Singh, Heikki Linnakangas, Jelte Fennema-Nio, and me.
  Discussion: http://postgr.es/m/D08WWCPVHKHN.3QELIKZJ2D9RZ@neon.tech
* Expose PQsocketPoll via libpq (Robert Haas, 2024-04-02)
  This is useful when connecting to a database asynchronously via PQconnectStart(), since it handles deciding between poll() and select(), and some of the required boilerplate.
  Tristan Partin, reviewed by Gurjeet Singh, Heikki Linnakangas, Jelte Fennema-Nio, and me.
  Discussion: http://postgr.es/m/D08WWCPVHKHN.3QELIKZJ2D9RZ@neon.tech
* Provide API for streaming relation data. (Thomas Munro, 2024-04-03)
  Introduce an abstraction allowing relation data to be accessed as a stream of buffers, with an implementation that is more efficient than the equivalent sequence of ReadBuffer() calls.
  Client code supplies a callback that can say which block number it wants next, and then consumes individual buffers one at a time from the stream. This division puts read_stream.c in control of how far ahead it can see and allows it to read clusters of neighboring blocks with StartReadBuffers(). It also issues POSIX_FADV_WILLNEED advice ahead of time when random access is detected.
  Other variants of I/O stream will be proposed in future work (for example to support recovery, whose LsnReadQueue device in xlogprefetcher.c is a distant cousin of this code and should eventually be replaced by this), but this basic API is sufficient for many common executor usage patterns involving predictable access to a single fork of a single relation.
  Several patches using this API are proposed separately.
  This stream concept is loosely based on ideas from Andres Freund on how we should pave the way for later work on asynchronous I/O.
  Author: Thomas Munro <thomas.munro@gmail.com>
  Author: Heikki Linnakangas <hlinnaka@iki.fi> (contributions)
  Author: Melanie Plageman <melanieplageman@gmail.com> (contributions)
  Suggested-by: Andres Freund <andres@anarazel.de>
  Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
  Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
  Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
  Reviewed-by: Andres Freund <andres@anarazel.de>
  Tested-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
  Discussion: https://postgr.es/m/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com
* Provide vectored variant of ReadBuffer(). (Thomas Munro, 2024-04-03)
  Break ReadBuffer() up into two steps. StartReadBuffers() and WaitReadBuffers() give us two main advantages:
  1. Multiple consecutive blocks can be read with one system call.
  2. Advice (hints of future reads) can optionally be issued to the kernel ahead of time.
  The traditional ReadBuffer() function is now implemented in terms of those functions, to avoid duplication.
  A new GUC io_combine_limit is defined, and the functions for limiting per-backend pin counts are made into public APIs. Those are provided for use by callers of StartReadBuffers(), when deciding how many buffers to read at once. The following commit will add a higher level mechanism for doing that automatically with a practical interface.
  With some more infrastructure in later work, StartReadBuffers() could be extended to start real asynchronous I/O instead of just issuing advice and leaving WaitReadBuffers() to do the work synchronously.
  Author: Thomas Munro <thomas.munro@gmail.com>
  Author: Andres Freund <andres@anarazel.de> (some optimization tweaks)
  Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
  Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
  Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
  Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
  Reviewed-by: Andres Freund <andres@anarazel.de>
  Tested-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
  Discussion: https://postgr.es/m/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com
* Don't use the pg_am system catalog in new test (Alvaro Herrera, 2024-04-02)
  This causes deadlocks because it's a highly trafficked catalog. Use a regular table created by the same test instead.
  Discussion: https://postgr.es/m/f3e61e27-19d0-5e40-3eb2-53282fa0532a@gmail.com
* Revert "Custom reloptions for table AM"Alexander Korotkov2024-04-02
| | | | | | | | This reverts commit c95c25f9af4bc77f2f66a587735c50da08c12b37 due to multiple design issues spotted after commit. Reported-by: Jeff Davis Discussion: https://postgr.es/m/11550b536211d5748bb2865ed6cb3502ff073bf7.camel%40j-davis.com
* Use TidStore for dead tuple TIDs storage during lazy vacuum.Masahiko Sawada2024-04-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, we used a simple array for storing dead tuple IDs during lazy vacuum, which had a number of problems: * The array used a single allocation and so was limited to 1GB. * The allocation was pessimistically sized according to table size. * Lookup with binary search was slow because of poor CPU cache and branch prediction behavior. This commit replaces that array with the TID store from commit 30e144287a. Since the backing radix tree makes small allocations as needed, the 1GB limit is now gone. Further, the total memory used is now often smaller by an order of magnitude or more, depending on the distribution of blocks and offsets. These two features should make multiple rounds of heap scanning and index cleanup an extremely rare event. TID lookup during index cleanup is also several times faster, even more so when index order is correlated with heap tuple order. Since there is no longer a predictable relationship between the number of dead tuples vacuumed and the space taken up by their TIDs, the number of tuples no longer provides any meaningful insights for users, nor is the maximum number predictable. For that reason this commit also changes to byte-based progress reporting, with the relevant columns of pg_stat_progress_vacuum renamed accordingly to max_dead_tuple_bytes and dead_tuple_bytes. For parallel vacuum, both the TID store and supplemental information specific to vacuum are shared among the parallel vacuum workers. As with the previous array, we don't take any locks on TidStore during parallel vacuum since writes are still only done by the leader process. Bump catalog version. Reviewed-by: John Naylor, (in an earlier version) Dilip Kumar Discussion: https://postgr.es/m/CAD21AoAfOZvmfR0j8VmZorZjL7RhTiQdVttNuC4W-Shdc2a-AA%40mail.gmail.com
* Fix assert failure when planning setop subqueries with CTEsDavid Rowley2024-04-02
| | | | | | | | | | | | | | | | | | | | | 66c0185a3 adjusted the UNION planner to request that union child queries produce Paths correctly ordered to implement the UNION by way of MergeAppend followed by Unique. The code there wrongly assumed that if root->parent_root->parse had setOperations set, the query must be the child subquery of a set operation. That's not true when it comes to planning a non-inlined CTE which is parented by a set operation. This causes issues as the CTE's targetlist has no requirement to match up to the SetOperationStmt's groupClauses. Fix this by adding a new parameter to both subquery_planner() and grouping_planner() to explicitly pass the SetOperationStmt only when planning set operation child subqueries. Thank you to Tom Lane for helping to rationalize the decision on the best function signature for subquery_planner(). Reported-by: Alexander Lakhin Discussion: https://postgr.es/m/242fc7c6-a8aa-2daf-ac4c-0a231e2619c1@gmail.com
* Avoid "unused variable" warning on non-USE_SSL_ENGINE platforms.Tom Lane2024-04-01
| | | | | | | | | | | If we are building with openssl but USE_SSL_ENGINE didn't get set, initialize_SSL's variable "pkey" is declared but used nowhere. Apparently this combination hasn't been exercised in the buildfarm before now, because I've not seen this warning before, even though the code has been like this for a long time. Move the declaration to silence the warning (and remove its useless initialization). Per buildfarm member sawshark. Back-patch to all supported branches.
* Introduce 'options' argument to heap_page_prune()Heikki Linnakangas2024-04-02
| | | | | | | | | | | | Currently there is only one option, HEAP_PAGE_PRUNE_MARK_UNUSED_NOW, which replaces the old boolean argument, but upcoming patches will introduce at least one more. Having a lot of boolean arguments makes it hard to see at the call sites what the arguments mean, so prefer a bitmask of options with human-readable names. Author: Melanie Plageman <melanieplageman@gmail.com> Author: Heikki Linnakangas <heikki.linnakangas@iki.fi> Discussion: https://www.postgresql.org/message-id/20240401172219.fngjosaqdgqqvg4e@liskov
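The readability argument for a bitmask over boolean arguments can be shown with a small stand-alone sketch. The names below are hypothetical stand-ins: `TOY_PRUNE_MARK_UNUSED_NOW` only mirrors the role of HEAP_PAGE_PRUNE_MARK_UNUSED_NOW, the second flag is invented to show how the set can grow, and `toy_page_prune()` does nothing but report which options it was given.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical option bits; the real flag values are not shown in the log. */
#define TOY_PRUNE_MARK_UNUSED_NOW   (1 << 0)
#define TOY_PRUNE_OTHER_OPTION      (1 << 1)    /* invented second option */

typedef struct ToyPruneResult
{
    bool        mark_unused_now;
    bool        other_option;
} ToyPruneResult;

/* Stand-in for a pruning routine: decodes the bitmask and reports it. */
static ToyPruneResult
toy_page_prune(int options)
{
    ToyPruneResult result;

    result.mark_unused_now = (options & TOY_PRUNE_MARK_UNUSED_NOW) != 0;
    result.other_option = (options & TOY_PRUNE_OTHER_OPTION) != 0;
    return result;
}
```

A call such as `toy_page_prune(TOY_PRUNE_MARK_UNUSED_NOW)` says at the call site what is being requested, whereas `toy_page_prune(true)` would not. Adding another option later means defining one more bit without changing the function signature.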
* Invent --transaction-size option for pg_restore.Tom Lane2024-04-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch allows pg_restore to wrap its commands into transaction blocks, somewhat like --single-transaction, except that we commit and start a new block after every N objects. Using this mode with a size limit of 1000 or so objects greatly reduces the number of transactions consumed by the restore, while preventing any one transaction from taking enough locks to overrun the receiving server's shared lock table. (A value of 1000 works well with the default lock table size of around 6400 locks. Higher --transaction-size values can be used if one has increased the receiving server's lock table size.) Excessive consumption of XIDs has been reported as a problem for pg_upgrade in particular, but it could be bad for any restore; and the change also reduces the number of fsyncs and amount of WAL generated, so it should provide speed benefits too. This patch does not try to make parallel workers batch the SQL commands they issue. The trouble with doing that is that other workers may need to see the objects a worker creates right away. Possibly this can be improved later. In this patch I have hard-wired pg_upgrade to use a transaction size of 1000 divided by the number of parallel restore jobs allowed (without that, we'd still be at risk of overrunning the shared lock table). Perhaps there would be value in adding another pg_upgrade option to allow user control of that, but I'm unsure that it's worth the trouble; I think few users would use it, and any who did would see not that much benefit compared to the default. Patch by me, but the original idea to batch SQL commands during restore is due to Robins Tharakan. Discussion: https://postgr.es/m/a9f9376f1c3343a6bb319dce294e20ac@EX13D05UWC001.ant.amazon.com
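The batching rule described above (commit and start a new block after every N objects) can be sketched as a small loop. This is not pg_restore's code: the `toy_*` functions are hypothetical, commands are treated as opaque strings, and the integer rounding and lower bound in `toy_upgrade_txn_size()` are assumptions, since the message says only that pg_upgrade uses 1000 divided by the number of parallel restore jobs.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write the commands to "out", wrapping every txn_size of them in a
 * BEGIN/COMMIT block. Returns the number of transactions used, or -1
 * if txn_size is not positive.
 */
static int
toy_restore(FILE *out, const char *const *cmds, int ncmds, int txn_size)
{
    int         in_txn = 0;     /* commands issued in the open transaction */
    int         ntxns = 0;

    if (txn_size < 1)
        return -1;
    for (int i = 0; i < ncmds; i++)
    {
        if (in_txn == 0)
        {
            fputs("BEGIN;\n", out);
            ntxns++;
        }
        fprintf(out, "%s\n", cmds[i]);
        if (++in_txn == txn_size)
        {
            fputs("COMMIT;\n", out);
            in_txn = 0;
        }
    }
    if (in_txn > 0)
        fputs("COMMIT;\n", out);    /* close the final partial batch */
    return ntxns;
}

/* Assumed rounding: integer division of 1000 by the job count, at least 1. */
static int
toy_upgrade_txn_size(int jobs)
{
    int         size;

    if (jobs < 1)
        jobs = 1;
    size = 1000 / jobs;
    return size < 1 ? 1 : size;
}
```

With five commands and a transaction size of 2, the loop emits three BEGIN/COMMIT blocks instead of five separate transactions. Dividing the size by the job count keeps the combined number of objects in flight across workers near 1000, which is the lock-table concern the message raises.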