path: root/src/backend/access
* Remove unused PruneState member rel  (Heikki Linnakangas, 2024-03-20)

    PruneState->rel is no longer being used, so just remove it.

    Author: Melanie Plageman <melanieplageman@gmail.com>
    Discussion: https://www.postgresql.org/message-id/20240320013602.6sypr4cx6sefpemg@liskov

* Reorganize heap_page_prune() function comment  (Heikki Linnakangas, 2024-03-20)

    heap_page_prune()'s function header comment didn't explain the parameters
    in the same order they appear in the function. Fix that.

    Author: Melanie Plageman <melanieplageman@gmail.com>
    Discussion: https://www.postgresql.org/message-id/20240320013602.6sypr4cx6sefpemg@liskov

* Separate equalRowTypes() from equalTupleDescs()  (Peter Eisentraut, 2024-03-17)

    This introduces a new function equalRowTypes() that is effectively a
    subset of equalTupleDescs() but only compares the number of attributes
    and attribute name, type, typmod, and collation. This is enough for most
    existing uses of equalTupleDescs(), which are changed to use the new
    function. The only remaining callers of equalTupleDescs() are those that
    really want to check the full tuple descriptor as such, without concern
    about record or row or record type semantics.

    The existing function hashTupleDesc() is renamed to hashRowType(),
    because it now corresponds more to equalRowTypes().

    The purpose of this change is to be clearer about the semantics of the
    equality asked for by each caller. (At least one caller had a comment
    that questioned whether equalTupleDescs() was too restrictive.) For
    example, 4f622503d6d removed attstattarget from the tuple descriptor
    structure. It was not fully clear at the time how this should affect
    equalTupleDescs(). Now the answer is clear: By their own definitions,
    equalRowTypes() does not care, and equalTupleDescs() just compares
    whatever is in the tuple descriptor but does not care why it is in there.

    Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
    Discussion: https://www.postgresql.org/message-id/flat/f656d6d9-6660-4518-a006-2f65cafbebd1%40eisentraut.org

* Remove redundant snapshot copying from parallel leader to workers  (Heikki Linnakangas, 2024-03-14)

    The parallel query infrastructure copies the leader backend's active
    snapshot to the worker processes. But the BitmapHeapScan node also had
    bespoke code to pass the snapshot from leader to the worker. That was
    redundant, so remove it.

    The removed code was analogous to the snapshot serialization in
    table_parallelscan_initialize(), but that was the wrong role model. A
    parallel bitmap heap scan is more like an independent non-parallel
    bitmap heap scan in each parallel worker as far as the table AM is
    concerned, because the coordination is done in nodeBitmapHeapscan.c, and
    the table AM doesn't need to know anything about it.

    This relies on the assumption that es_snapshot == GetActiveSnapshot().
    That's not a new assumption; things would get weird if you used the
    QueryDesc's snapshot for visibility checks in the scans, but the active
    snapshot for evaluating quals, for example. This could use some
    refactoring and cleanup, but for now, just add some assertions.

    Reviewed-by: Dilip Kumar, Robert Haas
    Discussion: https://www.postgresql.org/message-id/5f3b9d59-0f43-419d-80ca-6d04c07cf61a@iki.fi

* Make the order of the header file includes consistent  (Peter Eisentraut, 2024-03-13)

    Similar to commit 7e735035f20.

    Author: Richard Guo <guofenglinux@gmail.com>
    Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
    Discussion: https://www.postgresql.org/message-id/flat/CAMbWs4-WhpCFMbXCjtJ%2BFzmjfPrp7Hw1pk4p%2BZpU95Kh3ofZ1A%40mail.gmail.com

* Update obsolete index scan TID comments.  (Peter Geoghegan, 2024-03-11)

    Oversight in commit c2fe139c20.

* Remove unneeded vacuum_delay_point from heap_vac_scan_get_next_block  (Heikki Linnakangas, 2024-03-11)

    heap_vac_scan_get_next_block() does relatively little work, so there is
    no need to call vacuum_delay_point(). A future commit will call
    heap_vac_scan_get_next_block() from a callback, and we would like to
    avoid calling vacuum_delay_point() in that callback.

    Author: Melanie Plageman
    Discussion: https://postgr.es/m/CAAKRu_Yf3gvXGcCnqqfoq0Q8LX8UM-e-qbm_B1LeZh60f8WhWA%40mail.gmail.com

* Confine vacuum skip logic to lazy_scan_skip()  (Heikki Linnakangas, 2024-03-11)

    Rename lazy_scan_skip() to heap_vac_scan_next_block() and move more code
    into the function, so that the caller doesn't need to know about ranges
    or skipping anymore. heap_vac_scan_next_block() returns the next block
    to process, and the logic for determining that block is all within the
    function. This makes the skipping logic easier to understand, as it's
    all in the same function, and makes the calling code easier to
    understand as it's less cluttered. The state variables needed to manage
    the skipping logic are moved to LVRelState.

    heap_vac_scan_next_block() now manages its own VM buffer separately from
    the caller's vmbuffer variable. The caller's vmbuffer holds the VM page
    for the current block it's processing, while heap_vac_scan_next_block()
    keeps a pin on the VM page for the next unskippable block. Most of the
    time they are the same, so we hold two pins on the same buffer, but it's
    more convenient to manage them separately.

    For readability inside heap_vac_scan_next_block(), move the logic of
    finding the next unskippable block to a separate function, and add some
    comments.

    This refactoring will also help future patches to switch to using a
    streaming read interface, and eventually AIO
    (https://postgr.es/m/CA%2BhUKGJkOiOCa%2Bmag4BF%2BzHo7qo%3Do9CFheB8%3Dg6uT5TUm2gkvA%40mail.gmail.com)

    Author: Melanie Plageman, Heikki Linnakangas
    Reviewed-by: Andres Freund (older version)
    Discussion: https://postgr.es/m/CAAKRu_Yf3gvXGcCnqqfoq0Q8LX8UM-e-qbm_B1LeZh60f8WhWA%40mail.gmail.com

* Set all_visible_according_to_vm correctly with DISABLE_PAGE_SKIPPING  (Heikki Linnakangas, 2024-03-11)

    It's important for 'all_visible_according_to_vm' to correctly reflect
    whether the VM bit is set or not, even when we are not trusting the VM
    to skip pages, because contrary to what the comment said,
    lazy_scan_prune() relies on it.

    If it's incorrectly set to 'false', when the VM bit is in fact set,
    lazy_scan_prune() will try to set the VM bit again and dirty the page
    unnecessarily. As a result, if you used DISABLE_PAGE_SKIPPING, all heap
    pages were dirtied, even if there were no changes. We would also fail to
    clear any VM bits that were set incorrectly.

    This was broken in commit 980ae17310, so backpatch to v16.

    Backpatch-through: 16
    Reviewed-by: Melanie Plageman, Peter Geoghegan
    Discussion: https://www.postgresql.org/message-id/3df2b582-dc1c-46b6-99b6-38eddd1b2784@iki.fi

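    A minimal illustration of the option involved (the table name is a
    placeholder); before this fix, a run like the following could dirty
    every heap page of the table even when nothing had changed:

        VACUUM (DISABLE_PAGE_SKIPPING) some_table;
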
* Add some checkpoint and redo LSNs to a couple of recovery errors  (Michael Paquier, 2024-03-11)

    Two FATALs and one PANIC gain details about the LSNs they fail at:
    - When restoring from a backup_label, the FATAL log generated when not
      finding the checkpoint record now reports its LSN.
    - When restoring from a backup_label, the FATAL log generated when not
      finding the redo record referenced by a checkpoint record now shows
      both the redo and checkpoint record LSNs.
    - When not restoring from a backup_label, the PANIC error generated when
      not finding the checkpoint record now reports its LSN.

    This information is useful when debugging corruption issues, and these
    LSNs may not show up in the logs depending on the level of logging
    configured in the backend.

    Author: David Steele
    Discussion: https://postgr.es/m/0e90da89-77ca-4ccf-872c-9626d755e288@pgmasters.net

* Avoid stack overflow in ShowTransactionStateRec()  (Alexander Korotkov, 2024-03-08)

    The function recurses, but didn't perform stack-depth checks. It's just
    a debugging aid, so instead of the usual check_stack_depth() call, stop
    the printing if we'd risk stack overflow.

    Here's an example of how to test this:

        (n=1000000; printf "BEGIN;"; for ((i=1;i<=$n;i++)); do printf "SAVEPOINT s$i;"; done; printf "SET log_min_messages = 'DEBUG5'; SAVEPOINT sp;") | psql >/dev/null

    In passing, swap building the list of child XIDs and recursing to the
    parent. That saves memory while recursing, reducing the risk of
    out-of-memory errors with lots of subtransactions. The saving is not
    very significant in practice, but this order seems more logical anyway.

    Report by Egor Chindyaskin and Alexander Lakhin.

    Discussion: https://www.postgresql.org/message-id/1672760457.940462079%40f306.i.mail.ru
    Author: Heikki Linnakangas
    Reviewed-by: Robert Haas, Andres Freund, Alexander Korotkov

* Turn tail recursion into iteration in CommitTransactionCommand()  (Alexander Korotkov, 2024-03-08)

    Usually the compiler will optimize away the tail recursion anyway, but
    if it doesn't, you can drive the function into stack overflow. For
    example:

        (n=1000000; printf "BEGIN;"; for ((i=1;i<=$n;i++)); do printf "SAVEPOINT s$i;"; done; printf "ERROR; COMMIT;") | psql >/dev/null

    In order to get better readability and fewer changes to the existing
    code, the recursion-replacing loop is implemented as a wrapper function.

    Report by Egor Chindyaskin and Alexander Lakhin.

    Discussion: https://postgr.es/m/1672760457.940462079%40f306.i.mail.ru
    Author: Alexander Korotkov, Heikki Linnakangas

* Fix misspelled assertions  (Alvaro Herrera, 2024-03-05)

    Remove an extra & operator, per Tom Lane. My bugs, introduced with
    commit 53c2a97a9266.

    Discussion: https://postgr.es/m/3885480.1709590472@sss.pgh.pa.us

* Rework redundant code in subtrans.c  (Alvaro Herrera, 2024-03-05)

    When this code was written the duplication didn't matter, but with all
    the SLRU-bank stuff we just added, it has become excessive. Turn it into
    a simpler loop with no code duplication.

    Also add a test so that this code becomes covered.

    Discussion: https://postgr.es/m/202403041517.3a35jw53os65@alvherre.pgsql

* Fix buildfarm failures from 2af07e2f74.  (Jeff Davis, 2024-03-04)

    Use GUC_ACTION_SAVE rather than GUC_ACTION_SET, necessary for working
    with parallel query.

    Now that the call requires more arguments, wrap the call in a new
    function to avoid code duplication and offer a place for a comment.

    Discussion: https://postgr.es/m/E1rhJpO-0027Wf-9L@gemulon.postgresql.org

* Fix search_path to a safe value during maintenance operations.  (Jeff Davis, 2024-03-04)

    While executing maintenance operations (ANALYZE, CLUSTER, REFRESH
    MATERIALIZED VIEW, REINDEX, or VACUUM), set search_path to
    'pg_catalog, pg_temp' to prevent inconsistent behavior.

    Functions that are used for functional indexes, in index expressions,
    or in materialized views and depend on a different search path must be
    declared with CREATE FUNCTION ... SET search_path='...'.

    This change was previously committed as 05e1737351, then reverted in
    commit 2fcc7ee7af because it was too late in the cycle.

    Preparation for the MAINTAIN privilege, which was previously reverted
    due to search_path manipulation hazards.

    Discussion: https://postgr.es/m/d4ccaf3658cb3c281ec88c851a09733cd9482f22.camel@j-davis.com
    Discussion: https://postgr.es/m/E1q7j7Y-000z1H-Hr%40gemulon.postgresql.org
    Discussion: https://postgr.es/m/e44327179e5c9015c8dda67351c04da552066017.camel%40j-davis.com
    Reviewed-by: Greg Stark, Nathan Bossart, Noah Misch

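    A minimal sketch of the declaration style described above; the schema,
    function, and table names are hypothetical:

        -- Pin the search path so the function behaves the same when
        -- maintenance commands run with search_path = 'pg_catalog, pg_temp'.
        CREATE FUNCTION app.normalize_label(text) RETURNS text
            LANGUAGE sql IMMUTABLE
            SET search_path = app, pg_temp
            AS 'SELECT lower($1)';
        CREATE INDEX ON app.items (app.normalize_label(name));
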
* Rework locking code in GetMultiXactIdMembers  (Alvaro Herrera, 2024-03-04)

    After commit 53c2a97a9266, the code flow around the "retry" goto label
    in GetMultiXactIdMembers was confused about what was possible: we never
    return there with a held lock, so there's no point in testing for one.
    This realization lets us simplify the code a bit. While at it, make the
    scope of a couple of local variables in the same function a bit tighter.

    Per Coverity.

* Simplify coding in slru.c  (Alvaro Herrera, 2024-03-04)

    New code in 53c2a97a9266 uses direct array access to
    shared->bank_locks[bankno].lock which can be made a little bit more
    legible by using the SimpleLruGetBankLock helper function. Nothing
    terribly serious, but let's add some clarity.

    Discussion: https://postgr.es/m/202403041517.3a35jw53os65@alvherre.pgsql

* Remove unused #include's from backend .c files  (Peter Eisentraut, 2024-03-04)

    as determined by include-what-you-use (IWYU)

    While IWYU also suggests to *add* a bunch of #include's (which is its
    main purpose), this patch does not do that. In some cases, a more
    specific #include replaces another less specific one.

    Some manual adjustments of the automatic result:
    - IWYU currently doesn't know about includes that provide global
      variable declarations (like -Wmissing-variable-declarations), so those
      includes are being kept manually.
    - All includes for port(ability) headers are being kept for now, to play
      it safe.
    - No changes of catalog/pg_foo.h to catalog/pg_foo_d.h, to keep the
      patch from exploding in size.

    Note that this patch touches just *.c files, so nothing declared in
    header files changes in hidden ways.

    As a small example, in src/backend/access/transam/rmgr.c, some IWYU
    pragma annotations are added to handle a special case there.

    Discussion: https://www.postgresql.org/message-id/flat/af837490-6b2f-46df-ba05-37ea6a6653fc%40eisentraut.org

* Remove unused 'countincludesself' argument to pq_sendcountedtext()  (Heikki Linnakangas, 2024-03-04)

    It has been unused since we removed support for protocol version 2.

* Remove unused ParallelWorkerInfo.pid field  (Heikki Linnakangas, 2024-03-04)

    The pid was originally used in the error context of messages propagated
    from parallel workers, but commit 292794f82b removed that. If the need
    arises in the future, you can also get the pid with
    "shm_mq_get_sender(pcxt->worker[i].error_mqh)->pid".

* Use MyBackendType in more places to check what process this is  (Heikki Linnakangas, 2024-03-04)

    Remove IsBackgroundWorker, IsAutoVacuumLauncherProcess(),
    IsAutoVacuumWorkerProcess(), and IsLogicalSlotSyncWorker() in favor of
    new Am*Process() macros that use MyBackendType. For consistency with
    the existing Am*Process() macros.

    Reviewed-by: Andres Freund
    Discussion: https://www.postgresql.org/message-id/f3ecd4cb-85ee-4e54-8278-5fabfb3a4ed0@iki.fi

* Add regression test for restart points during promotion  (Michael Paquier, 2024-03-04)

    This test serves as a way to demonstrate how to use the features
    introduced in 37b369dc67bc, while providing coverage for 7863ee4def65
    that caused the startup process to throw "PANIC: could not locate a
    valid checkpoint record" when starting recovery. The test checks that a
    node is able to properly restart following a crash when a restart point
    was finishing across a promotion, with an injection point added in the
    middle of CreateRestartPoint() to stop the restartpoint in flight. Note
    that this test fails when 7863ee4def65 is reverted.

    Kyotaro Horiguchi is the original author of this test, which was
    originally posted on the thread where 7863ee4def65 was discussed. I have
    just upgraded and polished it to rely on injection points, making it
    much cheaper to reproduce the failure.

    This test requires injection points to be enabled in the builds, hence
    meson and ./configure need an update to pass this knowledge down to the
    test. The name of the new injection point follows the same naming
    convention as 6a1ea02c491d. The Makefile's EXTRA_INSTALL of recovery TAP
    tests is updated to include modules/injection_points.

    Author: Kyotaro Horiguchi, Michael Paquier
    Reviewed-by: Andrey Borodin, Bertrand Drouvot
    Discussion: https://postgr.es/m/ZdLuxBk5hGpol91B@paquier.xyz

* Replace BackendIds with 0-based ProcNumbers  (Heikki Linnakangas, 2024-03-03)

    Now that BackendId was just another index into the proc array, it was
    redundant with the 0-based proc numbers used in other places. Replace
    all usage of backend IDs with proc numbers.

    The only place where the term "backend id" remains is in a few pgstat
    functions that expose backend IDs at the SQL level. Those IDs are now in
    fact 0-based ProcNumbers too, but the documentation still calls them
    "backend ids". That term still seems appropriate to describe what the
    numbers are, so I let it be.

    One user-visible effect is that pg_temp_0 is now a valid temp schema
    name, for the backend with ProcNumber 0.

    Reviewed-by: Andres Freund
    Discussion: https://www.postgresql.org/message-id/8171f1aa-496f-46a6-afc3-c46fe7a9b407@iki.fi

* Redefine backend ID to be an index into the proc array  (Heikki Linnakangas, 2024-03-03)

    Previously, backend ID was an index into the ProcState array, in the
    shared cache invalidation manager (sinvaladt.c). The entry in the
    ProcState array was reserved at backend startup by scanning the array
    for a free entry, and that was also when the backend got its backend ID.

    Things become slightly simpler if we redefine backend ID to be the index
    into the PGPROC array, and directly use it also as an index to the
    ProcState array. This uses a little more memory, as we reserve a few
    extra slots in the ProcState array for aux processes that don't need
    them, but the simplicity is worth it.

    Aux processes now also have a backend ID. This simplifies the
    reservation of BackendStatusArray and ProcSignal slots.

    You can now convert a backend ID into an index into the PGPROC array
    simply by subtracting 1. We still use 0-based "pgprocnos" in various
    places, for indexes into the PGPROC array, but the only difference now
    is that backend IDs start at 1 while pgprocnos start at 0. (The next
    commit will get rid of the term "backend ID" altogether and make
    everything 0-based.)

    There is still a 'backendId' field in PGPROC, now part of 'vxid' which
    encapsulates the backend ID and local transaction ID together. It's
    needed for prepared xacts. For regular backends, the backendId is always
    equal to pgprocno + 1, but for prepared xact PGPROC entries, it's the ID
    of the original backend that processed the transaction.

    Reviewed-by: Andres Freund, Reid Thompson
    Discussion: https://www.postgresql.org/message-id/8171f1aa-496f-46a6-afc3-c46fe7a9b407@iki.fi

* Convert unloggedLSN to an atomic variable.  (Nathan Bossart, 2024-02-29)

    Currently, this variable is an XLogRecPtr protected by a spinlock. By
    converting it to an atomic variable, we can remove the spinlock, which
    saves a small amount of shared memory space. Since this code is not
    performance-critical, we use atomic operations with full barrier
    semantics to make it easy to reason about correctness.

    Author: John Morris
    Reviewed-by: Michael Paquier, Robert Haas, Andres Freund, Stephen Frost, Bharath Rupireddy
    Discussion: https://postgr.es/m/BYAPR13MB26772534335255E50318C574A0409%40BYAPR13MB2677.namprd13.prod.outlook.com
    Discussion: https://postgr.es/m/MN2PR13MB2688FD8B757316CB5C54C8A2A0DDA%40MN2PR13MB2688.namprd13.prod.outlook.com

* Fixups for commit 93db6cbda0.  (Amit Kapila, 2024-02-29)

    Ensure to set an always-secure search path for both local and remote
    connections during slot synchronization, so that malicious users can't
    redirect user code (e.g. operators).

    In passing, improve the name of a define, remove a spurious return
    statement, and make a minor change in one of the comments.

    Author: Bertrand Drouvot and Shveta Malik
    Reviewed-by: Amit Kapila, Peter Smith
    Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com
    Discussion: https://postgr.es/m/ZdcejBDCr+wlVGnO@ip-10-97-1-34.eu-west-3.compute.internal
    Discussion: https://postgr.es/m/CAJpy0uBNP=nrkNJkJSfF=jSocEh8vU2Owa8Rtpi=63fG=SvfVQ@mail.gmail.com

* Improve performance of subsystems on top of SLRU  (Alvaro Herrera, 2024-02-28)

    More precisely, what we do here is make the SLRU cache sizes
    configurable with new GUCs, so that sites with high concurrency and big
    ranges of transactions in flight (resp. multixacts/subtransactions) can
    benefit from bigger caches. In order for this to work with good
    performance, two additional changes are made:

    1. the cache is divided in "banks" (to borrow terminology from CPU
       caches), and algorithms such as eviction buffer search only affect
       one specific bank. This forestalls the problem that linear searching
       for a specific buffer across the whole cache takes too long: we only
       have to search the specific bank, whose size is small. This work is
       authored by Andrey Borodin.

    2. Change the locking regime for the SLRU banks, so that each bank uses
       a separate LWLock. This allows for increased scalability. This work
       is authored by Dilip Kumar. (A part of this was previously committed
       as d172b717c6f4.)

    Special care is taken so that the algorithms that can potentially
    traverse more than one bank release one bank's lock before acquiring the
    next. This should happen rarely, but particularly clog.c's group commit
    feature needed code adjustment to cope with this. I (Álvaro) also added
    lots of comments to make sure the design is sound.

    The new GUCs match the names introduced by bcdfa5f2e2f2 in the
    pg_stat_slru view. The default values for these parameters are similar
    to the previous sizes of each SLRU. commit_ts, clog and subtrans accept
    value 0, which means to adjust by dividing shared_buffers by 512 (so 2MB
    for every 1GB of shared_buffers), with a cap of 8MB. (A new slru.c
    function SimpleLruAutotuneBuffers() was added to support this.) The cap
    was previously 1MB for clog, so for sites with more than 512MB of shared
    memory the total memory used increases, which is likely a good tradeoff.
    However, other SLRUs (notably multixact ones) retain smaller sizes and
    don't support a configured value of 0. These values based on
    shared_buffers may need to be revisited, but that's an easy change.

    There was some resistance to adding these new GUCs: it would be better
    to adjust to memory pressure automatically somehow, for example by
    stealing memory from shared_buffers (where the caches can grow and
    shrink naturally). However, doing that seems to be a much larger project
    and one which has made virtually no progress in several years, and
    because this is such a pain point for so many users, here we take the
    pragmatic approach.

    Author: Andrey Borodin <x4mmm@yandex-team.ru>
    Author: Dilip Kumar <dilipbalaut@gmail.com>
    Reviewed-by: Amul Sul, Gilles Darold, Anastasia Lubennikova, Ivan Lazarev, Robert Haas, Thomas Munro, Tomas Vondra, Yura Sokolov, Васильев Дмитрий (Dmitry Vasiliev).
    Discussion: https://postgr.es/m/2BEC2B3F-9B61-4C1D-9FB5-5FAB0F05EF86@yandex-team.ru
    Discussion: https://postgr.es/m/CAFiTN-vzDvNz=ExGXz6gdyjtzGixKSqs0mKHMmaQ8sOSEFZ33A@mail.gmail.com

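    A hedged configuration sketch, assuming the GUC names follow the
    pg_stat_slru entry names mentioned above (the exact parameter names are
    an assumption here, not part of the commit message):

        # postgresql.conf; 0 = scale with shared_buffers, capped at 8MB
        commit_timestamp_buffers = 0
        transaction_buffers = 0
        subtransaction_buffers = 0
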
* Rename SLRU elements in view pg_stat_slru  (Alvaro Herrera, 2024-02-28)

    The new names are intended to match those in an upcoming patch that adds
    a few GUCs to configure the SLRU buffer sizes.

    Backwards compatibility concern: this changes the accepted names for
    function pg_stat_slru_reset(). Since this function recognizes "any other
    string" as a request to reset the entry for "other", this means that
    calling it with the old names would silently reset "other" instead of
    doing nothing or throwing an error.

    Reviewed-by: Andrey M. Borodin <x4mmm@yandex-team.ru>
    Discussion: https://postgr.es/m/202402261616.dlriae7b6emv@alvherre.pgsql

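    For example, a reset call under the new naming could look like this (the
    entry name shown is an assumption based on the renaming described above):

        SELECT pg_stat_slru_reset('commit_timestamp');
        -- An old name such as 'CommitTs' would now be treated as "any other
        -- string" and silently reset the "other" entry instead.
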
* Fix comment thinko in sequence.c  (Michael Paquier, 2024-02-27)

    One comment mentioned indexes, but the relation opened should be
    sequences.

    Reported-by: Matthias van de Meent
    Discussion: https://postgr.es/m/CAEze2WiMGNG9XK3NSUen-5BARhCnP=u=FXnf8pvpL2qDKeOsZg@mail.gmail.com

* slru.c: Reduce scope of variables in 'for' blocks  (Alvaro Herrera, 2024-02-26)

    Pretty boring.

* Introduce sequence_*() access functions  (Michael Paquier, 2024-02-26)

    Similarly to tables and indexes, these functions are able to open
    relations with a sequence relkind, which is useful to make a distinction
    with the other relation kinds. Previously, commands/sequence.c used a
    mix of table_{close,open}() and relation_{close,open}() routines when
    manipulating sequence relations, so this clarifies the code.

    A direct effect of this change is to align the error messages produced
    when attempting DDLs for sequences on relations with an unexpected
    relkind, like a table or an index with ALTER SEQUENCE, providing an
    extra error detail about the relkind of the relation used in the DDL
    query.

    Author: Michael Paquier
    Reviewed-by: Tomas Vondra
    Discussion: https://postgr.es/m/ZWlohtKAs0uVVpZ3@paquier.xyz

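    A small illustration of the user-visible effect described above (the
    table name is hypothetical):

        CREATE TABLE not_a_sequence (a int);
        ALTER SEQUENCE not_a_sequence RESTART;
        -- fails, and the error detail now reports the relkind of "not_a_sequence"
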
* Promote assertion about !ReindexIsProcessingIndex to runtime error.  (Tom Lane, 2024-02-25)

    When this assertion was installed (in commit d2f60a3ab), I thought it
    was only for catching server logic errors that caused accesses to
    catalogs that were undergoing index rebuilds. However, it will also fire
    in case of a user-defined index expression that attempts to access its
    own table. We occasionally see reports of people trying to do that, and
    typically getting unintelligible low-level errors as a result. We can
    provide a more on-point message by making this a regular runtime check.

    While at it, adjust the similar error check in
    systable_beginscan_ordered to use the same message text. That one is
    (probably) not reachable without a coding bug, but we might as well use
    a translatable message if we have one.

    Per bug #18363 from Alexander Lakhin. Back-patch to all supported
    branches.

    Discussion: https://postgr.es/m/18363-e3598a5a572d0699@postgresql.org

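    A hedged sketch of the problematic pattern described above, with
    hypothetical names; an index expression like this now fails with the
    clearer runtime error rather than an unintelligible low-level one:

        CREATE TABLE orders (id int PRIMARY KEY, total numeric);
        CREATE FUNCTION orders_rank(int) RETURNS bigint
            LANGUAGE sql IMMUTABLE    -- mislabeled as immutable
            AS 'SELECT count(*) FROM orders WHERE id <= $1';  -- reads its own table
        CREATE INDEX orders_rank_idx ON orders (orders_rank(id));
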
* Introduce a new smgr bulk loading facility.  (Heikki Linnakangas, 2024-02-23)

    The new facility makes it easier to optimize bulk loading, as the logic
    for buffering, WAL-logging, and syncing the relation only needs to be
    implemented once. It's also less error-prone: We have had a number of
    bugs in how a relation is fsync'd - or not - at the end of a bulk
    loading operation. By centralizing that logic to one place, we only need
    to write it correctly once.

    The new facility is faster for small relations: Instead of calling
    smgrimmedsync(), we register the fsync to happen at next checkpoint,
    which avoids the fsync latency. That can make a big difference if you
    are e.g. restoring a schema-only dump with lots of relations.

    It is also slightly more efficient with large relations, as the WAL
    logging is performed multiple pages at a time. That avoids some WAL
    header overhead. The sorted GiST index build did that already; this
    moves the buffering to the new facility.

    The changes to the pageinspect GiST test need an explanation: Before
    this patch, the sorted GiST index build set the LSN on every page to the
    special GistBuildLSN value, not the LSN of the WAL record, even though
    they were WAL-logged. There was no particular need for it, it just
    happened naturally when we wrote out the pages before WAL-logging them.
    Now we WAL-log the pages first, like in B-tree build, so the pages are
    stamped with the record's real LSN. When the build is not WAL-logged, we
    still use GistBuildLSN. To make the test output predictable, use an
    unlogged index.

    Reviewed-by: Andres Freund
    Discussion: https://www.postgresql.org/message-id/30e8f366-58b3-b239-c521-422122dd5150%40iki.fi

* Add a new slot sync worker to synchronize logical slots.  (Amit Kapila, 2024-02-22)

    By enabling slot synchronization, all the failover logical replication
    slots on the primary (assuming configurations are appropriate) are
    automatically created on the physical standbys and are synced
    periodically. The slot sync worker on the standby server pings the
    primary server at regular intervals to get the necessary failover
    logical slots information and create/update the slots locally. The slots
    that no longer require synchronization are automatically dropped by the
    worker.

    The nap time of the worker is tuned according to the activity on the
    primary. The slot sync worker waits for some time before the next
    synchronization, with the duration varying based on whether any slots
    were updated during the last cycle.

    A new parameter sync_replication_slots enables or disables this new
    process.

    On promotion, the slot sync worker is shut down by the startup process
    to drop any temporary slots acquired by the slot sync worker and to
    prevent the worker from trying to fetch the failover slots.

    The functionality to allow logical walsenders to wait for the physical
    standbys will be added in a subsequent commit.

    Author: Shveta Malik, Hou Zhijie based on design inputs by Masahiko Sawada and Amit Kapila
    Reviewed-by: Masahiko Sawada, Bertrand Drouvot, Peter Smith, Dilip Kumar, Ajin Cherian, Nisha Moond, Kuroda Hayato, Amit Kapila
    Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com

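    A rough sketch of how this might be wired up; beyond the
    sync_replication_slots parameter named above, the slot name and the
    failover argument of pg_create_logical_replication_slot() shown here are
    assumptions:

        -- On the primary: create a logical slot marked for failover
        SELECT pg_create_logical_replication_slot('my_slot', 'pgoutput',
                                                   false,   -- temporary
                                                   false,   -- two-phase
                                                   true);   -- failover
        -- On the standby (postgresql.conf): let the new worker sync such slots
        --   sync_replication_slots = on
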
* Remove superfluous 'pgprocno' field from PGPROC  (Heikki Linnakangas, 2024-02-22)

    It was always just the index of the PGPROC entry from the beginning of
    the proc array. Introduce a macro to compute it from the pointer
    instead.

    Reviewed-by: Andres Freund
    Discussion: https://www.postgresql.org/message-id/8171f1aa-496f-46a6-afc3-c46fe7a9b407@iki.fi

* Use new overflow-safe integer comparison functions.  (Nathan Bossart, 2024-02-16)

    Commit 6b80394781 introduced integer comparison functions designed to be
    as efficient as possible while avoiding overflow. This commit makes use
    of these functions in many of the in-tree qsort() comparators to help
    ensure transitivity. Many of these comparator functions should also see
    a small performance boost.

    Author: Mats Kindahl
    Reviewed-by: Andres Freund, Fabrízio de Royes Mello
    Discussion: https://postgr.es/m/CA%2B14426g2Wa9QuUpmakwPxXFWG_1FaY0AsApkvcTBy-YfS6uaw%40mail.gmail.com

* Pass correct count to WALRead().  (Jeff Davis, 2024-02-16)

    Previously, some callers requested XLOG_BLCKSZ bytes unconditionally.
    While this did not cause a problem, because the extra bytes are ignored,
    it's confusing and makes it harder to add safety checks. Additionally,
    the comment about zero padding was incorrect. With this commit, all
    callers request the number of bytes they actually need.

    Author: Bharath Rupireddy
    Reviewed-by: Kyotaro Horiguchi
    Discussion: https://postgr.es/m/CALj2ACWBRFac2TingD3PE3w2EBHXUHY3=AEEZPJmqhpEOBGExg@mail.gmail.com

* Add assert to WALReadFromBuffers().  (Jeff Davis, 2024-02-16)

    Per suggestion from Andres.

    Discussion: https://postgr.es/m/20240214025508.6mcblauossthvaw3@awork3.anarazel.de

* Replace calls to pg_qsort() with the qsort() macro.  (Nathan Bossart, 2024-02-16)

    Calls to this function might give the impression that pg_qsort() is
    somehow different than qsort(), when in fact there is a qsort() macro in
    port.h that expands all in-tree uses to pg_qsort().

    Reviewed-by: Mats Kindahl
    Discussion: https://postgr.es/m/CA%2B14426g2Wa9QuUpmakwPxXFWG_1FaY0AsApkvcTBy-YfS6uaw%40mail.gmail.com

* Followup fixes for transaction_timeout  (Alexander Korotkov, 2024-02-16)

    Don't deal with transaction timeout in PostgresMain(). Instead, release
    the transaction timeout activated by StartTransaction() in
    CommitTransaction()/AbortTransaction()/PrepareTransaction(). Deal with
    both enabling and disabling transaction timeout in
    assign_transaction_timeout().

    Also, remove the potentially flaky timeouts-long isolation test, which
    has no guarantee of passing on slow/busy machines.

    Reported-by: Andres Freund
    Discussion: https://postgr.es/m/20240215230856.pc6k57tqxt7fhldm%40awork3.anarazel.de

* Introduce transaction_timeout  (Alexander Korotkov, 2024-02-15)

    This commit adds a timeout that is expected to be used as a prevention
    against long-running queries. Any session whose transaction spans longer
    than this timeout will be terminated. However, this timeout is not
    applied to prepared transactions. Only transactions with user
    connections are affected.

    Discussion: https://postgr.es/m/CAAhFRxiQsRs2Eq5kCo9nXE3HTugsAAJdSQSmxncivebAxdmBjQ%40mail.gmail.com
    Author: Andrey Borodin <amborodin@acm.org>
    Author: Japin Li <japinli@hotmail.com>
    Author: Junwang Zhao <zhjwpku@gmail.com>
    Reviewed-by: Nikolay Samokhvalov <samokhvalov@gmail.com>
    Reviewed-by: Andres Freund <andres@anarazel.de>
    Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
    Reviewed-by: bt23nguyent <bt23nguyent@oss.nttdata.com>
    Reviewed-by: Yuhang Qiu <iamqyh@gmail.com>

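    A minimal usage sketch; the session is terminated once its transaction
    has been open longer than the configured timeout:

        SET transaction_timeout = '5s';
        BEGIN;
        SELECT pg_sleep(10);   -- terminated by the transaction timeout
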
* Read WAL directly from WAL buffers.  (Jeff Davis, 2024-02-12)

    If available, read directly from WAL buffers, avoiding the need to go
    through the filesystem. Only for physical replication for now, but can
    be expanded to other callers.

    In preparation for replicating unflushed WAL data.

    Author: Bharath Rupireddy
    Discussion: https://postgr.es/m/CALj2ACXKKK%3DwbiG5_t6dGao5GoecMwRkhr7GjVBM_jg54%2BNa%3DQ%40mail.gmail.com
    Reviewed-by: Andres Freund, Alvaro Herrera, Nathan Bossart, Dilip Kumar, Nitin Jadhav, Melih Mutlu, Kyotaro Horiguchi

* Remove unnecessary smgropen() calls  (Heikki Linnakangas, 2024-02-12)

    Now that RelationCreateStorage() returns the SmgrRelation (since commit
    5c1560606dc), use that.

    Author: Japin Li
    Discussion: https://www.postgresql.org/message-id/ME3P282MB316600FA62F6605477F26F6AB6742@ME3P282MB3166.AUSP282.PROD.OUTLOOK.COM

* Set LSN for wbuf in _hash_freeovflpage() iff wbuf is modified.  (Amit Kapila, 2024-02-07)

    Commit 861f86beea used REGBUF_NO_CHANGE at one of the places in the hash
    index to register the clean buffers but forgot to avoid setting LSN in
    that case.

    Reported-by: Michael Paquier
    Author: Kuroda Hayato
    Reviewed-by: Amit Kapila, Michael Paquier
    Discussion: https://postgr.es/m/ZbyVVG_7eW3YD5-A@paquier.xyz

* Change initial use of pg_atomic_write_u64 to init  (Alvaro Herrera, 2024-02-06)

    This only matters when using atomics emulation with semaphores.

    Per buildfarm member rorqual.

* Use atomic access for SlruShared->latest_page_number  (Alvaro Herrera, 2024-02-06)

    The new concurrency model proposed for slru.c to improve performance
    does not include any single lock that would coordinate processes doing
    concurrent reads/writes on SlruShared->latest_page_number. We can
    instead use atomic reads and writes for that variable.

    Author: Dilip Kumar <dilipbalaut@gmail.com>
    Reviewed-by: Andrey M. Borodin <x4mmm@yandex-team.ru>
    Discussion: https://postgr.es/m/CAFiTN-vzDvNz=ExGXz6gdyjtzGixKSqs0mKHMmaQ8sOSEFZ33A@mail.gmail.com

* Give SMgrRelation pointers a well-defined lifetime.  (Heikki Linnakangas, 2024-01-31)

    After calling smgropen(), it was not clear how long you could continue
    to use the result, because various code paths including cache
    invalidation could call smgrclose(), which freed the memory.

    Guarantee that the object won't be destroyed until the end of the
    current transaction, or in recovery, the commit/abort record that
    destroys the underlying storage.

    smgrclose() is now just an alias for smgrrelease(). It closes files and
    forgets all state except the rlocator, but keeps the SMgrRelation object
    valid. A new smgrdestroy() function is used by rare places that know
    there should be no other references to the SMgrRelation.

    The short version:
    * smgrclose() is now just an alias for smgrrelease(). It releases
      resources, but doesn't destroy until EOX
    * smgrdestroy() now frees memory, and should rarely be used.

    Existing code should be unaffected, but it is now possible for code that
    has an SMgrRelation object to use it repeatedly during a transaction as
    long as the storage hasn't been physically dropped. Such code would
    normally hold a lock on the relation.

    This also replaces the "ownership" mechanism of SMgrRelations with a pin
    counter. An SMgrRelation can now be "pinned", which prevents it from
    being destroyed at end of transaction. There can be multiple pins on the
    same SMgrRelation. In practice, the pin mechanism is only used by the
    relcache, so there cannot be more than one pin on the same SMgrRelation.
    Except with swap_relation_files XXX

    Author: Thomas Munro, Heikki Linnakangas
    Reviewed-by: Robert Haas <robertmhaas@gmail.com>
    Discussion: https://www.postgresql.org/message-id/CA%2BhUKGJ8NTvqLHz6dqbQnt2c8XCki4r2QvXjBQcXpVwxTY_pvA@mail.gmail.com

* Fix locking when fixing an incomplete split of a GIN internal page  (Heikki Linnakangas, 2024-01-29)

    ginFinishSplit() expects the caller to hold an exclusive lock on the
    buffer, but when finishing an earlier "leftover" incomplete split of an
    internal page, the caller held a shared lock. That caused an assertion
    failure in MarkBufferDirty(). Without assertions, it could lead to
    corruption if two backends tried to complete the split at the same time.

    On master, add a test case using the new injection point facility.

    Report and analysis by Fei Changhong. Backpatch the fix to all supported
    versions.

    Reviewed-by: Fei Changhong, Michael Paquier
    Discussion: https://www.postgresql.org/message-id/tencent_A3CE810F59132D8E230475A5F0F7A08C8307@qq.com

* Combine FSM updates for prune and no-prune cases.  (Robert Haas, 2024-01-26)

    lazy_scan_prune() and lazy_scan_noprune() update the freespace map with
    identical conditions; combine them. This consolidation is easier now
    that cb970240f13df2b63f0410f81f452179a2b78d6f moved visibility map
    updates into lazy_scan_prune().

    While combining the FSM updates, simplify the logic for calling
    lazy_scan_new_or_empty() and lazy_scan_noprune(). Also update a few
    comments in this part of the code to make them, hopefully, clearer.

    Melanie Plageman and Robert Haas

    Discussion: https://postgr.es/m/CA%2BTgmoaLTvipm%3Dxx4rJLr07m908PCu%3DQH3uCjD1UOn8YaEuO2g%40mail.gmail.com