aboutsummaryrefslogtreecommitdiff
path: root/src/backend/access
Commit message (Collapse)AuthorAge
* Fix comments in heaptuple.cMichael Paquier2024-06-28
| | | | | | | | | Since e27f4ee0a701, fastgetattr() and heap_getattr() are not macros, but inlined functions. Author: Junwang Zhao Reviewed-by: Stepan Neretin Discussion: https://postgr.es/m/CAEG8a3JS-JKWWyOcM7BU=vPqFXa3W7mZSHnvc3CBqx=tC+3SCA@mail.gmail.com
* Add an injection_points isolation test suite.Noah Misch2024-06-27
| | | | | | | | | Make the isolation harness recognize injection_points wait events as a type of blocked state. Test an extant inplace-update bug. Reviewed by Robert Haas and Michael Paquier. Discussion: https://postgr.es/m/20240512232923.aa.nmisch@google.com
* Fix MVCC bug with prepared xact with subxacts on standbyHeikki Linnakangas2024-06-27
| | | | | | | | | | | | | | | | | | We did not recover the subtransaction IDs of prepared transactions when starting a hot standby from a shutdown checkpoint. As a result, such subtransactions were considered as aborted, rather than in-progress. That would lead to hint bits being set incorrectly, and the subtransactions suddenly becoming visible to old snapshots when the prepared transaction was committed. To fix, update pg_subtrans with prepared transactions's subxids when starting hot standby from a shutdown checkpoint. The snapshots taken from that state need to be marked as "suboverflowed", so that we also check the pg_subtrans. Backport to all supported versions. Discussion: https://www.postgresql.org/message-id/6b852e98-2d49-4ca1-9e95-db419a2696e0@iki.fi
* Fix bugs in MultiXact truncationHeikki Linnakangas2024-06-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1. TruncateMultiXact() performs the SLRU truncations in a critical section. Deleting the SLRU segments calls ForwardSyncRequest(), which will try to compact the request queue if it's full (CompactCheckpointerRequestQueue()). That in turn allocates memory, which is not allowed in a critical section. Backtrace: TRAP: failed Assert("CritSectionCount == 0 || (context)->allowInCritSection"), File: "../src/backend/utils/mmgr/mcxt.c", Line: 1353, PID: 920981 postgres: autovacuum worker template0(ExceptionalCondition+0x6e)[0x560a501e866e] postgres: autovacuum worker template0(+0x5dce3d)[0x560a50217e3d] postgres: autovacuum worker template0(ForwardSyncRequest+0x8e)[0x560a4ffec95e] postgres: autovacuum worker template0(RegisterSyncRequest+0x2b)[0x560a50091eeb] postgres: autovacuum worker template0(+0x187b0a)[0x560a4fdc2b0a] postgres: autovacuum worker template0(SlruDeleteSegment+0x101)[0x560a4fdc2ab1] postgres: autovacuum worker template0(TruncateMultiXact+0x2fb)[0x560a4fdbde1b] postgres: autovacuum worker template0(vac_update_datfrozenxid+0x4b3)[0x560a4febd2f3] postgres: autovacuum worker template0(+0x3adf66)[0x560a4ffe8f66] postgres: autovacuum worker template0(AutoVacWorkerMain+0x3ed)[0x560a4ffe7c2d] postgres: autovacuum worker template0(+0x3b1ead)[0x560a4ffecead] postgres: autovacuum worker template0(+0x3b620e)[0x560a4fff120e] postgres: autovacuum worker template0(+0x3b3fbb)[0x560a4ffeefbb] postgres: autovacuum worker template0(+0x2f724e)[0x560a4ff3224e] /lib/x86_64-linux-gnu/libc.so.6(+0x27c8a)[0x7f62cc642c8a] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85)[0x7f62cc642d45] postgres: autovacuum worker template0(_start+0x21)[0x560a4fd16f31] To fix, bail out in CompactCheckpointerRequestQueue() without doing anything, if it's called in a critical section. That covers the above call path, as well as any other similar cases where RegisterSyncRequest might be called in a critical section. 2. After fixing that, another problem became apparent: Autovacuum process doing that truncation can deadlock with the checkpointer process. TruncateMultiXact() sets "MyProc->delayChkptFlags |= DELAY_CHKPT_START". If the sync request queue is full and cannot be compacted, the process will repeatedly sleep and retry, until there is room in the queue. However, if the checkpointer is trying to start a checkpoint at the same time, and is waiting for the DELAY_CHKPT_START processes to finish, the queue will never shrink. More concretely, the autovacuum process is stuck here: #0 0x00007fc934926dc3 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x000056220b24348b in WaitEventSetWaitBlock (set=0x56220c2e4b50, occurred_events=0x7ffe7856d040, nevents=1, cur_timeout=<optimized out>) at ../src/backend/storage/ipc/latch.c:1570 #2 WaitEventSetWait (set=0x56220c2e4b50, timeout=timeout@entry=10, occurred_events=<optimized out>, occurred_events@entry=0x7ffe7856d040, nevents=nevents@entry=1, wait_event_info=wait_event_info@entry=150994949) at ../src/backend/storage/ipc/latch.c:1516 #3 0x000056220b243224 in WaitLatch (latch=<optimized out>, latch@entry=0x0, wakeEvents=wakeEvents@entry=40, timeout=timeout@entry=10, wait_event_info=wait_event_info@entry=150994949) at ../src/backend/storage/ipc/latch.c:538 #4 0x000056220b26cf46 in RegisterSyncRequest (ftag=ftag@entry=0x7ffe7856d0a0, type=type@entry=SYNC_FORGET_REQUEST, retryOnError=true) at ../src/backend/storage/sync/sync.c:614 #5 0x000056220af9db0a in SlruInternalDeleteSegment (ctl=ctl@entry=0x56220b7beb60 <MultiXactMemberCtlData>, segno=segno@entry=11350) at ../src/backend/access/transam/slru.c:1495 #6 0x000056220af9dab1 in SlruDeleteSegment (ctl=ctl@entry=0x56220b7beb60 <MultiXactMemberCtlData>, segno=segno@entry=11350) at ../src/backend/access/transam/slru.c:1566 #7 0x000056220af98e1b in PerformMembersTruncation (oldestOffset=<optimized out>, newOldestOffset=<optimized out>) at ../src/backend/access/transam/multixact.c:3006 #8 TruncateMultiXact (newOldestMulti=newOldestMulti@entry=3221225472, newOldestMultiDB=newOldestMultiDB@entry=4) at ../src/backend/access/transam/multixact.c:3201 #9 0x000056220b098303 in vac_truncate_clog (frozenXID=749, minMulti=<optimized out>, lastSaneFrozenXid=749, lastSaneMinMulti=3221225472) at ../src/backend/commands/vacuum.c:1917 #10 vac_update_datfrozenxid () at ../src/backend/commands/vacuum.c:1760 #11 0x000056220b1c3f76 in do_autovacuum () at ../src/backend/postmaster/autovacuum.c:2550 #12 0x000056220b1c2c3d in AutoVacWorkerMain (startup_data=<optimized out>, startup_data_len=<optimized out>) at ../src/backend/postmaster/autovacuum.c:1569 and the checkpointer is stuck here: #0 0x00007fc9348ebf93 in clock_nanosleep () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x00007fc9348fe353 in nanosleep () from /lib/x86_64-linux-gnu/libc.so.6 #2 0x000056220b40ecb4 in pg_usleep (microsec=microsec@entry=10000) at ../src/port/pgsleep.c:50 #3 0x000056220afb43c3 in CreateCheckPoint (flags=flags@entry=108) at ../src/backend/access/transam/xlog.c:7098 #4 0x000056220b1c6e86 in CheckpointerMain (startup_data=<optimized out>, startup_data_len=<optimized out>) at ../src/backend/postmaster/checkpointer.c:464 To fix, add AbsorbSyncRequests() to the loops where the checkpointer waits for DELAY_CHKPT_START or DELAY_CHKPT_COMPLETE operations to finish. Backpatch to v14. Before that, SLRU deletion didn't call RegisterSyncRequest, which avoided this failure. I'm not sure if there are other similar scenarios on older versions, but we haven't had any such reports. Discussion: https://www.postgresql.org/message-id/ccc66933-31c1-4f6a-bf4b-45fef0d4f22e@iki.fi
* Fix nbtree array unsatisfied inequality check.Peter Geoghegan2024-06-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | _bt_advance_array_keys didn't take sufficient care at the point where it decides whether to start a new primitive index scan based on a call to _bt_check_compare against finaltup (a call with the scan direction flipped around). The final decision was conditioned on rules about how the scan key offset sktrig that initially triggered array advancement (passed to _bt_advance_array_keys from its _bt_checkkeys caller) compared to the offset set by its own _bt_check_compare finaltup call. This approach was faulty, in that it allowed _bt_advance_array_keys to incorrectly start a new primitive index scan, that landed on the same leaf page (on assert-enabled builds it led to an assertion failure). In general, scans with array keys are expected to never have to read the same leaf page more than once (barring cases involving cursors, and cases where the scan restores a marked position for the inner side of a merge join). This principle was established by commit 5bf748b8. To fix, make the final decision based on whether the scan key offset set by the _bt_check_compare finaltup call is an offset to an inequality strategy scan key. An unsatisfied required inequality strategy scan key indicates that all of the scan's required equality strategy scan keys must also be satisfied by finaltup (not just by caller's tuple), and that there is a decent chance that _bt_first will be able to reposition the scan to a position many leaf pages ahead of the current leaf page. Oversight in commit 5bf748b8. Discussion: https://postgr.es/m/CAH2-Wz=DyHbcg7o6zXqzyiin8WE8vzk4tvU8Lrnh-a=EAvO0TQ@mail.gmail.com
* Fix insertion of SP-GiST REDIRECT tuples during REINDEX CONCURRENTLY.Tom Lane2024-06-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reconstruction of an SP-GiST index by REINDEX CONCURRENTLY may insert some REDIRECT tuples. This will typically happen in a transaction that lacks an XID, which leads either to assertion failure in spgFormDeadTuple or to insertion of a REDIRECT tuple with zero xid. The latter's not good either, since eventually VACUUM will apply GlobalVisTestIsRemovableXid() to the zero xid, resulting in either an assertion failure or a garbage answer. In practice, since REINDEX CONCURRENTLY locks out index scans till it's done, it doesn't matter whether it inserts REDIRECTs or PLACEHOLDERs; and likewise it doesn't matter how soon VACUUM reduces such a REDIRECT to a PLACEHOLDER. So in non-assert builds there's no observable problem here, other than perhaps a little index bloat. But it's not behaving as intended. To fix, remove the failing Assert in spgFormDeadTuple, acknowledging that we might sometimes insert a zero XID; and guard VACUUM's GlobalVisTestIsRemovableXid() call with a test for valid XID, ensuring that we'll reduce such a REDIRECT the first time VACUUM sees it. (Versions before v14 use TransactionIdPrecedes here, which won't fail on zero xid, so they really have no bug at all in non-assert builds.) Another solution could be to not create REDIRECTs at all during REINDEX CONCURRENTLY, making the relevant code paths treat that case like index build (which likewise knows that no concurrent index scans can be happening). That would allow restoring the Assert in spgFormDeadTuple, but we'd still need the VACUUM change because redirection tuples with zero xid may be out there already. But there doesn't seem to be a nice way for spginsert() to tell that it's being called in REINDEX CONCURRENTLY without some API changes, so we'll leave that as a possible future improvement. In HEAD, also rename the SpGistState.myXid field to redirectXid, which seems less misleading (since it might not in fact be our transaction's XID) and is certainly less uninformatively generic. Per bug #18499 from Alexander Lakhin. Back-patch to all supported branches. Discussion: https://postgr.es/m/18499-8a519c280f956480@postgresql.org
* Convert confusing macros in multixact.c to static inline functionsHeikki Linnakangas2024-06-16
| | | | | | | | | | | | | The macros were confused about the argument data types. All the arguments were called 'xid', and some of the macros included casts to TransactionId, even though the arguments were actually either MultiXactIds or MultiXactOffsets. It compiles to the same thing, because TransactionId, MultiXactId and MultiXactOffset are all typedefs of uint32, but it was highly misleading. Author: Maxim Orlov <orlovmg@gmail.com> Discussion: https://www.postgresql.org/message-id/CACG%3DezbLUG-OD1osAW3OchOMxZtdxHh2itYR9Zhh-a13wEBEQw%40mail.gmail.com Discussion: https://www.postgresql.org/message-id/ff143b24-a093-40da-9833-d36b83726bdf%40iki.fi
* Reintroduce dead tuple counter in pg_stat_progress_vacuum.Masahiko Sawada2024-06-14
| | | | | | | | | | | | | | | | | | Commit 667e65aac3 changed both num_dead_tuples and max_dead_tuples columns to dead_tuple_bytes and max_dead_tuple_bytes columns, respectively. But as per discussion, the number of dead tuples collected still provides meaningful insights for users. This commit reintroduces the column for the count of dead tuples, renamed as num_dead_item_ids. It avoids confusion with the number of dead tuples removed by VACUUM, which includes dead heap-only tuples but excludes any pre-existing LP_DEAD items left behind by opportunistic pruning. Bump catalog version. Reviewed-by: Peter Geoghegan, Álvaro Herrera, Andrey Borodin Discussion: https://postgr.es/m/CAD21AoBL5sJE9TRWPyv%2Bw7k5Ee5QAJqDJEDJBUdAaCzGWAdvZw%40mail.gmail.com
* Clamp result of MultiXactMemberFreezeThresholdHeikki Linnakangas2024-06-13
| | | | | | | | | | | | The purpose of the function is to reduce the effective autovacuum_multixact_freeze_max_age if the multixact members SLRU is approaching wraparound, to make multixid freezing more aggressive. The returned value should therefore never be greater than plain autovacuum_multixact_freeze_max_age. Reviewed-by: Robert Haas Discussion: https://www.postgresql.org/message-id/85fb354c-f89f-4d47-b3a2-3cbd461c90a3@iki.fi Backpatch-through: 12, all supported versions
* Harmonize function parameter names for Postgres 17.Peter Geoghegan2024-06-12
| | | | | | | | | | | | | Make sure that function declarations use names that exactly match the corresponding names from function definitions in a few places. These inconsistencies were all introduced during Postgres 17 development. pg_bsd_indent still has a couple of similar inconsistencies, which I (pgeoghegan) have left untouched for now. This commit was written with help from clang-tidy, by mechanically applying the same rules as similar clean-up commits (the earliest such commit was commit 035ce1fe).
* Make RelationFlushRelation() work without ResourceOwner during abortHeikki Linnakangas2024-06-06
| | | | | | | | | | | | | | | | | | | | | ReorderBufferImmediateInvalidation() executes invalidation messages in an aborted transaction. However, RelationFlushRelation sometimes required a valid resource owner, to temporarily increment the refcount of the relache entry. Commit b8bff07daa worked around that in the main subtransaction abort function, AbortSubTransaction(), but missed this similar case in ReorderBufferImmediateInvalidation(). To fix, introduce a separate function to invalidate a relcache entry. It does the same thing as RelationClearRelation(rebuild==true) does when outside a transaction, but can be called without incrementing the refcount. Add regression test. Before this fix, it failed with: ERROR: ResourceOwnerEnlarge called after release started Reported-by: Alexander Lakhin <exclusion@gmail.com> Discussion: https://www.postgresql.org/message-id/e56be7d9-14b1-664d-0bfc-00ce9772721c@gmail.com
* Revise GUC names quoting in messages againPeter Eisentraut2024-05-17
| | | | | | | | | | | | | | | After further review, we want to move in the direction of always quoting GUC names in error messages, rather than the previous (PG16) wildly mixed practice or the intermittent (mid-PG17) idea of doing this depending on how possibly confusing the GUC name is. This commit applies appropriate quotes to (almost?) all mentions of GUC names in error messages. It partially supersedes a243569bf65 and 8d9978a7176, which had moved things a bit in the opposite direction but which then were abandoned in a partial state. Author: Peter Smith <smithpb2250@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAHut%2BPv-kSN8SkxSdoHano_wPubqcg5789ejhCDZAcLFceBR-w%40mail.gmail.com
* BitmapHeapScan: Remove incorrect assert and reset fieldMelanie Plageman2024-05-16
| | | | | | | | | | | | | | | | | | | | | | 04e72ed617be pushed the skip fetch optimization (allowing bitmap heap scans to operate like index-only scans if none of the underlying data is needed) into heap AM implementations of bitmap table scan callbacks. 04e72ed617be added an assert that all tuples in blocks eligible for the optimization had been NULL-filled and emitted by the end of the scan. This assert is incorrect when not all tuples need be scanned to execute the query; for example: a join in which not all inner tuples need to be scanned before skipping to the next outer tuple. Remove the assert and reset the field on which it previously asserted to avoid incorrectly emitting NULL-filled tuples from a previous scan on rescan. Author: Melanie Plageman Reviewed-by: Tomas Vondra, Michael Paquier, Alvaro Herrera Reported-by: Melanie Plageman Reproduced-by: Tomas Vondra, Richard Guo Discussion: https://postgr.es/m/CAMbWs48orzZVXa7-vP9Nt7vQWLTE04Qy4PePaLQYsVNQgo6qRg%40mail.gmail.com
* Revert temporal primary keys and foreign keysPeter Eisentraut2024-05-16
| | | | | | | | | | | | | | | | | | | | This feature set did not handle empty ranges correctly, and it's now too late for PostgreSQL 17 to fix it. The following commits are reverted: 6db4598fcb8 Add stratnum GiST support function 46a0cd4cefb Add temporal PRIMARY KEY and UNIQUE constraints 86232a49a43 Fix comment on gist_stratnum_btree 030e10ff1a3 Rename pg_constraint.conwithoutoverlaps to conperiod a88c800deb6 Use daterange and YMD in without_overlaps tests instead of tsrange. 5577a71fb0c Use half-open interval notation in without_overlaps tests 34768ee3616 Add temporal FOREIGN KEY contraints 482e108cd38 Add test for REPLICA IDENTITY with a temporal key c3db1f30cba doc: clarify PERIOD and WITHOUT OVERLAPS in CREATE TABLE 144c2ce0cc7 Fix ON CONFLICT DO NOTHING/UPDATE for temporal indexes Discussion: https://www.postgresql.org/message-id/d0b64a7a-dfe4-4b84-a906-c7dedfa40a3e@eisentraut.org
* Fix an assortment of typosDavid Rowley2024-05-04
| | | | | Author: Alexander Lakhin Discussion: https://postgr.es/m/ae9f2fcb-4b24-5bb0-4240-efbbbd944ca1@gmail.com
* Fix parallel vacuum buffer usage reporting.Masahiko Sawada2024-05-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | A parallel worker's buffer usage is accumulated to its pgBufferUsage and then is accumulated into the leader's one at the end of the parallel vacuum. However, since the leader process used to use dedicated VacuumPage{Hit, Miss, Dirty} globals for the buffer usage reporting, the worker's buffer usage was not included, leading to an incorrect buffer usage report. To fix the problem, this commit makes vacuum use pgBufferUsage instruments for buffer usage reporting instead of VacuumPage{Hit, Miss, Dirty} globals. These global variables are still used by ANALYZE command and autoanalyze. This also fixes the buffer usage report of vacuuming on temporary tables, since the buffers dirtied by MarkLocalBufferDirty() were not tracked by the VacuumPageDirty variable. Parallel vacuum was introduced in 13, but the buffer usage reporting for VACUUM command with the VERBOSE option was implemented in 15. So backpatch to 15. Reported-by: Anthonin Bonnefoy Author: Anthonin Bonnefoy Reviewed-by: Alena Rybakina, Masahiko Sawada Discussion: https://postgr.es/m/CAO6_XqrQk+QZQcYs_C6nk0cMfHuUWk85vT9CrcA1NffFbAVE2A@mail.gmail.com Backpatch-through: 15
* Avoid repeating loads of frozen ID values.Noah Misch2024-04-29
| | | | | | | | | Repeating loads of inplace-updated fields tends to cause bugs like the one from the previous commit. While there's no bug to fix in these code sites, adopt the load-once style. This improves the chance of future copy/paste finding the safe style. Discussion: https://postgr.es/m/20240423003956.e7.nmisch@google.com
* Fix duplicated consecutive words in commentsDavid Rowley2024-04-28
| | | | | | | Also, fix a comment incorrectly referencing the "streaming read API". This was renamed to "read stream" shortly before being committed. Discussion: https://postgr.es/m/CAApHDvq-2Zdqytm_Hf3RmVf0qg5PS9jTFAJ5QTc9xH9pwvwDTA@mail.gmail.com
* Remove unneeded nbtree array preprocessing assert.Peter Geoghegan2024-04-22
| | | | | | | | | | | | | | | | | | | | | | | | Certain cases involving the use of cursors had assertion failures within _bt_preprocess_keys's recently added no-op return path. The assertion in question made the faulty assumption that a second or third call to _bt_preprocess_keys (within the same btrescan) could only happen when another scheduled primitive index scan was just about to begin. It would be possible to address the problem by only allowing scans that have array keys to take the new no-op path, forcing affected cases to perform redundant preprocessing work. It seems simpler to just remove the assertion, and reframe the no-op path as a more general mechanism. Take this simpler approach. The important underlying principle is that we only need to perform preprocessing once per btrescan (at most). This is expected regardless of whether or not the scan happens to have array keys. Oversight in commit 1b134ca5, which enhanced nbtree ScalarArrayOp execution. Reported-By: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/ef0f7c8b-a6fa-362e-6fd6-054950f947ca@gmail.com
* Remove overzealous array element type assertion.Peter Geoghegan2024-04-21
| | | | | | | | | | | This led to spurious assertion failures in certain scenarios involving pseudo types. Oversight in commit 5bf748b8, which enhanced nbtree ScalarArrayOp execution. Reported-By: Richard Guo <guofenglinux@gmail.com> Discussion: https://postgr.es/m/CAMbWs48f5rDOwxaT76Zd40m7n9iGZQcjEk7vG_5p3YWNh6oPfA@mail.gmail.com
* Add missing index_insert_cleanup callsTomas Vondra2024-04-19
| | | | | | | | | | | | | | | | | | | | | | | The optimization for inserts into BRIN indexes added by c1ec02be1d79 relies on a cache that needs to be explicitly released after calling index_insert(). The commit however failed to invoke the cleanup in validate_index(), which calls index_insert() indirectly through table_index_validate_scan(). After inspecting index_insert() callers, it seems unique_key_recheck() is missing the call too. Fixed by adding the two missing index_insert_cleanup() calls. The commit does two additional improvements. The aminsertcleanup() signature is modified to have the index as the first argument, to make it more like the other AM callbacks. And the aminsertcleanup() callback is invoked even if the ii_AmCache is NULL, so that it can decide if the cleanup is necessary. Author: Alvaro Herrera, Tomas Vondra Reported-by: Alexander Lakhin Discussion: https://postgr.es/m/202401091043.e3nrqiad6gb7@alvherre.pgsql
* Fix a couple typos in BRIN codeTomas Vondra2024-04-19
| | | | | | | | Typos introduced by commits c1ec02be1d79, b43757171470 and dae761a87eda. Author: Alvaro Herrera Reported-by: Alexander Lakhin Discussion: https://postgr.es/m/202401091043.e3nrqiad6gb7@alvherre.pgsql
* Fix typos and duplicate wordsDaniel Gustafsson2024-04-18
| | | | | | | | | | | | This fixes various typos, duplicated words, and tiny bits of whitespace mainly in code comments but also in docs. Author: Daniel Gustafsson <daniel@yesql.se> Author: Heikki Linnakangas <hlinnaka@iki.fi> Author: Alexander Lakhin <exclusion@gmail.com> Author: David Rowley <dgrowleyml@gmail.com> Author: Nazir Bilal Yavuz <byavuz81@gmail.com> Discussion: https://postgr.es/m/3F577953-A29E-4722-98AD-2DA9EFF2CBB8@yesql.se
* Don't try to fix eliminated nbtree array scan keys.Peter Geoghegan2024-04-18
| | | | | | | | | | | | | | | | | Preprocessing for nbtree index scans allowed array "input" scan keys already marked eliminated during array-specific preprocessing to be "fixed up" during preprocessing proper. This allowed eliminated scan keys on DESC index columns to spurious have their strategy commuted, causing assertion failures. To fix, teach _bt_fix_scankey_strategy to ignore these scan keys. This brings it in line with its only caller, _bt_preprocess_keys. Oversight in commit 5bf748b8, which enhanced nbtree ScalarArrayOp execution. Reported-By: Donghang Lin <donghanglin@gmail.com> Discussion: https://postgr.es/m/CAA=D8a2sHK6CAzZ=0CeafC-Y-MFXbYxnRSHvZTi=+JHu6kAa8Q@mail.gmail.com
* Refactoring for CommitTransactionCommand()/AbortCurrentTransaction()Alexander Korotkov2024-04-18
| | | | | | | | | | | | | | | fefd9a3fed turned tail recursion of CommitTransactionCommand() and AbortCurrentTransaction() into iteration. However, it splits the handling of cases between different functions. This commit puts the handling of all the cases into AbortCurrentTransactionInternal() and CommitTransactionCommandInternal(). Now CommitTransactionCommand() and AbortCurrentTransaction() are just doing the repeated calls of internal functions. Reported-by: Andres Freund Discussion: https://postgr.es/m/20240415224834.w6piwtefskoh32mv%40awork3.anarazel.de Author: Andres Freund
* Cleanup parallel BRIN index build codeTomas Vondra2024-04-17
| | | | | | | | | | | | | | | | | | Commit b43757171470 added support for parallel builds of BRIN indexes, using code similar to BTREE. But there were to be a couple unnecessary differences, particularly in how the leader waits for the workers, and merges the results. So remove these, to make the code more similar. The leader never waited on the workersdonecv condition variable, but simply called WaitForParallelWorkersToFinish() in _brin_end_parallel() and then merged the per-worker results. This worked correctly, but it seems better to do the wait and merge before _brin_end_parallel(). This commit moves the relevant code to _brin_parallel_heapscan/merge(), which means _brin_end_parallel() remains responsible only for exiting the parallel mode and accumulating WAL usage data. Discussion: https://postgr.es/m/3733d042-71e1-6ae6-5fac-00c12db62db6@enterprisedb.com
* Fix nbtree "deduce NOT NULL" scan key comment.Peter Geoghegan2024-04-16
| | | | Oversight in commit c9c0589fda.
* revert: Generalize relation analyze in table AM interfaceAlexander Korotkov2024-04-16
| | | | | | This commit reverts 27bc1772fc and dd1f6b0c17. Per review by Andres Freund. Discussion: https://postgr.es/m/20240415201057.khoyxbwwxfgzomeo%40awork3.anarazel.de
* Use the correct PG_DETOAST_DATUM macro in BRINTomas Vondra2024-04-14
| | | | | | | | | | | | Commit 6bcda4a721 replaced PG_DETOAST_DATUM with PG_DETOAST_DATUM_PACKED in two BRIN output functions, for minmax-multi and bloom opclasses. But this is incorrect - the code is accessing the data through structs that already include a 4B header, so the detoast needs to match that. But the PACKED macro may keep the 1B header, which means the struct fields will point to incorrect data. Backpatch-through: 16 Discussion: https://postgr.es/m/1df00a66-db5a-4e66-809a-99b386a06d86%40enterprisedb.com
* Update nbits_set in brin_bloom_unionTomas Vondra2024-04-14
| | | | | | | | | | | | | | | | | | | Properly update the number of bits set in the bitmap after merging the filters in brin_bloom_union. This is mostly harmless, as the counter is used only in the output function, which means pageinspect may show incorrect information about the BRIN summary. The counter does not affect correctness. Discovered while adding a regression test comparing indexes built with and without parallelism. The parallel index builds exercise the union procedure when merging results from workers, which is otherwise very hard to do in a test. Which is why this went unnoticed until now. Backpatch through 14, where the BRIN bloom opclasses were introduced. Backpatch-through: 14 Discussion: https://postgr.es/m/1df00a66-db5a-4e66-809a-99b386a06d86%40enterprisedb.com
* Revert: Implement pg_wal_replay_wait() stored procedureAlexander Korotkov2024-04-11
| | | | | | | This commit reverts 06c418e163, e37662f221, bf1e650806, 25f42429e2, ee79928441, and 74eaf66f98 per review by Heikki Linnakangas. Discussion: https://postgr.es/m/b155606b-e744-4218-bda5-29379779da1a%40iki.fi
* Revert: Allow table AM to store complex data structures in rd_amcacheAlexander Korotkov2024-04-11
| | | | | | This commit reverts 02eb07ea89 per review by Andres Freund. Discussion: https://postgr.es/m/20240410165236.rwyrny7ihi4ddxw4%40awork3.anarazel.de
* Revert: Allow table AM tuple_insert() method to return the different slotAlexander Korotkov2024-04-11
| | | | | | This commit reverts c35a3fb5e0 per review by Andres Freund. Discussion: https://postgr.es/m/20240410165236.rwyrny7ihi4ddxw4%40awork3.anarazel.de
* Revert: Allow locking updated tuples in tuple_update() and tuple_delete()Alexander Korotkov2024-04-11
| | | | | | This commit reverts 87985cc925 and 818861eb57 per review by Andres Freund. Discussion: https://postgr.es/m/20240410165236.rwyrny7ihi4ddxw4%40awork3.anarazel.de
* Revert: Let table AM insertion methods control index insertionAlexander Korotkov2024-04-11
| | | | | | This commit reverts b1484a3f19 per review by Andres Freund. Discussion: https://postgr.es/m/20240410165236.rwyrny7ihi4ddxw4%40awork3.anarazel.de
* Revert: Custom reloptions for table AMAlexander Korotkov2024-04-11
| | | | | | This commit reverts 9bd99f4c26 and 422041542f per review by Andres Freund. Discussion: https://postgr.es/m/20240410165236.rwyrny7ihi4ddxw4%40awork3.anarazel.de
* Fix inconsistency with replay of hash squeeze record for clean buffersMichael Paquier2024-04-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | aa5edbe379d6 has tweaked _hash_freeovflpage() so as the write buffer's LSN is updated only when necessary, when REGBUF_NO_CHANGE is not used. The replay code was not consistent with that, causing the write buffer's LSN to be updated and its page to be marked as dirty even if the buffer was registered in a "clean" state. This was possible for the case of a squeeze record when there are no tuples to add to the write buffer, for (is_prim_bucket_same_wrt && !is_prev_bucket_same_wrt). I have performed some validation of this commit with wal_consistency_checking and a change in WAL that logs REGBUF_NO_CHANGE to a new BKPIMAGE_*. Thanks to that, it is possible to know at replay if a buffer was clean when it was registered, then cross-checked the LSN of the "clean" page copy coming from WAL with the LSN of the block once the record has been replayed. This eats one bit in bimg_info, which is not acceptable to be integrated as-is, but it could become handy in the future. I didn't spot other areas than the one fixed by this commit at the extent of what the main regression test suite covers. As this is an oversight in aa5edbe379d6, no backpatch is required. Reported-by: Zubeyr Eryilmaz Author: Hayato Kuroda Reviewed-by: Amit Kapila, Michael Paquier Discussion: https://postgr.es/m/ZbyVVG_7eW3YD5-A@paquier.xyz
* Get rid of anonymous structJohn Naylor2024-04-09
| | | | | | | | | | | | | | This is a C11 feature, and we require C99. While at it, go the further step and get rid of the surrounding union (with uintptr_t) entirely, as there is currently no use case for this file to access the header of BlocktableEntry as a uintptr_t, and there are no additional alignment requirements. The least invasive way seems to be to transfer the old union name to this struct. Reported by Pavel Borisov and Andres Freund, per buildfarm member mylodon Reviewed by Pavel Borisov Discussion: https://postgr.es/m/CALT9ZEH11NYV8AOzKb1bWhCf6J0H=H31f0MgT9xX+HdqvcA1rw@mail.gmail.com
* Teach radix tree to embed values at runtimeJohn Naylor2024-04-08
| | | | | | | | | | | | | | | | | | | Previously, the decision to store values in leaves or within the child pointer was made at compile time, with variable length values using leaves by necessity. This commit allows introspecting the length of variable length values at runtime for that decision. This requires the ability to tell whether the last-level child pointer is actually a value, so we use a pointer tag in the lowest level bit. Use this in TID store. This entails adding a byte to the header to reserve space for the tag. Commit f35bd9bf3 stores up to three offsets within the header with no bitmap, and now the header can be embedded as above. This reduces worst-case memory usage when TIDs are sparse. Reviewed (in an earlier version) by Masahiko Sawada Discussion: https://postgr.es/m/CANWCAZYw+_KAaUNruhJfE=h6WgtBKeDG32St8vBJBEY82bGVRQ@mail.gmail.com Discussion: https://postgr.es/m/CAD21AoBci3Hujzijubomo1tdwH3XtQ9F89cTNQ4bsQijOmqnEw@mail.gmail.com
* Teach TID store to skip bitmap for small numbers of offsetsJohn Naylor2024-04-08
| | | | | | | | | | | | The header portion of BlocktableEntry has enough padding space for an array of 3 offsets (1 on 32-bit platforms). Use this space instead of having a sparse bitmap array. This will take up a constant amount of space no matter what the offsets are. Reviewed (in an earlier version) by Masahiko Sawada Discussion: https://postgr.es/m/CANWCAZYw+_KAaUNruhJfE=h6WgtBKeDG32St8vBJBEY82bGVRQ@mail.gmail.com Discussion: https://postgr.es/m/CAD21AoBci3Hujzijubomo1tdwH3XtQ9F89cTNQ4bsQijOmqnEw@mail.gmail.com
* Provide a way block-level table AMs could re-use acquire_sample_rows()Alexander Korotkov2024-04-08
| | | | | | | | | | While keeping API the same, this commit provides a way for block-level table AMs to re-use existing acquire_sample_rows() by providing custom callbacks for getting the next block and the next tuple. Reported-by: Andres Freund Discussion: https://postgr.es/m/20240407214001.jgpg5q3yv33ve6y3%40awork3.anarazel.de Reviewed-by: Pavel Borisov
* Fill CommonRdOptions with default values in extract_autovac_opts()Alexander Korotkov2024-04-08
| | | | | | Reported-by: Thomas Munro Reported-by: Pavel Borisov Discussion: https://postgr.es/m/CA%2BhUKGLZzLR50RBvuqOO3MZ%3DF54ETz-rTp1PDX9uDGP_GqyYqA%40mail.gmail.com
* Custom reloptions for table AMAlexander Korotkov2024-04-08
| | | | | | | | | | | | | | | | | | Let table AM define custom reloptions for its tables. This allows specifying AM-specific parameters by the WITH clause when creating a table. The reloptions, which could be used outside of table AM, are now extracted into the CommonRdOptions data structure. These options could be by decision of table AM directly specified by a user or calculated in some way. The new test module test_tam_options evaluates the ability to set up custom reloptions and calculate fields of CommonRdOptions on their base. The code may use some parts from prior work by Hao Wu. Discussion: https://postgr.es/m/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com Discussion: https://postgr.es/m/AMUA1wBBBxfc3tKRLLdU64rb.1.1683276279979.Hmail.wuhao%40hashdata.cn Reviewed-by: Reviewed-by: Pavel Borisov, Matthias van de Meent, Jess Davis
* Use bump context for TID bitmaps stored by vacuumJohn Naylor2024-04-08
| | | | | | | | | | | | | | | Vacuum does not pfree individual entries, and only frees the entire storage space when finished with it. This allows using a bump context, eliminating the chunk header in each leaf allocation. Most leaf allocations will be 16 to 32 bytes, so that's a significant savings. TidStoreCreateLocal gets a boolean parameter to indicate that the created store is insert-only. This requires a separate tree context for iteration, since we free the iteration state after iteration completes. Discussion: https://postgr.es/m/CANWCAZac%3DpBePg3rhX8nXkUuaLoiAJJLtmnCfZsPEAS4EtJ%3Dkg%40mail.gmail.com Discussion: https://postgr.es/m/CANWCAZZQFfxvzO8yZHFWtQV+Z2gAMv1ku16Vu7KWmb5kZQyd1w@mail.gmail.com
* Remove references to old function nameAndres Freund2024-04-07
| | | | | | | | | | | In a97bbe1f1df I accidentally referenced heapgetpage(), both in a function name and a comment. But since 44086b09753 the relevant function is named heap_prepare_pagescan(). Rename the new function to page_collect_tuples(). Reported-by: Melanie Plageman <melanieplageman@gmail.com> Reported-by: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/20240407172615.cocrsvboqm3ttqe4@awork3.anarazel.de Discussion: https://postgr.es/m/CAApHDvp4SniHopTrVeKWcEvNXFtdki0utAvO=5R7H6TNhtULRQ@mail.gmail.com
* Fix alignment of stack variableJohn Naylor2024-04-08
| | | | | | | | Declare with union similar to PGAlignedBlock. Report and fix by Andres Freund Discussion: https://postgr.es/m/20240407190731.izm3mdazednrsiqk%40awork3.anarazel.de
* Remove redundant nbtree preprocessing assertions.Peter Geoghegan2024-04-07
| | | | | | | | One of the assertions was the subject of a false positive complaint from Coverity, but none of the assertions added much, so get rid of them. Reported-By: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/3000247.1712537309@sss.pgh.pa.us
* Use streaming I/O in ANALYZE.Thomas Munro2024-04-08
| | | | | | | | | | | | | | The ANALYZE command prefetches and reads sample blocks chosen by a BlockSampler algorithm. Instead of calling [Prefetch|Read]Buffer() for each block, ANALYZE now uses the streaming API introduced in b5a9b18cd0. Author: Nazir Bilal Yavuz <byavuz81@gmail.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://postgr.es/m/flat/CAN55FZ0UhXqk9v3y-zW_fp4-WCp43V8y0A72xPmLkOM%2B6M%2BmJg%40mail.gmail.com
* Use conditional variable to wait for next MultiXact offsetAlvaro Herrera2024-04-07
| | | | | | | | | | | | | | | In one multixact.c edge case, we need a mechanism to wait for one multixact offset to be written before being allowed to read the next one. We used to handle this case by sleeping for one millisecond and retrying, but such sleeps have been reported as problematic in production cases. We can avoid the problem by using a condition variable: readers sleep on it and then every creator of multixacts broadcasts into the CV when creation is sufficiently far along. Author: Kyotaro Horiguchi <horikyotajntt@gmail.com> Reviewed-by: Andrey Borodin <amborodin@acm.org> Discussion: https://postgr.es/m/47A598F4-B4E7-4029-8FEC-A06A6C3CB4B5@yandex-team.ru Discussion: https://postgr.es/m/20200515.090333.24867479329066911.horikyota.ntt
* Avoid extra lookups with nbtree array inequalities.Peter Geoghegan2024-04-07
| | | | | | | | | | | | | | | | | | | nbtree index scans with SAOP inequalities (but no SAOP equalities) performed extra ORDER proc lookups for any remaining equality strategy scan keys. This could waste cycles, and caused assertion failures. Keeping around a separate ORDER proc is only necessary for a scan's non-array/non-SAOP equality scan keys when the scan has at least one other SAOP equality strategy key (a SAOP inequality shouldn't count). To fix, replace _bt_preprocess_array_keys_final's assertion with a test that makes the function return early when the scan has no SAOP equality scan keys. Oversight in commit 1b134ca5, which enhanced nbtree ScalarArrayOp execution. Reported-By: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/0539d3d3-a402-0a49-ed5e-26429dffc4bd@gmail.com