aboutsummaryrefslogtreecommitdiff
path: root/src/backend/access
Commit message (Collapse)AuthorAge
...
* Use TransactionXmin instead of RecentGlobalXmin in heap_abort_speculative().Andres Freund2020-04-05
| | | | | | | | | | | | | | | There's a very low risk that RecentGlobalXmin could be far enough in the past to be older than relfrozenxid, or even wrapped around. Luckily the consequences of that having happened wouldn't be too bad - the page wouldn't be pruned for a while. Avoid that risk by using TransactionXmin instead. As that's announced via MyPgXact->xmin, it is protected against wrapping around (see code comments for details around relfrozenxid). Author: Andres Freund Discussion: https://postgr.es/m/20200328213023.s4eyijhdosuc4vcj@alap3.anarazel.de Backpatch: 9.5-
* Fix bugs in gin_fuzzy_search_limit processing.Tom Lane2020-04-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | entryGetItem()'s three code paths each contained bugs associated with filtering the entries for gin_fuzzy_search_limit. The posting-tree path failed to advance "advancePast" after having decided to filter an item. If we ran out of items on the current page and needed to advance to the next, what would actually happen is that entryLoadMoreItems() would re-load the same page. Eventually, the random dropItem() test would accept one of the same items it'd previously rejected, and we'd move on --- but it could take awhile with small gin_fuzzy_search_limit. To add insult to injury, this case would inevitably cause entryLoadMoreItems() to decide it needed to re-descend from the root, making things even slower. The posting-list path failed to implement gin_fuzzy_search_limit filtering at all, so that all entries in the posting list would be returned. The bitmap-result path used a "gotitem" variable that it failed to update in the one place where it'd actually make a difference, ie at the one "continue" statement. I think this was unreachable in practice, because if we'd looped around then it shouldn't be the case that the entries on the new page are before advancePast. Still, the "gotitem" variable was contributing nothing to either clarity or correctness, so get rid of it. Refactor all three loops so that the termination conditions are more alike and less unreadable. The code coverage report showed that we had no coverage at all for the re-descend-from-root code path in entryLoadMoreItems(), which seems like a very bad thing, so add a test case that exercises it. We also had exactly no coverage for gin_fuzzy_search_limit, so add a simplistic test case that at least hits those code paths a little bit. Back-patch to all supported branches. Adé Heyward and Tom Lane Discussion: https://postgr.es/m/CAEknJCdS-dE1Heddptm7ay2xTbSeADbkaQ8bU2AXRCVC2LdtKQ@mail.gmail.com
* Revert "Skip redundant anti-wraparound vacuums"Michael Paquier2020-03-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 2aa6e33, that added a fast path to skip anti-wraparound and non-aggressive autovacuum jobs (these have no sense as anti-wraparound implies aggressive). With a cluster using a high amount of relations with a portion of them being heavily updated, this could cause autovacuum to lock down, with autovacuum workers attempting repeatedly those jobs on the same relations for the same database, that just kept being skipped. This lock down can be solved with a manual VACUUM FREEZE. Justin King has reported one environment where the issue happened, and Julien Rouhaud and I have been able to reproduce it in a second environment. With a very aggressive autovacuum_freeze_max_age, triggering those jobs with pgbench is a matter of minutes, and hitting the lock down is a lot harder (my local tests failed to do that). Note that anti-wraparound and non-aggressive jobs can only be triggered on a subset of shared catalogs: - pg_auth_members - pg_authid - pg_database - pg_replication_origin - pg_shseclabel - pg_subscription - pg_tablespace While the lock down was possible down to v12, the root cause of those jobs is a much older issue, which needs more analysis. Bonus thanks to Andres Freund for the discussion. Reported-by: Justin King Discussion: https://postgr.es/m/CAE39h22zPLrkH17GrkDgAYL3kbjvySYD1io+rtnAUFnaJJVS4g@mail.gmail.com Backpatch-through: 12
* Consistently truncate non-key suffix columns.Peter Geoghegan2020-03-30
| | | | | | | | | | | | | | | INCLUDE indexes failed to have their non-key attributes physically truncated away in certain rare cases. This led to physically larger pivot tuples that contained useless non-key attribute values. The impact on users should be negligible, but this is still clearly a regression (Postgres 11 supports INCLUDE indexes, and yet was not affected). The bug appeared in commit dd299df8, which introduced "true" suffix truncation of key attributes. Discussion: https://postgr.es/m/CAH2-Wz=E8pkV9ivRSFHtv812H5ckf8s1-yhx61_WrJbKccGcrQ@mail.gmail.com Backpatch: 12-, where "true" suffix truncation was introduced.
* Revert "Skip WAL for new relfilenodes, under wal_level=minimal."Noah Misch2020-03-22
| | | | | | | | This reverts commit cb2fd7eac285b1b0a24eeb2b8ed4456b66c5a09f. Per numerous buildfarm members, it was incompatible with parallel query, and a test case assumed LP64. Back-patch to 9.5 (all supported versions). Discussion: https://postgr.es/m/20200321224920.GB1763544@rfd.leadboat.com
* Skip WAL for new relfilenodes, under wal_level=minimal.Noah Misch2020-03-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Until now, only selected bulk operations (e.g. COPY) did this. If a given relfilenode received both a WAL-skipping COPY and a WAL-logged operation (e.g. INSERT), recovery could lose tuples from the COPY. See src/backend/access/transam/README section "Skipping WAL for New RelFileNode" for the new coding rules. Maintainers of table access methods should examine that section. To maintain data durability, just before commit, we choose between an fsync of the relfilenode and copying its contents to WAL. A new GUC, wal_skip_threshold, guides that choice. If this change slows a workload that creates small, permanent relfilenodes under wal_level=minimal, try adjusting wal_skip_threshold. Users setting a timeout on COMMIT may need to adjust that timeout, and log_min_duration_statement analysis will reflect time consumption moving to COMMIT from commands like COPY. Internally, this requires a reliable determination of whether RollbackAndReleaseCurrentSubTransaction() would unlink a relation's current relfilenode. Introduce rd_firstRelfilenodeSubid. Amend the specification of rd_createSubid such that the field is zero when a new rel has an old rd_node. Make relcache.c retain entries for certain dropped relations until end of transaction. Back-patch to 9.5 (all supported versions). This introduces a new WAL record type, XLOG_GIST_ASSIGN_LSN, without bumping XLOG_PAGE_MAGIC. As always, update standby systems before master systems. This changes sizeof(RelationData) and sizeof(IndexStmt), breaking binary compatibility for affected extensions. (The most recent commit to affect the same class of extensions was 089e4d405d0f3b94c74a2c6a54357a84a681754b.) Kyotaro Horiguchi, reviewed (in earlier, similar versions) by Robert Haas. Heikki Linnakangas and Michael Paquier implemented earlier designs that materially clarified the problem. Reviewed, in earlier designs, by Andrew Dunstan, Andres Freund, Alvaro Herrera, Tom Lane, Fujii Masao, and Simon Riggs. Reported by Martijn van Oosterhout. Discussion: https://postgr.es/m/20150702220524.GA9392@svana.org
* In log_newpage_range(), heed forkNum and page_std arguments.Noah Misch2020-03-21
| | | | | | | | | The function assumed forkNum=MAIN_FORKNUM and page_std=true, ignoring the actual arguments. Existing callers passed exactly those values, so there's no live bug. Back-patch to v12, where the function first appeared, because another fix needs this. Discussion: https://postgr.es/m/20191118045434.GA1173436@rfd.leadboat.com
* During heap rebuild, lock any TOAST index until end of transaction.Noah Misch2020-03-21
| | | | | | | | | | | | swap_relation_files() calls toast_get_valid_index() to find and lock this index, just before swapping with the rebuilt TOAST index. The latter function releases the lock before returning. Potential for mischief is low; a concurrent session can issue ALTER INDEX ... SET (fillfactor = ...), which is not alarming. Nonetheless, changing pg_class.relfilenode without a lock is unconventional. Back-patch to 9.5 (all supported versions), because another fix needs this. Discussion: https://postgr.es/m/20191226001521.GA1772687@rfd.leadboat.com
* Avoid assertion failure with targeted recovery in standby mode.Fujii Masao2020-03-09
| | | | | | | | | | | | | | | | | | | | | | | | At the end of recovery, standby mode is turned off to re-fetch the last valid record from archive or pg_wal. Previously, if recovery target was reached and standby mode was turned off while the current WAL source was stream, recovery could try to retrieve WAL file containing the last valid record unexpectedly from stream even though not in standby mode. This caused an assertion failure. That is, the assertion test confirms that WAL file should not be retrieved from stream if standby mode is not true. This commit moves back the current WAL source to archive if it's stream even though not in standby mode, to avoid that assertion failure. This issue doesn't cause the server to crash when built with assertion disabled. In this case, the attempt to retrieve WAL file from stream not in standby mode just fails. And then recovery tries to retrieve WAL file from archive or pg_wal. Back-patch to all supported branches. Author: Kyotaro Horiguchi Reviewed-by: Fujii Masao Discussion: https://postgr.es/m/20200227.124830.2197604521555566121.horikyota.ntt@gmail.com
* Fix mesurement of elapsed time during truncating heap in VACUUM.Fujii Masao2020-02-19
| | | | | | | | | | | | | | | | | | | | | | | | | | VACUUM may truncate heap in several batches. The activity report is logged for each batch, and contains the number of pages in the table before and after the truncation, and also the elapsed time during the truncation. Previously the elapsed time reported in each batch was the total elapsed time since starting the truncation until finishing each batch. For example, if the truncation was processed dividing into three batches, the second batch reported the accumulated time elapsed during both first and second batches. This is strange and confusing because the number of pages in the table reported together is not total. Instead, each batch should report the time elapsed during only that batch. The cause of this issue was that the resource usage snapshot was initialized only at the beginning of the truncation and was never reset later. This commit fixes the issue by changing VACUUM so that the resource usage snapshot is reset at each batch. Back-patch to all supported branches. Reported-by: Tatsuhito Kasahara Author: Tatsuhito Kasahara Reviewed-by: Masahiko Sawada, Fujii Masao Discussion: https://postgr.es/m/CAP0=ZVJsf=NvQuy+QXQZ7B=ZVLoDV_JzsVC1FRsF1G18i3zMGg@mail.gmail.com
* Store the deletion horizon XID for a deleted GIN page on the right page.Tom Lane2020-02-09
| | | | | | | | | | | | | | | | | Commit b10714080 moved the GinPageSetDeleteXid() call to a spot where the "page" variable was pointing to the wrong page, causing the XID to be inserted on a page that's not being deleted, thus allowing later GinPageIsRecyclable tests to recycle the deleted page too soon. It might be a good idea to stop using the single "page" variable for multiple purposes in this function. But for the moment I just moved the GinPageSetDeleteXid() call down beside the GinPageSetDeleted() call, which seems like a more logical place for it anyway. Back-patch to v11, as the faulty patch was. (Fortunately, the bug hasn't made it into any release yet.) Discussion: https://postgr.es/m/21620.1581098806@sss.pgh.pa.us
* Force tuple conversion when the source has missing attributes.Andrew Gierth2020-02-05
| | | | | | | | | | | | | | | | | | | | | Tuple conversion incorrectly concluded that no conversion was needed as long as all the attributes lined up. But if the source tuple has a missing attribute (from addition of a column with default), then the destination tupdesc might not reflect the same default. The typical symptom was that the affected columns would be unexpectedly NULL. Repair by always forcing conversion if the source has missing attributes, which will be filled in by the deform operation. (In theory we could optimize for when the destination has the same default, but that seemed overkill.) Backpatch to 11 where missing attributes were added. Per bug #16242. Vik Fearing (discovery, code, testing) and me (analysis, testcase). Discussion: https://postgr.es/m/16242-d1c9fca28445966b@postgresql.org
* Handle lack of DSM slots in parallel btree build, take 2.Thomas Munro2020-02-05
| | | | | | | | | | Commit 74618e77 added a new check intended to fix a bug, but put it in the wrong place so that parallel btree build was always disabled. Do the check after we've actually tried to create a DSM segment. Back-patch to 11, like the earlier commit. Reviewed-by: Peter Geoghegan Discussion: https://postgr.es/m/CAH2-WzmDABkJzrNnvf%2BOULK-_A_j9gkYg_Dz-H62jzNv4eKQTw%40mail.gmail.com
* Handle lack of DSM slots in parallel btree build.Thomas Munro2020-01-31
| | | | | | | | | | | | | If no DSM slots are available, a ParallelContext can still be created, but its seg pointer is NULL. Teach parallel btree build to cope with that by falling back to a regular non-parallel build, to avoid crashing with a segmentation fault. Back-patch to 11, where parallel CREATE INDEX landed. Reported-by: Nicola Contu Reviewed-by: Peter Geoghegan Discussion: https://postgr.es/m/CA%2BhUKGJgJEBnkuODBVomyK3MWFvDBbMVj%3Dgdt6DnRPU-5sQ6UQ%40mail.gmail.com
* Fix crash in BRIN inclusion op functions, due to missing datum copy.Heikki Linnakangas2020-01-20
| | | | | | | | | | | | | | | | | | | | | | | | | The BRIN add_value() and union() functions need to make a longer-lived copy of the argument, if they want to store it in the BrinValues struct also passed as argument. The functions for the "inclusion operator classes" used with box, range and inet types didn't take into account that the union helper function might return its argument as is, without making a copy. Check for that case, and make a copy if necessary. That case arises at least with the range_union() function, when one of the arguments is an 'empty' range: CREATE TABLE brintest (n numrange); CREATE INDEX brinidx ON brintest USING brin (n); INSERT INTO brintest VALUES ('empty'); INSERT INTO brintest VALUES (numrange(0, 2^1000::numeric)); INSERT INTO brintest VALUES ('(-1, 0)'); SELECT brin_desummarize_range('brinidx', 0); SELECT brin_summarize_range('brinidx', 0); Backpatch down to 9.5, where BRIN was introduced. Discussion: https://www.postgresql.org/message-id/e6e1d6eb-0a67-36aa-e779-bcca59167c14%40iki.fi Reviewed-by: Emre Hasegeli, Tom Lane, Alvaro Herrera
* Reimplement nullification of walsender timestampAlvaro Herrera2020-01-08
| | | | | | | | | | | | | | | Make the value null only at pg_stat_activity-output time, as suggested by Tom Lane, instead of messing with the internal state. This should appease buildfarm members with force_parallel_mode=regress, which are running parallel queries on logical replication walsenders. The fact that walsenders can run parallel queries should perhaps be studied more carefully, but for the moment let's get rid of the red blots in buildfarm. Backpatch to pg10, like the previous commit. Discussion: https://postgr.es/m/30804.1578438763@sss.pgh.pa.us
* pg_stat_activity: show NULL stmt start time for walsendersAlvaro Herrera2020-01-07
| | | | | | | | | | | | | Returning a non-NULL time is pointless, sinc a walsender is not a process that would be running normal transactions anyway, but the code was unintentionally exposing the process start time intermittently, which was not only bogus but it also confused monitoring systems looking for idle transactions. Fix by avoiding all updates in walsenders. Backpatch to 11, where walsenders started appearing in pg_stat_activity. Reported-by: Tomas Vondra Discussion: https://postgr.es/m/20191209234409.exe7osmyalwkt5j4@development
* Remove shadow variables linked to RedoRecPtr in xlog.cMichael Paquier2019-12-18
| | | | | | | | | | | | | | | This changes the routines in charge of recycling WAL segments past the last redo LSN to not use anymore "RedoRecPtr" as a local variable, which is also available in the context of the session as a static declaration, replacing it with "lastredoptr". This confusion has been introduced by d9fadbf, so backpatch down to v11 like the other commit. Thanks to Tom Lane, Robert Haas, Alvaro Herrera, Mark Dilger and Kyotaro Horiguchi for the input provided. Author: Ranier Vilela Discussion: https://postgr.es/m/MN2PR18MB2927F7B5F690065E1194B258E35D0@MN2PR18MB2927.namprd18.prod.outlook.com Backpatch-through: 11
* Change overly strict Assert in TransactionGroupUpdateXidStatus.Amit Kapila2019-12-17
| | | | | | | | | | | | | | | | | | This Assert thought that an overflowed transaction can never get registered for the group update.  But that is not true, because even when the number of children for a transaction got reduced, the overflow flag is not changed. And, for group update, we only care about the current number of children for a transaction that is being committed. Based on comments by Andres Freund, remove a redundant Assert in TransactionIdSetPageStatus as we already had a static Assert for the same condition a few lines earlier. Reported-by: Vignesh C Author: Dilip Kumar Reviewed-by: Amit Kapila Backpatch-through: 11 Discussion: https://postgr.es/m/CAFiTN-s5=uJw-Z6JC9gcqtBSjXsrHnU63PXBrA=pnBjqnkm5UA@mail.gmail.com
* Fix yet another crash in page split during GiST index creation.Heikki Linnakangas2019-12-16
| | | | | | | | | | | | | | | | Commit a7ee7c8513 fixed a bug in GiST page split during index creation, where we failed to re-find the position of a downlink after the page containing it was split. However, that fix was incomplete; the other call to gistinserttuples() in the same function needs to also clear 'downlinkoffnum'. Fixes bug #16134 reported by Alexander Lakhin, for real this time. The previous fix was enough to fix the crash with the reproducer script for bug #16162, but the original script for #16134 was still crashing. Backpatch to v12, like the previous incomplete fix. Discussion: https://www.postgresql.org/message-id/d869f537-abe4-d2ea-0510-38cd053f5152%40gmail.com
* Fix crash when a page was split during GiST index creation.Heikki Linnakangas2019-12-13
| | | | | | | | | | | | | | | | | | | | | The bug was similar to the one that was fixed in commit 22251686f0. When we split page X and insert the downlink for the new page, the parent page might also need to be split. When that happens, the downlink offset number we remembered for X is no longer valid. We correctly called gistFindCorrectParent() to re-find it, but gistFindCorrectParent() doesn't do anything if the LSN of the page hasn't changed, and we stopped updating LSNs during index build in commit 9155580fd5. The buggy codepath was taken if the page was split into three or more pages, and inserting the downlink caused the parent page to split. To fix, explicitly mark the downlink offset number as invalid, to force gistFindCorrectParent() to re-find it. Fixes bug #16134 reported by Alexander Lakhin, reported again as #16162 by Andreas Kunert. Thanks to Jeff Janes, Tom Lane and Tomas Vondra for debugging. Backpatch to v12, where we stopped WAL-logging during index build. Discussion: https://www.postgresql.org/message-id/16134-0423f729671dec64%40postgresql.org Discussion: https://www.postgresql.org/message-id/16162-45d21b7b6c1a3105%40postgresql.org
* Avoid assertion failure with LISTEN in a serializable transaction.Tom Lane2019-11-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If LISTEN is the only action in a serializable-mode transaction, and the session was not previously listening, and the notify queue is not empty, predicate.c reported an assertion failure. That happened because we'd acquire the transaction's initial snapshot during PreCommit_Notify, which was called *after* predicate.c expects any such snapshot to have been established. To fix, just swap the order of the PreCommit_Notify and PreCommit_CheckForSerializationFailure calls during CommitTransaction. This will imply holding the notify-insertion lock slightly longer, but the difference could only be meaningful in serializable mode, which is an expensive option anyway. It appears that this is just an assertion failure, with no consequences in non-assert builds. A snapshot used only to scan the notify queue could not have been involved in any serialization conflicts, so there would be nothing for PreCommit_CheckForSerializationFailure to do except assign it a prepareSeqNo and set the SXACT_FLAG_PREPARED flag. And given no conflicts, neither of those omissions affect the behavior of ReleasePredicateLocks. This admittedly once-over-lightly analysis is backed up by the lack of field reports of trouble. Per report from Mark Dilger. The bug is old, so back-patch to all supported branches; but the new test case only goes back to 9.6, for lack of adequate isolationtester infrastructure before that. Discussion: https://postgr.es/m/3ac7f397-4d5f-be8e-f354-440020675694@gmail.com Discussion: https://postgr.es/m/13881.1574557302@sss.pgh.pa.us
* Fix page modification outside of critical section in GINAlexander Korotkov2019-11-20
| | | | | | | | By oversight 52ac6cd2d0 makes ginDeletePage() sets pd_prune_xid of page to be deleted before entering the critical section. It appears that only versions 11 and later were affected by this oversight. Backpatch-through: 11
* Revise GIN READMEAlexander Korotkov2019-11-20
| | | | | | | | | | | | | | | | | We find GIN concurrency bugs from time to time. One of the problems here is that concurrency of GIN isn't well-documented in README. So, it might be even hard to distinguish design bugs from implementation bugs. This commit revised concurrency section in GIN README providing more details. Some examples are illustrated in ASCII art. Also, this commit add the explanation of how is tuple layout in internal GIN B-tree page different in comparison with nbtree. Discussion: https://postgr.es/m/CAPpHfduXR_ywyaVN4%2BOYEGaw%3DcPLzWX6RxYLBncKw8de9vOkqw%40mail.gmail.com Author: Alexander Korotkov Reviewed-by: Peter Geoghegan Backpatch-through: 9.4
* Fix traversing to the deleted GIN page via downlinkAlexander Korotkov2019-11-20
| | | | | | | | | | | | | | | Current GIN code appears to don't handle traversing to the deleted page via downlink. This commit fixes that by stepping right from the delete page like we do in nbtree. This commit also fixes setting 'deleted' flag to the GIN pages. Now other page flags are not erased once page is deleted. That helps to keep our assertions true if we arrive deleted page via downlink. Discussion: https://postgr.es/m/CAPpHfdvMvsw-NcE5bRS7R1BbvA4BxoDnVVjkXC5W0Czvy9LVrg%40mail.gmail.com Author: Alexander Korotkov Reviewed-by: Peter Geoghegan Backpatch-through: 9.4
* Fix deadlock between ginDeletePage() and ginStepRight()Alexander Korotkov2019-11-20
| | | | | | | | | | | | | | | | | When ginDeletePage() is about to delete page it locks its left sibling to revise the rightlink. So, it locks pages in right to left manner. Int he same time ginStepRight() locks pages in left to right manner, and that could cause a deadlock. This commit makes ginScanToDelete() keep exclusive lock on left siblings of currently investigated path. That elimites need to relock left sibling in ginDeletePage(). Thus, deadlock with ginStepRight() can't happen anymore. Reported-by: Chen Huajun Discussion: https://postgr.es/m/5c332bd1.87b6.16d7c17aa98.Coremail.chjischj%40163.com Author: Alexander Korotkov Reviewed-by: Peter Geoghegan Backpatch-through: 10
* Fix assertion failure when running pgbench -s.Fujii Masao2019-11-07
| | | | | | | | | | | | | | | | | | If there is the WAL page that the continuation WAL record just fits within (i.e., the continuation record ends just at the end of the page) and the LSN in such page is specified with -s option, previously pg_waldump caused an assertion failure. The cause of this assertion failure was that XLogFindNextRecord() that pg_waldump -s calls mistakenly handled such special WAL page. This commit changes XLogFindNextRecord() so that it can handle such WAL page correctly. Back-patch to all supported versions. Author: Andrey Lepikhov Reviewed-by: Fujii Masao, Michael Paquier Discussion: https://postgr.es/m/99303554-5dd5-06e6-f943-b3005ccd6edd@postgrespro.ru
* Fix initialization of fake LSN for unlogged relationsMichael Paquier2019-10-27
| | | | | | | | | | | | | | 9155580 has changed the value of the first fake LSN for unlogged relations from 1 to FirstNormalUnloggedLSN (aka 1000), GiST requiring a non-zero LSN on some pages to allow an interlocking logic to work, but its value was still initialized to 1 at the beginning of recovery or after running pg_resetwal. This fixes the initialization for both code paths. Author: Takayuki Tsunakawa Reviewed-by: Dilip Kumar, Kyotaro Horiguchi, Michael Paquier Discussion: https://postgr.es/m/OSBPR01MB2503CE851940C17DE44AE3D9FE6F0@OSBPR01MB2503.jpnprd01.prod.outlook.com Backpatch-through: 12
* Fix memory leak introduced in commit 7df159a620.Amit Kapila2019-10-21
| | | | | | | | | | | | | We memorize all internal and empty leaf pages in the 1st vacuum stage for gist indexes. They are used in the 2nd stage, to delete all the empty pages. There was a memory context page_set_context for this purpose, but we never used it. Reported-by: Amit Kapila Author: Dilip Kumar Reviewed-by: Amit Kapila Backpatch-through: 12, where it got introduced Discussion: https://postgr.es/m/CAA4eK1LGr+MN0xHZpJ2dfS8QNQ1a_aROKowZB+MPNep8FVtwAA@mail.gmail.com
* Fix failure of archive recovery with recovery_min_apply_delay enabled.Fujii Masao2019-10-18
| | | | | | | | | | | | | | | | | | | | | | | | recovery_min_apply_delay parameter is intended for use with streaming replication deployments. However, the document clearly explains that the parameter will be honored in all cases if it's specified. So it should take effect even if in archive recovery. But, previously, archive recovery with recovery_min_apply_delay enabled always failed, and caused assertion failure if --enable-caasert is enabled. The cause of this problem is that; the ownership of recoveryWakeupLatch that recovery_min_apply_delay uses was taken only when standby mode is requested. So unowned latch could be used in archive recovery, and which caused the failure. This commit changes recovery code so that the ownership of recoveryWakeupLatch is taken even in archive recovery. Which prevents archive recovery with recovery_min_apply_delay from failing. Back-patch to v9.4 where recovery_min_apply_delay was added. Author: Fujii Masao Reviewed-by: Michael Paquier Discussion: https://postgr.es/m/CAHGQGwEyD6HdZLfdWc+95g=VQFPR4zQL4n+yHxQgGEGjaSVheQ@mail.gmail.com
* Make crash recovery ignore recovery_min_apply_delay setting.Fujii Masao2019-10-18
| | | | | | | | | | | | | | | | | | | In v11 or before, this setting could not take effect in crash recovery because it's specified in recovery.conf and crash recovery always starts without recovery.conf. But commit 2dedf4d9a8 integrated recovery.conf into postgresql.conf and which unexpectedly allowed this setting to take effect even in crash recovery. This is definitely not good behavior. To fix the issue, this commit makes crash recovery always ignore recovery_min_apply_delay setting. Back-patch to v12 where the issue was added. Author: Fujii Masao Reviewed-by: Michael Paquier Discussion: https://postgr.es/m/CAHGQGwEyD6HdZLfdWc+95g=VQFPR4zQL4n+yHxQgGEGjaSVheQ@mail.gmail.com Discussion: https://postgr.es/m/e445616d-023e-a268-8aa1-67b8b335340c@pgmasters.net
* Make crash recovery ignore restore_command and recovery_end_command settings.Fujii Masao2019-10-11
| | | | | | | | | | | | | | | | | | In v11 or before, those settings could not take effect in crash recovery because they are specified in recovery.conf and crash recovery always starts without recovery.conf. But commit 2dedf4d9a8 integrated recovery.conf into postgresql.conf and which unexpectedly allowed those settings to take effect even in crash recovery. This is definitely not good behavior. To fix the issue, this commit makes crash recovery always ignore restore_command and recovery_end_command settings. Back-patch to v12 where the issue was added. Author: Fujii Masao Reviewed-by: Peter Eisentraut Discussion: https://postgr.es/m/e445616d-023e-a268-8aa1-67b8b335340c@pgmasters.net
* Flush logical mapping files with fd opened for read/write at checkpointMichael Paquier2019-10-09
| | | | | | | | | | | | | | | The file descriptor was opened with read-only to fsync a regular file, which would cause EBADFD errors on some platforms. This is similar to the recent fix done by a586cc4b (which was broken by me with 82a5649), except that I noticed this issue while monitoring the backend code for similar mistakes. Backpatch to 9.4, as this has been introduced since logical decoding exists as of b89e151. Author: Michael Paquier Reviewed-by: Andres Freund Discussion: https://postgr.es/m/20191006045548.GA14532@paquier.xyz Backpatch-through: 9.4
* Remove temporary WAL and history files at the end of archive recoveryMichael Paquier2019-10-02
| | | | | | | | | | | | | | | cbc55da has reworked the order of some actions at the end of archive recovery. Unfortunately this overlooked the fact that the startup process needs to remove RECOVERYXLOG (for temporary WAL segment newly recovered from archives) and RECOVERYHISTORY (for temporary history file) at this step, leaving the files around even after recovery ended. Backpatch to 9.5, like the previous commit. Author: Sawada Masahiko Reviewed-by: Fujii Masao, Michael Paquier Discussion: https://postgr.es/m/CAD21AoBO_eDQub6zojFnWtnmutRBWvYf7=cW4Hsqj+U_R26w3Q@mail.gmail.com Backpatch-through: 9.5
* Make crash recovery ignore recovery target settings.Fujii Masao2019-09-30
| | | | | | | | | | | | | | | | | | In v11 or before, recovery target settings could not take effect in crash recovery because they are specified in recovery.conf and crash recovery always starts without recovery.conf. But commit 2dedf4d9a8 integrated recovery.conf into postgresql.conf and which unexpectedly allowed recovery target settings to take effect even in crash recovery. This is definitely not good behavior. To fix the issue, this commit makes crash recovery always ignore recovery target settings. Back-patch to v12. Author: Peter Eisentraut Reviewed-by: Fujii Masao Discussion: https://postgr.es/m/e445616d-023e-a268-8aa1-67b8b335340c@pgmasters.net
* Fix failure with lock mode used for custom relation optionsMichael Paquier2019-09-25
| | | | | | | | | | | | | | | | | | | | | In-core relation options can use a custom lock mode since 47167b7, that has lowered the lock available for some autovacuum parameters. However it forgot to consider custom relation options. This causes failures with ALTER TABLE SET when changing a custom relation option, as its lock is not defined. The existing APIs to define a custom reloption does not allow to define a custom lock mode, so enforce its initialization to AccessExclusiveMode which should be safe enough in all cases. An upcoming patch will extend the existing APIs to allow a custom lock mode to be defined. The problem can be reproduced with bloom indexes, so add a test there. Reported-by: Nikolay Sharplov Analyzed-by: Thomas Munro, Michael Paquier Author: Michael Paquier Reviewed-by: Kuntal Ghosh Discussion: https://postgr.es/m/20190920013831.GD1844@paquier.xyz Backpatch-through: 9.6
* Fix bug in pairingheap_SpGistSearchItem_cmp()Alexander Korotkov2019-09-25
| | | | | | | | | Our item contains only so->numberOfNonNullOrderBys of distances. Reflect that in the loop upper bound. Discussion: https://postgr.es/m/53536807-784c-e029-6e92-6da802ab8d60%40postgrespro.ru Author: Nikita Glukhov Backpatch-through: 12
* Message style fixesPeter Eisentraut2019-09-23
|
* Fix freeing old values in index_store_float8_orderby_distances()Alexander Korotkov2019-09-20
| | | | | | | | | | | | | | | 6cae9d2c10 has added an error in freeing old values in index_store_float8_orderby_distances() function. It looks for old value in scan->xs_orderbynulls[i] after setting a new value there. This commit fixes that. Also it removes short-circuit in handling distances == NULL situation. Now distances == NULL will be treated the same way as array with all null distances. That is, previous values will be freed if any. Reported-by: Tom Lane, Nikita Glukhov Discussion: https://postgr.es/m/CAPpHfdu2wcoAVAm3Ek66rP%3Duo_C-D84%2B%2Buf1VEcbyi_caBXWCA%40mail.gmail.com Discussion: https://postgr.es/m/426580d3-a668-b9d1-7b8e-f74d1a6524e0%40postgrespro.ru Backpatch-through: 12
* Improve handling of NULLs in KNN-GiST and KNN-SP-GiSTAlexander Korotkov2019-09-19
| | | | | | | | | | | | | | | | | | | | | This commit improves subject in two ways: * It removes ugliness of 02f90879e7, which stores distance values and null flags in two separate arrays after GISTSearchItem struct. Instead we pack both distance value and null flag in IndexOrderByDistance struct. Alignment overhead should be negligible, because we typically deal with at most few "col op const" expressions in ORDER BY clause. * It fixes handling of "col op NULL" expression in KNN-SP-GiST. Now, these expression are not passed to support functions, which can't deal with them. Instead, NULL result is implicitly assumed. It future we may decide to teach support functions to deal with NULL arguments, but current solution is bugfix suitable for backpatch. Reported-by: Nikita Glukhov Discussion: https://postgr.es/m/826f57ee-afc7-8977-c44c-6111d18b02ec%40postgrespro.ru Author: Nikita Glukhov Reviewed-by: Alexander Korotkov Backpatch-through: 9.4
* Fix nbtree page split rmgr desc routine.Peter Geoghegan2019-09-12
| | | | | | | | | | | | Include newitemoff in rmgr desc output for nbtree page split records. In passing, correct an obsolete comment that claimed that newitemoff is only logged for _L variant nbtree page split WAL records. Both issues were oversights in commit 2c03216d831, which revamped the WAL format. Author: Peter Geoghegan Backpatch: 9.5-, where the WAL format was revamped.
* Fix handling of NULL distances in KNN-GiSTAlexander Korotkov2019-09-08
| | | | | | | | | | | | | | In order to implement NULL LAST semantic GiST previously assumed distance to the NULL value to be Inf. However, our distance functions can return Inf and NaN for non-null values. In such cases, NULL LAST semantic appears to be broken. This commit fixes that by introducing separate array of null flags for distances. Backpatch to all supported versions. Discussion: https://postgr.es/m/CAPpHfdsNvNdA0DBS%2BwMpFrgwT6C3-q50sFVGLSiuWnV3FqOJuQ%40mail.gmail.com Author: Alexander Korotkov Backpatch-through: 9.4
* Fix handling Inf and Nan values in GiST pairing heap comparatorAlexander Korotkov2019-09-08
| | | | | | | | | | | | | | | | Previously plain float comparison was used in GiST pairing heap. Such comparison doesn't provide proper ordering for value sets containing Inf and Nan values. This commit fixes that by usage of float8_cmp_internal(). Note, there is remaining problem with NULL distances, which are represented as Inf in pairing heap. It would be fixes in subsequent commit. Backpatch to all supported versions. Reported-by: Andrey Borodin Discussion: https://postgr.es/m/CAPpHfdsNvNdA0DBS%2BwMpFrgwT6C3-q50sFVGLSiuWnV3FqOJuQ%40mail.gmail.com Author: Alexander Korotkov Reviewed-by: Heikki Linnakangas Backpatch-through: 9.4
* Fix behavior of AND CHAIN outside of explicit transaction blocksPeter Eisentraut2019-09-08
| | | | | | | | | | | | | When using COMMIT AND CHAIN or ROLLBACK AND CHAIN not in an explicit transaction block, the previous implementation would leave a transaction block active in the ROLLBACK case but not the COMMIT case. To fix for now, error out when using these commands not in an explicit transaction block. This restriction could be lifted if a sensible definition and implementation is found. Bug: #15977 Author: fn ln <emuser20140816@gmail.com> Reviewed-by: Fabien COELHO <coelho@cri.ensmp.fr>
* Make pg_promote() detect postmaster death while waiting for promotion to end.Fujii Masao2019-09-06
| | | | | | | | | | | | | | | | Previously even if postmaster died and WaitLatch() woke up with that event while pg_promote() was waiting for the standby promotion to finish, pg_promote() did nothing special and kept waiting until timeout occurred. This could cause a busy loop. This patch make pg_promote() return false immediately when postmaster dies, to avoid such a busy loop. Back-patch to v12 where pg_promote() was added. Author: Fujii Masao Reviewed-by: Michael Paquier Discussion: https://postgr.es/m/CAHGQGwEs9ROgSp+QF+YdDU+xP8W=CY1k-_Ov-d_Z3JY+to3eXA@mail.gmail.com
* Better error messages for short reads/writes in SLRUPeter Eisentraut2019-09-03
| | | | | | | | | | | | This avoids getting a Could not read from file ...: Success. for a short read or write (since errno is not set in that case). Instead, report a more specific error messages. Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://www.postgresql.org/message-id/flat/5de61b6b-8be9-7771-0048-860328efe027%402ndquadrant.com
* Avoid touching replica identity index in ExtractReplicaIdentity().Tom Lane2019-09-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In what seems like a fit of misplaced optimization, ExtractReplicaIdentity() accessed the relation's replica-identity index without taking any lock on it. Usually, the surrounding query already holds some lock so this is safe enough ... but in the case of a previously-planned delete, there might be no existing lock. Given a suitable test case, this is exposed in v12 and HEAD by an assertion added by commit b04aeb0a0. The whole thing's rather poorly thought out anyway; rather than looking directly at the index, we should use the index-attributes bitmap that's held by the parent table's relcache entry, as the caller functions do. This is more consistent and likely a bit faster, since it avoids a cache lookup. Hence, change to doing it that way. While at it, rather than blithely assuming that the identity columns are non-null (with catastrophic results if that's wrong), add assertion checks that they aren't null. Possibly those should be actual test-and-elog, but I'll leave it like this for now. In principle, this is a bug that's been there since this code was introduced (in 9.4). In practice, the risk seems quite low, since we do have a lock on the index's parent table, so concurrent changes to the index's catalog entries seem unlikely. Given the precedent that commit 9c703c169 wasn't back-patched, I won't risk back-patching this further than v12. Per report from Hadi Moshayedi. Discussion: https://postgr.es/m/CAK=1=Wrek44Ese1V7LjKiQS-Nd-5LgLi_5_CskGbpggKEf3tKQ@mail.gmail.com
* Fix overflow check and comment in GIN posting list encoding.Heikki Linnakangas2019-08-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The comment did not match what the code actually did for integers with the 43rd bit set. You get an integer like that, if you have a posting list with two adjacent TIDs that are more than 2^31 blocks apart. According to the comment, we would store that in 6 bytes, with no continuation bit on the 6th byte, but in reality, the code encodes it using 7 bytes, with a continuation bit on the 6th byte as normal. The decoding routine also handled these 7-byte integers correctly, except for an overflow check that assumed that one integer needs at most 6 bytes. Fix the overflow check, and fix the comment to match what the code actually does. Also fix the comment that claimed that there are 17 unused bits in the 64-bit representation of an item pointer. In reality, there are 64-32-11=21. Fitting any item pointer into max 6 bytes was an important property when this was written, because in the old pre-9.4 format, item pointers were stored as plain arrays, with 6 bytes for every item pointer. The maximum of 6 bytes per integer in the new format guaranteed that we could convert any page from the old format to the new format after upgrade, so that the new format was never larger than the old format. But we hardly need to worry about that anymore, and running into that problem during upgrade, where an item pointer is expanded from 6 to 7 bytes such that the data doesn't fit on a page anymore, is implausible in practice anyway. Backpatch to all supported versions. This also includes a little test module to test these large distances between item pointers, without requiring a 16 TB table. It is not backpatched, I'm including it more for the benefit of future development of new posting list formats. Discussion: https://www.postgresql.org/message-id/33bfc20a-5c86-f50c-f5a5-58e9925d05ff%40iki.fi Reviewed-by: Masahiko Sawada, Alexander Korotkov
* Fix bogus commentAlvaro Herrera2019-08-20
| | | | | Author: Alexander Lakhin Discussion: https://postgr.es/m/20190819072244.GE18166@paquier.xyz
* Fix predicate-locking of HOT updated rows.Heikki Linnakangas2019-08-07
| | | | | | | | | | | | | | | | | | | | | | | In serializable mode, heap_hot_search_buffer() incorrectly acquired a predicate lock on the root tuple, not the returned tuple that satisfied the visibility checks. As explained in README-SSI, the predicate lock does not need to be copied or extended to other tuple versions, but for that to work, the correct, visible, tuple version must be locked in the first place. The original SSI commit had this bug in it, but it was fixed back in 2013, in commit 81fbbfe335. But unfortunately, it was reintroduced a few months later in commit b89e151054. Wising up from that, add a regression test to cover this, so that it doesn't get reintroduced again. Also, move the code that sets 't_self', so that it happens at the same time that the other HeapTuple fields are set, to make it more clear that all the code in the loop operate on the "current" tuple in the chain, not the root tuple. Bug spotted by Andres Freund, analysis and original fix by Thomas Munro, test case and some additional changes to the fix by Heikki Linnakangas. Backpatch to all supported versions (9.4). Discussion: https://www.postgresql.org/message-id/20190731210630.nqhszuktygwftjty%40alap3.anarazel.de