aboutsummaryrefslogtreecommitdiff
path: root/src/backend
Commit message (Collapse)AuthorAge
...
* Adjust bytea get_bit/set_bit to cope with bytea strings > 256MB.Tom Lane2020-04-07
| | | | | | | | | | | | | | | | | | | Since the existing bit number argument can't exceed INT32_MAX, it's not possible for these functions to manipulate bits beyond the first 256MB of a bytea value. However, it'd be good if they could do at least that much, and not fall over entirely for longer bytea values. Adjust the comparisons to be done in int64 arithmetic so that works. Also tweak the error reports to show sane values in case of overflow. Also add some test cases to improve the miserable code coverage of these functions. Apply patch to back branches only; HEAD has a better solution as of commit 26a944cf2. Extracted from a much larger patch by Movead Li Discussion: https://postgr.es/m/20200312115135445367128@highgo.ca
* Preserve clustered index after rewrites with ALTER TABLEMichael Paquier2020-04-06
| | | | | | | | | | | | | A table rewritten by ALTER TABLE would lose tracking of an index usable for CLUSTER. This setting is tracked by pg_index.indisclustered and is controlled by ALTER TABLE, so some extra work was needed to restore it properly. Note that ALTER TABLE only marks the index that can be used for clustering, and does not do the actual operation. Author: Amit Langote, Justin Pryzby Reviewed-by: Ibrar Ahmed, Michael Paquier Discussion: https://postgr.es/m/20200202161718.GI13621@telsasoft.com Backpatch-through: 9.5
* Use TransactionXmin instead of RecentGlobalXmin in heap_abort_speculative().Andres Freund2020-04-05
| | | | | | | | | | | | | | | There's a very low risk that RecentGlobalXmin could be far enough in the past to be older than relfrozenxid, or even wrapped around. Luckily the consequences of that having happened wouldn't be too bad - the page wouldn't be pruned for a while. Avoid that risk by using TransactionXmin instead. As that's announced via MyPgXact->xmin, it is protected against wrapping around (see code comments for details around relfrozenxid). Author: Andres Freund Discussion: https://postgr.es/m/20200328213023.s4eyijhdosuc4vcj@alap3.anarazel.de Backpatch: 9.5-
* Save errno across LWLockRelease() callsPeter Eisentraut2020-04-05
| | | | | | Fixup for "Drop slot's LWLock before returning from SaveSlotToPath()" Reported-by: Michael Paquier <michael@paquier.xyz>
* Fix bugs in gin_fuzzy_search_limit processing.Tom Lane2020-04-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | entryGetItem()'s three code paths each contained bugs associated with filtering the entries for gin_fuzzy_search_limit. The posting-tree path failed to advance "advancePast" after having decided to filter an item. If we ran out of items on the current page and needed to advance to the next, what would actually happen is that entryLoadMoreItems() would re-load the same page. Eventually, the random dropItem() test would accept one of the same items it'd previously rejected, and we'd move on --- but it could take awhile with small gin_fuzzy_search_limit. To add insult to injury, this case would inevitably cause entryLoadMoreItems() to decide it needed to re-descend from the root, making things even slower. The posting-list path failed to implement gin_fuzzy_search_limit filtering at all, so that all entries in the posting list would be returned. The bitmap-result path used a "gotitem" variable that it failed to update in the one place where it'd actually make a difference, ie at the one "continue" statement. I think this was unreachable in practice, because if we'd looped around then it shouldn't be the case that the entries on the new page are before advancePast. Still, the "gotitem" variable was contributing nothing to either clarity or correctness, so get rid of it. Refactor all three loops so that the termination conditions are more alike and less unreadable. The code coverage report showed that we had no coverage at all for the re-descend-from-root code path in entryLoadMoreItems(), which seems like a very bad thing, so add a test case that exercises it. We also had exactly no coverage for gin_fuzzy_search_limit, so add a simplistic test case that at least hits those code paths a little bit. Back-patch to all supported branches. Adé Heyward and Tom Lane Discussion: https://postgr.es/m/CAEknJCdS-dE1Heddptm7ay2xTbSeADbkaQ8bU2AXRCVC2LdtKQ@mail.gmail.com
* Fix bogus CALLED_AS_TRIGGER() defenses.Tom Lane2020-04-03
| | | | | | | | | | | | | | | | | | | | | contrib/lo's lo_manage() thought it could use trigdata->tg_trigger->tgname in its error message about not being called as a trigger. That naturally led to a core dump. unique_key_recheck() figured it could Assert that fcinfo->context is a TriggerData node in advance of having checked that it's being called as a trigger. That's harmless in production builds, and perhaps not that easy to reach in any case, but it's logically wrong. The first of these per bug #16340 from William Crowell; the second from manual inspection of other CALLED_AS_TRIGGER call sites. Back-patch the lo.c change to all supported branches, the other to v10 where the thinko crept in. Discussion: https://postgr.es/m/16340-591c7449dc7c8c47@postgresql.org
* Check equality semantics for unique indexes on partitioned tables.Tom Lane2020-04-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We require the partition key to be a subset of the set of columns being made unique, so that physically-separate indexes on the different partitions are sufficient to enforce the uniqueness constraint. The existing code checked that the listed columns appear, but did not inquire into the index semantics, which is a serious oversight given that different index opclasses might enforce completely different notions of uniqueness. Ideally, perhaps, we'd just match the partition key opfamily to the index opfamily. But hash partitioning uses hash opfamilies which we can't directly match to btree opfamilies. Hence, look up the equality operator in each family, and accept if it's the same operator. This should be okay in a fairly general sense, since the equality operator ought to precisely represent the opfamily's notion of uniqueness. A remaining weak spot is that we don't have a cross-index-AM notion of which opfamily member is "equality". But we know which one to use for hash and btree AMs, and those are the only two that are relevant here at present. (Any non-core AMs that know how to enforce equality are out of luck, for now.) Back-patch to v11 where this feature was introduced. Guancheng Luo, revised a bit by me Discussion: https://postgr.es/m/D9C3CEF7-04E8-47A1-8300-CA1DCD5ED40D@gmail.com
* Fix race condition in statext_store().Tom Lane2020-03-31
| | | | | | | | | | | | | Must hold some lock on the pg_statistic_ext_data catalog *before* we look up the tuple we aim to replace. Otherwise a concurrent VACUUM FULL or similar operation could move it to a different TID, leaving us trying to replace the wrong tuple. Back-patch to v12 where this got broken. Credit goes to Dean Rasheed; I'm just doing the clerical work. Discussion: https://postgr.es/m/CAEZATCU0zHMDiQV0g8P2U+YSP9C1idUPrn79DajsbonwkN0xvQ@mail.gmail.com
* Teach pg_ls_dir_files() to ignore ENOENT failures from stat().Tom Lane2020-03-31
| | | | | | | | | | | | | | | | Buildfarm experience shows that this function can fail with ENOENT if some other process unlinks a file between when we read the directory entry and when we try to stat() it. The problem is old but we had not noticed it until 085b6b667 added regression test coverage. To fix, just ignore ENOENT failures. There is one other case that this might hide: a symlink that points to nowhere. That seems okay though, at least better than erroring. Back-patch to v10 where this function was added, since the regression test cases were too. Discussion: https://postgr.es/m/20200308173103.GC1357@telsasoft.com
* Revert "Skip redundant anti-wraparound vacuums"Michael Paquier2020-03-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 2aa6e33, that added a fast path to skip anti-wraparound and non-aggressive autovacuum jobs (these have no sense as anti-wraparound implies aggressive). With a cluster using a high amount of relations with a portion of them being heavily updated, this could cause autovacuum to lock down, with autovacuum workers attempting repeatedly those jobs on the same relations for the same database, that just kept being skipped. This lock down can be solved with a manual VACUUM FREEZE. Justin King has reported one environment where the issue happened, and Julien Rouhaud and I have been able to reproduce it in a second environment. With a very aggressive autovacuum_freeze_max_age, triggering those jobs with pgbench is a matter of minutes, and hitting the lock down is a lot harder (my local tests failed to do that). Note that anti-wraparound and non-aggressive jobs can only be triggered on a subset of shared catalogs: - pg_auth_members - pg_authid - pg_database - pg_replication_origin - pg_shseclabel - pg_subscription - pg_tablespace While the lock down was possible down to v12, the root cause of those jobs is a much older issue, which needs more analysis. Bonus thanks to Andres Freund for the discussion. Reported-by: Justin King Discussion: https://postgr.es/m/CAE39h22zPLrkH17GrkDgAYL3kbjvySYD1io+rtnAUFnaJJVS4g@mail.gmail.com Backpatch-through: 12
* Consistently truncate non-key suffix columns.Peter Geoghegan2020-03-30
| | | | | | | | | | | | | | | INCLUDE indexes failed to have their non-key attributes physically truncated away in certain rare cases. This led to physically larger pivot tuples that contained useless non-key attribute values. The impact on users should be negligible, but this is still clearly a regression (Postgres 11 supports INCLUDE indexes, and yet was not affected). The bug appeared in commit dd299df8, which introduced "true" suffix truncation of key attributes. Discussion: https://postgr.es/m/CAH2-Wz=E8pkV9ivRSFHtv812H5ckf8s1-yhx61_WrJbKccGcrQ@mail.gmail.com Backpatch: 12-, where "true" suffix truncation was introduced.
* Ensure snapshot is registered within ScanPgRelation().Andres Freund2020-03-28
| | | | | | | | | | | | | | | | | | | | | In 9.4 I added support to use a historical snapshot in ScanPgRelation(), while adding logical decoding. Unfortunately a conflict with the concurrent removal of SnapshotNow was incorrectly resolved, leading to an unregistered snapshot being used. It is not correct to use an unregistered (or non-active) snapshot for anything non-trivial, because catalog invalidations can cause the snapshot to be invalidated. Luckily it seems unlikely to actively cause problems in practice, as ScanPgRelation() requires that we already have a lock on the relation, we only look for a single row, and we don't appear to rely on the result's tid to be correct. It however is clearly wrong and potential negative consequences would likely be hard to find. So it seems worth backpatching the fix, even without a concrete hazard. Discussion: https://postgr.es/m/20200229052459.wzhqnbhrriezg4v2@alap3.anarazel.de Backpatch: 9.5-
* Drop slot's LWLock before returning from SaveSlotToPath()Peter Eisentraut2020-03-26
| | | | | | | | | | | | | When SaveSlotToPath() is called with elevel=LOG, the early exits didn't release the slot's io_in_progress_lock. This could result in a walsender being stuck on the lock forever. A possible way to get into this situation is if the offending code paths are triggered in a low disk space situation. Author: Pavan Deolasee <pavan.deolasee@2ndquadrant.com> Reported-by: Craig Ringer <craig@2ndquadrant.com> Discussion: https://www.postgresql.org/message-id/flat/56a138c5-de61-f553-7e8f-6789296de785%402ndquadrant.com
* Re-implement the ereport() macro using __VA_ARGS__.Tom Lane2020-03-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that we require C99, we can depend on __VA_ARGS__ to work, and revising ereport() to use it has several significant benefits: * The extra parentheses around the auxiliary function calls are now optional. Aside from being a bit less ugly, this removes a common gotcha for new contributors, because in some cases the compiler errors you got from forgetting them were unintelligible. * The auxiliary function calls are now evaluated as a comma expression list rather than as extra arguments to errfinish(). This means that compilers can be expected to warn about no-op expressions in the list, allowing detection of several other common mistakes such as forgetting to add errmsg(...) when converting an elog() call to ereport(). * Unlike the situation with extra function arguments, comma expressions are guaranteed to be evaluated left-to-right, so this removes platform dependency in the order of the auxiliary function calls. While that dependency hasn't caused us big problems in the past, this change does allow dropping some rather shaky assumptions around errcontext() domain handling. There's no intention to make wholesale changes of existing ereport calls, but as proof-of-concept this patch removes the extra parens from a couple of calls in postgres.c. While new code can be written either way, code intended to be back-patched will need to use extra parens for awhile yet. It seems worth back-patching this change into v12, so as to reduce the window where we have to be careful about that by one year. Hence, this patch is careful to preserve ABI compatibility; a followup HEAD-only patch will make some additional simplifications. Andres Freund and Tom Lane Discussion: https://postgr.es/m/CA+fd4k6N8EjNvZpM8nme+y+05mz-SM8Z_BgkixzkA34R+ej0Kw@mail.gmail.com
* Revert "Skip WAL for new relfilenodes, under wal_level=minimal."Noah Misch2020-03-22
| | | | | | | | This reverts commit cb2fd7eac285b1b0a24eeb2b8ed4456b66c5a09f. Per numerous buildfarm members, it was incompatible with parallel query, and a test case assumed LP64. Back-patch to 9.5 (all supported versions). Discussion: https://postgr.es/m/20200321224920.GB1763544@rfd.leadboat.com
* Skip WAL for new relfilenodes, under wal_level=minimal.Noah Misch2020-03-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Until now, only selected bulk operations (e.g. COPY) did this. If a given relfilenode received both a WAL-skipping COPY and a WAL-logged operation (e.g. INSERT), recovery could lose tuples from the COPY. See src/backend/access/transam/README section "Skipping WAL for New RelFileNode" for the new coding rules. Maintainers of table access methods should examine that section. To maintain data durability, just before commit, we choose between an fsync of the relfilenode and copying its contents to WAL. A new GUC, wal_skip_threshold, guides that choice. If this change slows a workload that creates small, permanent relfilenodes under wal_level=minimal, try adjusting wal_skip_threshold. Users setting a timeout on COMMIT may need to adjust that timeout, and log_min_duration_statement analysis will reflect time consumption moving to COMMIT from commands like COPY. Internally, this requires a reliable determination of whether RollbackAndReleaseCurrentSubTransaction() would unlink a relation's current relfilenode. Introduce rd_firstRelfilenodeSubid. Amend the specification of rd_createSubid such that the field is zero when a new rel has an old rd_node. Make relcache.c retain entries for certain dropped relations until end of transaction. Back-patch to 9.5 (all supported versions). This introduces a new WAL record type, XLOG_GIST_ASSIGN_LSN, without bumping XLOG_PAGE_MAGIC. As always, update standby systems before master systems. This changes sizeof(RelationData) and sizeof(IndexStmt), breaking binary compatibility for affected extensions. (The most recent commit to affect the same class of extensions was 089e4d405d0f3b94c74a2c6a54357a84a681754b.) Kyotaro Horiguchi, reviewed (in earlier, similar versions) by Robert Haas. Heikki Linnakangas and Michael Paquier implemented earlier designs that materially clarified the problem. Reviewed, in earlier designs, by Andrew Dunstan, Andres Freund, Alvaro Herrera, Tom Lane, Fujii Masao, and Simon Riggs. Reported by Martijn van Oosterhout. Discussion: https://postgr.es/m/20150702220524.GA9392@svana.org
* In log_newpage_range(), heed forkNum and page_std arguments.Noah Misch2020-03-21
| | | | | | | | | The function assumed forkNum=MAIN_FORKNUM and page_std=true, ignoring the actual arguments. Existing callers passed exactly those values, so there's no live bug. Back-patch to v12, where the function first appeared, because another fix needs this. Discussion: https://postgr.es/m/20191118045434.GA1173436@rfd.leadboat.com
* During heap rebuild, lock any TOAST index until end of transaction.Noah Misch2020-03-21
| | | | | | | | | | | | swap_relation_files() calls toast_get_valid_index() to find and lock this index, just before swapping with the rebuilt TOAST index. The latter function releases the lock before returning. Potential for mischief is low; a concurrent session can issue ALTER INDEX ... SET (fillfactor = ...), which is not alarming. Nonetheless, changing pg_class.relfilenode without a lock is unconventional. Back-patch to 9.5 (all supported versions), because another fix needs this. Discussion: https://postgr.es/m/20191226001521.GA1772687@rfd.leadboat.com
* Fix cosmetic blemishes involving rd_createSubid.Noah Misch2020-03-21
| | | | | | | Remove an obsolete comment from AtEOXact_cleanup(). Restore formatting of a comment in struct RelationData, mangled by the pgindent run in commit 9af4159fce6654aa0e081b00d02bca40b978745c. Back-patch to 9.5 (all supported versions), because another fix stacks on this.
* Fix comment related to concurrent index swapping in index.cMichael Paquier2020-03-19
| | | | | | | | | | | | A comment about switching indisvalid of the new and old indexes swapped in REINDEX CONCURRENTLY got this backwards. Issue introduced by 5dc92b8, the original commit of REINDEX CONCURRENTLY. Author: Julien Rouhaud Discussion: https://postgr.es/m/20200318143340.GA46897@nol Backpatch-through: 12
* Add missing errcode() in a few ereport calls.Amit Kapila2020-03-18
| | | | | | | | | | This will allow to specifying SQLSTATE error code for the errors in the missing places. Reported-by: Sawada Masahiko Author: Sawada Masahiko Backpatch-through: 9.5 Discussion: https://postgr.es/m/CA+fd4k6N8EjNvZpM8nme+y+05mz-SM8Z_BgkixzkA34R+ej0Kw@mail.gmail.com
* Fix typo in indexcmds.cMichael Paquier2020-03-18
| | | | | | Introduced by 61d7c7b. Backpatch-through: 12
* Fix consistency issues with replication slot copyAlvaro Herrera2020-03-17
| | | | | | | | | | | | | | | | | | | Commit 9f06d79ef831's replication slot copying failed to properly reserve the WAL that the slot is expecting to see during DecodingContextFindStartpoint (to set the confirmed_flush LSN), so concurrent activity could remove that WAL and cause the copy process to error out. But it doesn't actually *need* that WAL anyway: instead of running decode to find confirmed_flush, it can be copied from the source slot. Fix this by rearranging things to avoid DecodingContextFindStartpoint() (leaving the target slot's confirmed_flush_lsn to invalid), and set that up afterwards by copying from the target slot's value. Also ensure the source slot's confirmed_flush_lsn is valid. Reported-by: Arseny Sher Author: Masahiko Sawada, Arseny Sher Discussion: https://postgr.es/m/871rr3ohbo.fsf@ars-thinkpad
* Avoid holding a directory FD open across assorted SRF calls.Tom Lane2020-03-16
| | | | | | | | | | | | | | | | | This extends the fixes made in commit 085b6b667 to other SRFs with the same bug, namely pg_logdir_ls(), pgrowlocks(), pg_timezone_names(), pg_ls_dir(), and pg_tablespace_databases(). Also adjust various comments and documentation to warn against expecting to clean up resources during a ValuePerCall SRF's final call. Back-patch to all supported branches, since these functions were all born broken. Justin Pryzby, with cosmetic tweaks by me Discussion: https://postgr.es/m/20200308173103.GC1357@telsasoft.com
* Restructure polymorphic-type resolution in funcapi.c.Tom Lane2020-03-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | resolve_polymorphic_tupdesc() and resolve_polymorphic_argtypes() failed to cover the case of having to resolve anyarray given only an anyrange input. The bug was masked if anyelement was also used (as either input or output), which probably helps account for our not having noticed. While looking at this I noticed that resolve_generic_type() would produce the wrong answer if asked to make that same resolution. ISTM that resolve_generic_type() is confusingly defined and overly complex, so rather than fix it, let's just make funcapi.c do the specific lookups it requires for itself. With this change, resolve_generic_type() is not used anywhere, so remove it in HEAD. In the back branches, leave it alone (complete with bug) just in case any external code is using it. While we're here, make some other refactoring adjustments in funcapi.c with an eye to upcoming future expansion of the set of polymorphic types: * Simplify quick-exit tests by adding an overall have_polymorphic_result flag. This is about a wash now but will be a win when there are more flags. * Reduce duplication of code between resolve_polymorphic_tupdesc() and resolve_polymorphic_argtypes(). * Don't bother to validate correct matching of anynonarray or anyenum; the parser should have done that, and even if it didn't, just doing "return false" here would lead to a very confusing, off-point error message. (Really, "return false" in these two functions should only occur if the call_expr isn't supplied or we can't obtain data type info from it.) * For the same reason, throw an elog rather than "return false" if we fail to resolve a polymorphic type. The bug's been there since we added anyrange, so back-patch to all supported branches. Discussion: https://postgr.es/m/6093.1584202130@sss.pgh.pa.us
* Preserve replica identity index across ALTER TABLE rewritePeter Eisentraut2020-03-13
| | | | | | | | | | | | If an index was explicitly set as replica identity index, this setting was lost when a table was rewritten by ALTER TABLE. Because this setting is part of pg_index but actually controlled by ALTER TABLE (not part of CREATE INDEX, say), we have to do some extra work to restore it. Based-on-patch-by: Quan Zongliang <quanzongliang@gmail.com> Reviewed-by: Euler Taveira <euler.taveira@2ndquadrant.com> Discussion: https://www.postgresql.org/message-id/flat/c70fcab2-4866-0d9f-1d01-e75e189db342@gmail.com
* Avoid holding a directory FD open across pg_ls_dir_files() calls.Tom Lane2020-03-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This coding technique is undesirable because (a) it leaks the FD for the rest of the transaction if the SRF is not run to completion, and (b) allocated FDs are a scarce resource, but multiple interleaved uses of the relevant functions could eat many such FDs. In v11 and later, a query such as "SELECT pg_ls_waldir() LIMIT 1" yields a warning about the leaked FD, and the only reason there's no warning in earlier branches is that fd.c didn't whine about such leaks before commit 9cb7db3f0. Even disregarding the warning, it wouldn't be too hard to run a backend out of FDs with careless use of these SQL functions. Hence, rewrite the function so that it reads the directory within a single call, returning the results as a tuplestore rather than via value-per-call mode. There are half a dozen other built-in SRFs with similar problems, but let's fix this one to start with, just to see if the buildfarm finds anything wrong with the code. In passing, fix bogus error report for stat() failure: it was whining about the directory when it should be fingering the individual file. Doubtless a copy-and-paste error. Back-patch to v10 where this function was added. Justin Pryzby, with cosmetic tweaks and test cases by me Discussion: https://postgr.es/m/20200308173103.GC1357@telsasoft.com
* Avoid duplicates in ALTER ... DEPENDS ON EXTENSIONAlvaro Herrera2020-03-11
| | | | | | | | | | | | | | | If the command is attempted for an extension that the object already depends on, silently do nothing. In particular, this means that if a database containing multiple such entries is dumped, the restore will silently do the right thing and record just the first one. (At least, in a world where pg_dump does dump such entries -- which it doesn't currently, but it will.) Backpatch to 9.6, where this kind of dependency was introduced. Reviewed-by: Ibrar Ahmed, Tom Lane (offlist) Discussion: https://postgr.es/m/20200217225333.GA30974@alvherre.pgsql
* Prevent reindex of invalid indexes on TOAST tablesMichael Paquier2020-03-10
| | | | | | | | | | | | | | | | | | | Such indexes can only be duplicated leftovers of a previously failed REINDEX CONCURRENTLY command, and a valid equivalent is guaranteed to exist. As toast indexes can only be dropped if invalid, reindexing these would lead to useless duplicated indexes that can't be dropped anymore, except if the parent relation is dropped. Thanks to Justin Pryzby for reminding that this problem was reported long ago during the review of the original patch of REINDEX CONCURRENTLY, but the issue was never addressed. Reported-by: Sergei Kornilov, Justin Pryzby Author: Julien Rouhaud Reviewed-by: Michael Paquier Discussion: https://postgr.es/m/36712441546604286%40sas1-890ba5c2334a.qloud-c.yandex.net Discussion: https://postgr.es/m/20200216190835.GA21832@telsasoft.com Backpatch-through: 12
* Fix bug that causes to report waiting in PS display twice, in hot standby.Fujii Masao2020-03-10
| | | | | | | | | | | | | | | | | | | | | Previously "waiting" could appear twice via PS in case of lock conflict in hot standby mode. Specifically this issue happend when the delay in WAL application determined by max_standby_archive_delay and max_standby_streaming_delay had passed but it took more than 500 msec to cancel all the conflicting transactions. Especially we can observe this easily by setting those delay parameters to -1. The cause of this issue was that WaitOnLock() and ResolveRecoveryConflictWithVirtualXIDs() added "waiting" to the process title in that case. This commit prevents ResolveRecoveryConflictWithVirtualXIDs() from reporting waiting in case of lock conflict, to fix the bug. Back-patch to all back branches. Author: Masahiko Sawada Reviewed-by: Fujii Masao Discussion: https://postgr.es/m/CA+fd4k4mXWTwfQLS3RPwGr4xnfAEs1ysFfgYHvmmoUgv6Zxvmg@mail.gmail.com
* Avoid assertion failure with targeted recovery in standby mode.Fujii Masao2020-03-09
| | | | | | | | | | | | | | | | | | | | | | | | At the end of recovery, standby mode is turned off to re-fetch the last valid record from archive or pg_wal. Previously, if recovery target was reached and standby mode was turned off while the current WAL source was stream, recovery could try to retrieve WAL file containing the last valid record unexpectedly from stream even though not in standby mode. This caused an assertion failure. That is, the assertion test confirms that WAL file should not be retrieved from stream if standby mode is not true. This commit moves back the current WAL source to archive if it's stream even though not in standby mode, to avoid that assertion failure. This issue doesn't cause the server to crash when built with assertion disabled. In this case, the attempt to retrieve WAL file from stream not in standby mode just fails. And then recovery tries to retrieve WAL file from archive or pg_wal. Back-patch to all supported branches. Author: Kyotaro Horiguchi Reviewed-by: Fujii Masao Discussion: https://postgr.es/m/20200227.124830.2197604521555566121.horikyota.ntt@gmail.com
* Fix typoPeter Eisentraut2020-03-07
|
* Fix more issues with dependency handling at swap phase of REINDEX CONCURRENTLYMichael Paquier2020-03-05
| | | | | | | | | | | | | | | | | | | | | | When canceling a REINDEX CONCURRENTLY operation after swapping is done, a drop of the parent table would leave behind old indexes. This is a consequence of 68ac9cf, which fixed the case of pg_depend bloat when repeating REINDEX CONCURRENTLY on the same relation. In order to take care of the problem without breaking the previous fix, this uses a different strategy, possible even with the exiting set of routines to handle dependency changes. The dependencies of/on the new index are additionally switched to the old one, allowing an old invalid index remaining around because of a cancellation or a failure to use the dependency links of the concurrently-created index. This ensures that dropping any objects the old invalid index depends on also drops the old index automatically. Reported-by: Julien Rouhaud Author: Michael Paquier Reviewed-by: Julien Rouhaud Discussion: https://postgr.es/m/20200227080735.l32fqcauy73lon7o@nol Backpatch-through: 12
* Fix assertion failure with ALTER TABLE ATTACH PARTITION and indexesMichael Paquier2020-03-03
| | | | | | | | | | | | | | | | | | Using ALTER TABLE ATTACH PARTITION causes an assertion failure when attempting to work on a partitioned index, because partitioned indexes cannot have partition bounds. The grammar of ALTER TABLE ATTACH PARTITION requires partition bounds, but not ALTER INDEX, so mixing ALTER TABLE with partitioned indexes is confusing. Hence, on HEAD, prevent ALTER TABLE to attach a partition if the relation involved is a partitioned index. On back-branches, as applications may rely on the existing behavior, just remove the culprit assertion. Reported-by: Alexander Lakhin Author: Amit Langote, Michael Paquier Discussion: https://postgr.es/m/16276-5cd1dcc8fb8be7b5@postgresql.org Backpatch-through: 11
* Preserve pg_index.indisclustered across REINDEX CONCURRENTLYMichael Paquier2020-03-03
| | | | | | | | | | If the flag value is lost, a CLUSTER query following REINDEX CONCURRENTLY could fail. Non-concurrent REINDEX is already handling this case consistently. Author: Justin Pryzby Discussion: https://postgr.es/m/20200229024202.GH29456@telsasoft.com Backpatch-through: 12
* Correctly re-use hash tables in buildSubPlanHash().Tom Lane2020-02-29
| | | | | | | | | | | | | | | | | | | | | | Commit 356687bd8 omitted to remove leftover code for destroying a hashed subplan's hash tables, with the result that the tables were always rebuilt not reused; this leads to severe memory leakage if a hashed subplan is re-executed enough times. Moreover, the code for reusing the hashnulls table had a typo that would have made it do the wrong thing if it were reached. Looking at the code coverage report shows severe under-coverage of the potential callers of ResetTupleHashTable, so add some test cases that exercise them. Andreas Karlsson and Tom Lane, per reports from Ranier Vilela and Justin Pryzby. Backpatch to v11, as the faulty commit was. Discussion: https://postgr.es/m/edb62547-c453-c35b-3ed6-a069e4d6b937@proxel.se Discussion: https://postgr.es/m/CAEudQAo=DCebm1RXtig9OH+QivpS97sMkikt0A9qHmMUs+g6ZA@mail.gmail.com Discussion: https://postgr.es/m/20200210032547.GA1412@telsasoft.com
* Avoid failure if autovacuum tries to access a just-dropped temp namespace.Tom Lane2020-02-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Such an access became possible when commit 246a6c8f7 added more aggressive cleanup of orphaned temp relations by autovacuum. Since autovacuum's snapshot might be slightly stale, it could attempt to access an already-dropped temp namespace, resulting in an assertion failure or null-pointer dereference. (In practice, since we don't drop temp namespaces automatically but merely recycle them, this situation could only arise if a superuser does a manual drop of a temp namespace. Still, that should be allowed.) The core of the bug, IMO, is that isTempNamespaceInUse and its callers failed to think hard about whether to treat "temp namespace isn't there" differently from "temp namespace isn't in use". In hopes of forestalling future mistakes of the same ilk, replace that function with a new one checkTempNamespaceStatus, which makes the same tests but returns a three-way enum rather than just a bool. isTempNamespaceInUse is gone entirely in HEAD; but just in case some external code is relying on it, keep it in the back branches, as a bug-compatible wrapper around the new function. Per report originally from Prabhat Kumar Sahu, investigated by Mahendra Singh and Michael Paquier; the final form of the patch is my fault. This replaces the failed fix attempt in a052f6cbb. Backpatch as far as v11, as 246a6c8f7 was. Discussion: https://postgr.es/m/CAKYtNAr9Zq=1-ww4etHo-VCC-k120YxZy5OS01VkaLPaDbv2tg@mail.gmail.com
* Suppress unnecessary RelabelType nodes in more cases.Tom Lane2020-02-26
| | | | | | | | | | | | | | | | | | | | | | | | | | eval_const_expressions sometimes produced RelabelType nodes that were useless because they just relabeled an expression to the same exposed type it already had. This is worth avoiding because it can cause two equivalent expressions to not be equal(), preventing recognition of useful optimizations. In the test case added here, an unpatched planner fails to notice that the "sqli = constant" clause renders a sort step unnecessary, because one code path produces an extra RelabelType and another doesn't. Fix by ensuring that eval_const_expressions_mutator's T_RelabelType case will not add in an unnecessary RelabelType. Also save some code by sharing a subroutine with the effectively-equivalent cases for CollateExpr and CoerceToDomain. (CollateExpr had no bug, and I think that the case couldn't arise with CoerceToDomain, but it seems prudent to do the same check for all three cases.) Back-patch to v12. In principle this has been wrong all along, but I haven't seen a case where it causes visible misbehavior before v12, so refrain from changing stable branches unnecessarily. Per investigation of a report from Eric Gillum. Discussion: https://postgr.es/m/CAMmjdmvAZsUEskHYj=KT9sTukVVCiCSoe_PBKOXsncFeAUDPCQ@mail.gmail.com
* Add prefix checks in exclude lists for pg_rewind, pg_checksums and base backupsMichael Paquier2020-02-24
| | | | | | | | | | | | | | | | | | An instance of PostgreSQL crashing with a bad timing could leave behind temporary pg_internal.init files, potentially causing failures when verifying checksums. As the same exclusion lists are used between pg_rewind, pg_checksums and basebackup.c, all those tools are extended with prefix checks to keep everything in sync, with dedicated checks added for pg_internal.init. Backpatch down to 11, where pg_checksums (pg_verify_checksums in 11) and checksum verification for base backups have been introduced. Reported-by: Michael Banck Author: Michael Paquier Reviewed-by: Kyotaro Horiguchi, David Steele Discussion: https://postgr.es/m/62031974fd8e941dd8351fbc8c7eff60d59c5338.camel@credativ.de Backpatch-through: 11
* Remove extra word from comment.Etsuro Fujita2020-02-20
|
* Fix mesurement of elapsed time during truncating heap in VACUUM.Fujii Masao2020-02-19
| | | | | | | | | | | | | | | | | | | | | | | | | | VACUUM may truncate heap in several batches. The activity report is logged for each batch, and contains the number of pages in the table before and after the truncation, and also the elapsed time during the truncation. Previously the elapsed time reported in each batch was the total elapsed time since starting the truncation until finishing each batch. For example, if the truncation was processed dividing into three batches, the second batch reported the accumulated time elapsed during both first and second batches. This is strange and confusing because the number of pages in the table reported together is not total. Instead, each batch should report the time elapsed during only that batch. The cause of this issue was that the resource usage snapshot was initialized only at the beginning of the truncation and was never reset later. This commit fixes the issue by changing VACUUM so that the resource usage snapshot is reset at each batch. Back-patch to all supported branches. Reported-by: Tatsuhito Kasahara Author: Tatsuhito Kasahara Reviewed-by: Masahiko Sawada, Fujii Masao Discussion: https://postgr.es/m/CAP0=ZVJsf=NvQuy+QXQZ7B=ZVLoDV_JzsVC1FRsF1G18i3zMGg@mail.gmail.com
* Stop demanding that top xact must be seen before subxact in decoding.Amit Kapila2020-02-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | Manifested as ERROR: subtransaction logged without previous top-level txn record this check forbids legit behaviours like - First xl_xact_assignment record is beyond reading, i.e. earlier restart_lsn. - After restart_lsn there is some change of a subxact. - After that, there is second xl_xact_assignment (for another subxact) revealing the relationship between top and first subxact. Such a transaction won't be streamed anyway because we hadn't seen it in full. Saying for sure whether xact of some record encountered after the snapshot was deserialized can be streamed or not requires to know whether it wrote something before deserialization point --if yes, it hasn't been seen in full and can't be decoded. Snapshot doesn't have such info, so there is no easy way to relax the check. Reported-by: Hsu, John Diagnosed-by: Arseny Sher Author: Arseny Sher, Amit Kapila Reviewed-by: Amit Kapila, Dilip Kumar Backpatch-through: 9.5 Discussion: https://postgr.es/m/AB5978B2-1772-4FEE-A245-74C91704ECB0@amazon.com
* Fill in extraUpdatedCols in logical replicationPeter Eisentraut2020-02-17
| | | | | | | | | | | | | | The extraUpdatedCols field of the target RTE records which generated columns are affected by an update. This is used in a variety of places, including per-column triggers and foreign data wrappers. When an update was initiated by a logical replication subscription, this field was not filled in, so such an update would not affect generated columns in a way that is consistent with normal updates. To fix, factor out some code from analyze.c to fill in extraUpdatedCols in the logical replication worker as well. Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/b05e781a-fa16-6b52-6738-761181204567@2ndquadrant.com
* Add description about GSSOpenServer wait event into document.Fujii Masao2020-02-17
| | | | | | | | | | | | | This commit also updates wait event enum into alphabetical order. Previously the enum entry for GSSOpenServer was added out-of-order. Back-patch to v12 where commit b0b39f72b9 introduced GSSOpenServer wait event. In v12, the commit doesn't include the update of wait event enum, not to break ABI. Author: Fujii Masao Reviewed-by: Michael Paquier Discussion: https://postgr.es/m/949931aa-4ed4-d867-a7b5-de9c02b2292b@oss.nttdata.com
* Avoid a performance regression in float overflow/underflow detection.Tom Lane2020-02-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 6bf0bc842 replaced float.c's CHECKFLOATVAL() macro with static inline subroutines, but that wasn't too well thought out. In the original coding, the unlikely condition (isinf(result) or result == 0) was checked first, and the inf_is_valid or zero_is_valid condition only afterwards. The inline-subroutine coding caused that to be swapped around, which is pretty horrid for performance because (a) in common cases the is_valid condition is twice as expensive to evaluate (e.g., requiring two isinf() calls not one) and (b) in common cases the is_valid condition is false, requiring us to perform the unlikely-condition check anyway. Net result is that one isinf() call becomes two or three, resulting in visible performance loss as reported by Keisuke Kuroda. The original fix proposal was to revert the replacement of the macro, but on second thought, that macro was just a bad idea from the beginning: if anything it's a net negative for readability of the code. So instead, let's just open-code all the overflow/underflow tests, being careful to test the unlikely condition first (and mark it unlikely() to help the compiler get the point). Also, rather than having N copies of the actual ereport() calls, collapse those into out-of-line error subroutines to save some code space. This does mean that the error file/line numbers won't be very helpful for figuring out where the issue really is --- but we'd already burned that bridge by putting the ereports into static inlines. In HEAD, check_float[48]_val() are gone altogether. In v12, leave them present in float.h but unused in the core code, just in case some extension is depending on them. Emre Hasegeli, with some kibitzing from me and Andres Freund Discussion: https://postgr.es/m/CANDwggLe1Gc1OrRqvPfGE=kM9K0FSfia0hbeFCEmwabhLz95AA@mail.gmail.com
* Fix priv checks for ALTER <object> DEPENDS ON EXTENSIONAlvaro Herrera2020-02-10
| | | | | | | | | | | | | | Marking an object as dependant on an extension did not have any privilege check whatsoever; this allowed any user to mark objects as droppable by anyone able to DROP EXTENSION, which could be used to cause system-wide havoc. Disallow by checking that the calling user owns the mentioned object. (No constraints are placed on the extension.) Security: CVE-2020-1720 Reported-by: Tom Lane Discussion: 31605.1566429043@sss.pgh.pa.us
* Translation updatesPeter Eisentraut2020-02-10
| | | | | Source-Git-URL: https://git.postgresql.org/git/pgtranslation/messages.git Source-Git-Hash: bcdfb83b81a7aa3c3948c0a5221f9c68d7010ac5
* Store the deletion horizon XID for a deleted GIN page on the right page.Tom Lane2020-02-09
| | | | | | | | | | | | | | | | | Commit b10714080 moved the GinPageSetDeleteXid() call to a spot where the "page" variable was pointing to the wrong page, causing the XID to be inserted on a page that's not being deleted, thus allowing later GinPageIsRecyclable tests to recycle the deleted page too soon. It might be a good idea to stop using the single "page" variable for multiple purposes in this function. But for the moment I just moved the GinPageSetDeleteXid() call down beside the GinPageSetDeleted() call, which seems like a more logical place for it anyway. Back-patch to v11, as the faulty patch was. (Fortunately, the bug hasn't made it into any release yet.) Discussion: https://postgr.es/m/21620.1581098806@sss.pgh.pa.us
* Fix failure to create FKs correctly in partitionsAlvaro Herrera2020-02-07
| | | | | | | | | | | | | | | | | | | On a multi-level partioned table, when adding a partition not directly connected to the root table, foreign key constraints referencing the root were not cloned to the new partition, leading to the FK being possibly inadvertently violated later on. This was caused by fuzzy thinking in CloneFkReferenced (commit f56f8f8da6af): it was skipping constraints marked as having parents on the theory that cloning those would create duplicates; but that's only correct for the top level of the partitioning hierarchy. For levels below that one, such constraints must still be considered and only skipped if later on we see that we'd create duplicates. Apparently, I (Álvaro) wrote the comments right but the code implemented something slightly different. Author: Jehan-Guillaume de Rorthais Discussion: https://postgr.es/m/20200206004948.238352db@firost
* Fix TRUNCATE .. CASCADE on partitionsAlvaro Herrera2020-02-07
| | | | | | | | | | | | | | | | When running TRUNCATE CASCADE on a child of a partitioned table referenced by another partitioned table, the truncate was not applied to partitions of the referencing table; this could leave rows violating the constraint in the referencing partitioned table. Repair by walking the pg_constraint chain all the way up to the topmost referencing table. Note: any partitioned tables containing FKs that reference other partitioned tables should be checked for possible violating rows, if TRUNCATE has occurred in partitions of the referenced table. Reported-by: Christophe Courtois Author: Jehan-Guillaume de Rorthais Discussion: https://postgr.es/m/20200204183906.115f693e@firost