aboutsummaryrefslogtreecommitdiff
path: root/src/backend
Commit message (Collapse)AuthorAge
* Blind attempt to silence SSL compile failures on hamerkop.Tom Lane2021-11-02
| | | | | | | | | | Buildfarm member hamerkop has been failing for the last few days with errors that look like OpenSSL's X509-related symbols have not been imported into be-secure-openssl.c. It's unclear why this should be, but let's try adding an explicit #include of <openssl/x509v3.h>, as there has long been in fe-secure-openssl.c. Discussion: https://postgr.es/m/1051867.1635720347@sss.pgh.pa.us
* Don't overlook indexes during parallel VACUUM.Peter Geoghegan2021-11-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit b4af70cb, which simplified state managed by VACUUM, performed refactoring of parallel VACUUM in passing. Confusion about the exact details of the tasks that the leader process is responsible for led to code that made it possible for parallel VACUUM to miss a subset of the table's indexes entirely. Specifically, indexes that fell under the min_parallel_index_scan_size size cutoff were missed. These indexes are supposed to be vacuumed by the leader (alongside any parallel unsafe indexes), but weren't vacuumed at all. Affected indexes could easily end up with duplicate heap TIDs, once heap TIDs were recycled for new heap tuples. This had generic symptoms that might be seen with almost any index corruption involving structural inconsistencies between an index and its table. To fix, make sure that the parallel VACUUM leader process performs any required index vacuuming for indexes that happen to be below the size cutoff. Also document the design of parallel VACUUM with these below-size-cutoff indexes. It's unclear how many users might be affected by this bug. There had to be at least three indexes on the table to hit the bug: a smaller index, plus at least two additional indexes that themselves exceed the size cutoff. Cases with just one additional index would not run into trouble, since the parallel VACUUM cost model requires two larger-than-cutoff indexes on the table to apply any parallel processing. Note also that autovacuum was not affected, since it never uses parallel processing. Test case based on tests from a larger patch to test parallel VACUUM by Masahiko Sawada. Many thanks to Kamigishi Rei for her invaluable help with tracking this problem down. Author: Peter Geoghegan <pg@bowt.ie> Author: Masahiko Sawada <sawada.mshk@gmail.com> Reported-By: Kamigishi Rei <iijima.yun@koumakan.jp> Reported-By: Andrew Gierth <andrew@tao11.riddles.org.uk> Diagnosed-By: Andres Freund <andres@anarazel.de> Bug: #17245 Discussion: https://postgr.es/m/17245-ddf06aaf85735f36@postgresql.org Discussion: https://postgr.es/m/20211030023740.qbnsl2xaoh2grq3d@alap3.anarazel.de Backpatch: 14-, where the refactoring commit appears.
* Ensure consistent logical replication of datetime and float8 values.Tom Lane2021-11-02
| | | | | | | | | | | | | | | | | | | In walreceiver, set the publisher's relevant GUCs (datestyle, intervalstyle, extra_float_digits) to the same values that pg_dump uses, and for the same reason: we need the output to be read the same way regardless of the receiver's settings. Without this, it's possible for subscribers to misinterpret transmitted values. Although this is clearly a bug fix, it's not without downsides: subscribers that are storing values into some other datatype, such as text, could get different results than before, and perhaps be unhappy about that. Given the lack of previous complaints, it seems best to change this only in HEAD, and to call it out as an incompatible change in v15. Japin Li, per report from Sadhuprasad Patro Discussion: https://postgr.es/m/CAFF0-CF=D7pc6st-3A9f1JnOt0qmc+BcBPVzD6fLYisKyAjkGA@mail.gmail.com
* Fix variable lifespan in ExecInitCoerceToDomain().Tom Lane2021-11-02
| | | | | | | | | | | | This undoes a mistake in 1ec7679f1: domainval and domainnull were meant to live across loop iterations, but they were incorrectly moved inside the loop. The effect was only to emit useless extra EEOP_MAKE_READONLY steps, so it's not a big deal; nonetheless, back-patch to v13 where the mistake was introduced. Ranier Vilela Discussion: https://postgr.es/m/CAEudQAqXuhbkaAp-sGH6dR6Nsq7v28_0TPexHOm6FiDYqwQD-w@mail.gmail.com
* Avoid O(N^2) behavior in SyncPostCheckpoint().Tom Lane2021-11-02
| | | | | | | | | | | | | | | | | | | | | As in commits 6301c3ada and e9d9ba2a4, avoid doing repetitive list_delete_first() operations, since that would be expensive when there are many files waiting to be unlinked. This is a slightly larger change than in those cases. We have to keep the list state valid for calls to AbsorbSyncRequests(), so it's necessary to invent a "canceled" field instead of immediately deleting PendingUnlinkEntry entries. Also, because we might not be able to process all the entries, we need a new list primitive list_delete_first_n(). list_delete_first_n() is almost list_copy_tail(), but it modifies the input List instead of making a new copy. I found a couple of existing uses of the latter that could profitably use the new function. (There might be more, but the other callers look like they probably shouldn't overwrite the input List.) As before, back-patch to v13. Discussion: https://postgr.es/m/CD2F0E7F-9822-45EC-A411-AE56F14DEA9F@amazon.com
* Move MarkCurrentTransactionIdLoggedIfAny() out of the critical section.Amit Kapila2021-11-02
| | | | | | | | | | | We don't modify any shared state in this function which could cause problems for any concurrent session. This will make it look similar to the other updates for the same structure (TransactionState) which avoids confusion for future readers of code. Author: Dilip Kumar Reviewed-by: Amit Kapila Discussion: https://postgr.es/m/E1mSoYz-0007Fh-D9@gemulon.postgresql.org
* Replace XLOG_INCLUDE_XID flag with a more localized flag.Amit Kapila2021-11-02
| | | | | | | | | | | | | | | | Commit 0bead9af484c introduced XLOG_INCLUDE_XID flag to indicate that the WAL record contains subXID-to-topXID association. It uses that flag later to mark in CurrentTransactionState that top-xid is logged so that we should not try to log it again with the next WAL record in the current subtransaction. However, we can use a localized variable to pass that information. In passing, change the related function and variable names to make them consistent with what the code is actually doing. Author: Dilip Kumar Reviewed-by: Alvaro Herrera, Amit Kapila Discussion: https://postgr.es/m/E1mSoYz-0007Fh-D9@gemulon.postgresql.org
* Replace unicode characters in comments with asciiDaniel Gustafsson2021-11-01
| | | | | | | | | | | | | | | | | | | The unicode characters, while in comments and not code, caused MSVC to emit compiler warning C4819: The file contains a character that cannot be represented in the current code page (number). Save the file in Unicode format to prevent data loss. Fix by replacing the characters in print.c with descriptive comments containing the codepoints and symbol names, and remove the character in brin_bloom.c which was a footnote reference copied from the paper citation. Per report from hamerkop in the buildfarm. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/340E4118-0D0C-4E85-8141-8C40EB22DA3A@yesql.se
* Avoid some other O(N^2) hazards in list manipulation.Tom Lane2021-11-01
| | | | | | | | | | | | In the same spirit as 6301c3ada, fix some more places where we were using list_delete_first() in a loop and thereby risking O(N^2) behavior. It's not clear that the lists manipulated in these spots can get long enough to be really problematic ... but it's not clear that they can't, either, and the fixes are simple enough. As before, back-patch to v13. Discussion: https://postgr.es/m/CD2F0E7F-9822-45EC-A411-AE56F14DEA9F@amazon.com
* Handle XLOG_OVERWRITE_CONTRECORD in DecodeXLogOpAlvaro Herrera2021-11-01
| | | | | | | | | Failing to do so results in inability of logical decoding to process the WAL stream. Handle it by doing nothing. Backpatch all the way back. Reported-by: Petr Jelínek <petr.jelinek@enterprisedb.com>
* Preserve opclass parameters across REINDEX CONCURRENTLYMichael Paquier2021-11-01
| | | | | | | | | | | | | | | | | | | The opclass parameter Datums from the old index are fetched in the same way as for predicates and expressions, by grabbing them directly from the system catalogs. They are then copied into the new IndexInfo that will be used for the creation of the new copy. This caused the new index to be rebuilt with default parameters rather than the ones pre-defined by a user. The only way to get back a new index with correct opclass parameters would be to recreate a new index from scratch. The issue has been introduced by 911e702. Author: Michael Paquier Reviewed-by: Zhihong Yu Discussion: https://postgr.es/m/YX0CG/QpLXcPr8HJ@paquier.xyz Backpatch-through: 13
* Avoid O(N^2) behavior when the standby process releases many locks.Tom Lane2021-10-31
| | | | | | | | | | | | | | | | | When replaying a transaction that held many exclusive locks on the primary, a standby server's startup process would expend O(N^2) effort on manipulating the list of locks. This code was fine when written, but commit 1cff1b95a made repetitive list_delete_first() calls inefficient, as explained in its commit message. Fix by just iterating the list normally, and releasing storage only when done. (This'd be inadequate if we needed to recover from an error occurring partway through; but we don't.) Back-patch to v13 where 1cff1b95a came in. Nathan Bossart Discussion: https://postgr.es/m/CD2F0E7F-9822-45EC-A411-AE56F14DEA9F@amazon.com
* Fix race condition in startup progress reporting.Robert Haas2021-10-29
| | | | | | | | | | | | | | Commit 9ce346eabf350a130bba46be3f8c50ba28506969 added startup progress reporting, but begin_startup_progress_phase has a race condition: the timeout for the previous phase might fire just before we reschedule the interrupt for the next phase. To avoid the race, disable the timeout, clear the flag, and then re-enable the timeout. Patch by me, reviewed by Nitin Jadhav. Discussion: https://postgr.es/m/CA+TgmoYq38i6iAzfRLVxA6Cm+wMCf4WM8wC3o_a+X_JvWC8bJg@mail.gmail.com
* When fetching WAL for a basebackup, report errors with a sensible TLI.Robert Haas2021-10-29
| | | | | | | | | | | | | | | | | | | | The previous code used ThisTimeLineID, which need not even be initialized here, although it usually was in practice, because pg_basebackup issues IDENTIFY_SYSTEM before calling BASE_BACKUP, and that initializes ThisTimeLineID as a side effect. That's not really good enough, though, not only because we shoudn't be counting on side effects like that, but also because the TLI could change meanwhile. Fortunately, we have convenient access to more meaningful TLI values, so use those instead. Because of the way this logic is coded, the consequences of using a possibly-incorrect TLI here are no worse than a slightly confusing error message, I don't want to take any risk here, so no back-patch at least for now. Patch by me, reviewed by Kyotaro Horiguchi and Michael Paquier Discussion: http://postgr.es/m/CA+TgmoZRNWGWYDX9RgTXMG6_nwSdB=PB-PPRUbvMUTGfmL2sHQ@mail.gmail.com
* Demote pg_unreachable() in heapam to an assertion.Peter Geoghegan2021-10-29
| | | | | | | | | | | | | | Commit d168b66682, which overhauled index deletion, added a pg_unreachable() to the end of a sort comparator used when sorting heap TIDs from an index page. This allows the compiler to apply optimizations that assume that the heap TIDs from the index AM must always be unique. That doesn't seem like a good idea now, given recent reports of corruption involving duplicate TIDs in indexes on Postgres 14. Demote to an assertion, just in case. Backpatch: 14-, where index deletion was overhauled.
* Remove obsolete nbtree LP_DEAD item comments.Peter Geoghegan2021-10-27
| | | | | | | | Comments above _bt_findinsertloc() that talk about LP_DEAD items are now out of place. We already discuss index tuple deletion at an earlier point in the same comment block. Oversight in commit d168b666.
* Grant memory views to pg_read_all_stats.Jeff Davis2021-10-27
| | | | | | | | | | Grant privileges on views pg_backend_memory_contexts and pg_shmem_allocations to the role pg_read_all_stats. Also grant on the underlying functions that those views depend on. Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Reviewed-by: Nathan Bossart <bossartn@amazon.com> Discussion: https://postgr.es/m/CALj2ACWAZo3Ar_EVsn2Zf9irG+hYK3cmh1KWhZS_Od45nd01RA@mail.gmail.com
* Fix typos in commentsDaniel Gustafsson2021-10-27
| | | | | Author: Peter Smith <smithpb2250@gmail.com> Discussion: https://postgr.es/m/CAHut+PsN_gmKu-KfeEb9NDARoTPbs4AN4PPu=6LZXFZRJ13SEw@mail.gmail.com
* Fix ordering of items in nbtree error message.Peter Geoghegan2021-10-27
| | | | | | Oversight in commit a5213adf. Backpatch: 13-, just like commit a5213adf.
* Further harden nbtree posting split code.Peter Geoghegan2021-10-27
| | | | | | | | | | | Add more defensive checks around posting list split code. These should detect corruption involving duplicate table TIDs earlier and more reliably than any existing check. Follow up to commit 8f72bbac. Discussion: https://postgr.es/m/CAH2-WzkrSY_kjyd1_M5xJK1uM0govJXMxPn8JUSvwcUOiHuWVw@mail.gmail.com Backpatch: 13-, where nbtree deduplication was introduced.
* Allow publishing the tables of schema.Amit Kapila2021-10-27
| | | | | | | | | | | | | | | | | | | | | | | | | | A new option "FOR ALL TABLES IN SCHEMA" in Create/Alter Publication allows one or more schemas to be specified, whose tables are selected by the publisher for sending the data to the subscriber. The new syntax allows specifying both the tables and schemas. For example: CREATE PUBLICATION pub1 FOR TABLE t1,t2,t3, ALL TABLES IN SCHEMA s1,s2; OR ALTER PUBLICATION pub1 ADD TABLE t1,t2,t3, ALL TABLES IN SCHEMA s1,s2; A new system table "pg_publication_namespace" has been added, to maintain the schemas that the user wants to publish through the publication. Modified the output plugin (pgoutput) to publish the changes if the relation is part of schema publication. Updates pg_dump to identify and dump schema publications. Updates the \d family of commands to display schema publications and \dRp+ variant will now display associated schemas if any. Author: Vignesh C, Hou Zhijie, Amit Kapila Syntax-Suggested-by: Tom Lane, Alvaro Herrera Reviewed-by: Greg Nancarrow, Masahiko Sawada, Hou Zhijie, Amit Kapila, Haiying Tang, Ajin Cherian, Rahila Syed, Bharath Rupireddy, Mark Dilger Tested-by: Haiying Tang Discussion: https://www.postgresql.org/message-id/CALDaNm0OANxuJ6RXqwZsM1MSY4s19nuH3734j4a72etDwvBETQ@mail.gmail.com
* Allow GRANT on pg_log_backend_memory_contexts().Jeff Davis2021-10-26
| | | | | | | | | | | | | Remove superuser check, allowing any user granted permissions on pg_log_backend_memory_contexts() to log the memory contexts of any backend. Note that this could allow a privileged non-superuser to log the memory contexts of a superuser backend, but as discussed, that does not seem to be a problem. Reviewed-by: Nathan Bossart, Bharath Rupireddy, Michael Paquier, Kyotaro Horiguchi, Andres Freund Discussion: https://postgr.es/m/e5cf6684d17c8d1ef4904ae248605ccd6da03e72.camel@j-davis.com
* Improve HINT message that FDW reports when there are no valid options.Fujii Masao2021-10-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The foreign data wrapper's validator function provides a HINT message with list of valid options for the object specified in CREATE or ALTER command, when the option given in the command is invalid. Previously postgresql_fdw_validator() and the validator functions for postgres_fdw and dblink_fdw worked in that way even there were no valid options in the object, which could lead to the HINT message with empty list (because there were no valid options). For example, ALTER FOREIGN DATA WRAPPER postgres_fdw OPTIONS (format 'csv') reported the following ERROR and HINT messages. This behavior was confusing. ERROR: invalid option "format" HINT: Valid options in this context are: There is no such issue in file_fdw. The validator function for file_fdw reports the HINT message "There are no valid options in this context." instead in that case. This commit improves postgresql_fdw_validator() and the validator functions for postgres_fdw and dblink_fdw so that they do likewise. For example, this change causes the above ALTER FOREIGN DATA WRAPPER command to report the following messages. ERROR: invalid option "nonexistent" HINT: There are no valid options in this context. Author: Kosei Masumura Reviewed-by: Bharath Rupireddy, Fujii Masao Discussion: https://postgr.es/m/557d06cebe19081bfcc83ee2affc98d3@oss.nttdata.com
* Ensure that slots are zeroed before useDaniel Gustafsson2021-10-26
| | | | | | | | | | | | | The previous coding relied on the memory for the slots being zeroed elsewhere, which while it was true in this case is not an contract which is guaranteed to hold. Explicitly clear the tts_isnull array to ensure that the slots are filled from a known state. Backpatch to v14 where the catalog multi-inserts were introduced. Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CAJ7c6TP0AowkUgNL6zcAK-s5HYsVHVBRWfu69FRubPpfwZGM9A@mail.gmail.com Backpatch-through: 14
* Reject huge_pages=on if shared_memory_type=sysv.Thomas Munro2021-10-26
| | | | | | | | | It doesn't work (it could, but hasn't been implemented). Back-patch to 12, where shared_memory_type arrived. Reported-by: Alexander Lakhin <exclusion@gmail.com> Reviewed-by: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/163271880203.22789.1125998876173795966@wrigleys.postgresql.org
* Initialize variable to placate compiler.Robert Haas2021-10-25
| | | | | | Per Nathan Bossart. Discussion: http://postgr.es/m/FECEE7FC-CB74-45A9-BB24-89FEE52A9585@amazon.com
* Report progress of startup operations that take a long time.Robert Haas2021-10-25
| | | | | | | | | | | | | | | | | | | | | | | | Users sometimes get concerned whe they start the server and it emits a few messages and then doesn't emit any more messages for a long time. Generally, what's happening is either that the system is taking a long time to apply WAL, or it's taking a long time to reset unlogged relations, or it's taking a long time to fsync the data directory, but it's not easy to tell which is the case. To fix that, add a new 'log_startup_progress_interval' setting, by default 10s. When an operation that is known to be potentially long-running takes more than this amount of time, we'll log a status update each time this interval elapses. To avoid undesirable log chatter, don't log anything about WAL replay when in standby mode. Nitin Jadhav and Robert Haas, reviewed by Amul Sul, Bharath Rupireddy, Justin Pryzby, Michael Paquier, and Álvaro Herrera. Discussion: https://postgr.es/m/CA+TgmoaHQrgDFOBwgY16XCoMtXxsrVGFB2jNCvb7-ubuEe1MGg@mail.gmail.com Discussion: https://postgr.es/m/CAMm1aWaHF7VE69572_OLQ+MgpT5RUiUDgF1x5RrtkJBLdpRj3Q@mail.gmail.com
* Add enable_timeout_every() to fire the same timeout repeatedly.Robert Haas2021-10-25
| | | | | | | | | | enable_timeout_at() and enable_timeout_after() can still be used when you want to fire a timeout just once. Patch by me, per a suggestion from Tom Lane. Discussion: http://postgr.es/m/2992585.1632938816@sss.pgh.pa.us Discussion: http://postgr.es/m/CA+TgmoYqSF5sCNrgTom9r3Nh=at4WmYFD=gsV-omStZ60S0ZUQ@mail.gmail.com
* Remove useless code from CreateReplicationSlot.Robert Haas2021-10-25
| | | | | | | | | | | | | | | | | | | | | According to the comments, we initialize sendTimeLineIsHistoric and sendTimeLine here for the benefit of WalSndSegmentOpen. However, the only way that can happen is if logical_read_xlog_page calls WALRead. And since logical_read_xlog_page initializes the same global variables internally, we don't need to also do it here. These initializations have been here since replication slots were introduced in commit 858ec11858a914d4c380971985709b6d6b7dd6fc. They were certainly useless at that time, too, because logical decoding didn't yet exist then, and physical replication doesn't examine any WAL at the time of slot creation. I haven't checked all the intermediate versions, but I suspect there's no point at which this code ever did anything useful. To reduce future confusion, remove the code. Since there's no functional defect, no back-patch. Discussion: http://postgr.es/m/CA+TgmobSWzacEs+r6C-7DrOPDHoDar4i9gzxB3SCBr5qjnLmVQ@mail.gmail.com
* StartupXLOG: Don't repeatedly disable/enable local xlog insertion.Robert Haas2021-10-25
| | | | | | | | | | | | | | | | | | | All the code that runs in the startup process to write WAL records before that's allowed generally is now consecutive, so there's no reason to shut the facility to write WAL locally off and then turn it on again three times in a row. Unfortunately, this requires a slight kludge in the checkpointer, which needs to separately enable writing WAL in order to write the checkpoint record. Because that code might run in the same process as StartupXLOG() if we are in single-user mode, we must save/restore the state of the LocalXLogInsertAllowed flag. Hopefully, we'll be able to eliminate this wart in further refactoring, but it's not too bad anyway. Amul Sul, with modifications by me. Discussion: http://postgr.es/m/CAAJ_b97fysj6sRSQEfOHj-y8Jfd5uPqOgO74qast89B4WfD+TA@mail.gmail.com
* StartupXLOG: Call CleanupAfterArchiveRecovery after XLogReportParameters.Robert Haas2021-10-25
| | | | | | | | | | | | | | | This does a better job grouping related operations together, since all of the WAL records that we need to write prior to allowing WAL writes generally and written by a single uninterrupted stretch of code. Since CleanupAfterArchiveRecovery() just (1) runs recovery_end_command, (2) removes non-parent xlog files, and (3) archives any final partial segment, this should be safe, because all of those things are pretty much unrelated to the WAL record written by XLogReportParameters(). Amul Sul, per a suggestion from me Discussion: http://postgr.es/m/CAAJ_b97fysj6sRSQEfOHj-y8Jfd5uPqOgO74qast89B4WfD+TA@mail.gmail.com
* Clarify the logic in a few places in the new balanced merge code.Heikki Linnakangas2021-10-25
| | | | | | | | | | | | | | | | | | | | | | | In selectnewtape(), use 'nOutputTapes' rather than 'nOutputRuns' in the check for whether to start a new tape or to append a new run to an existing tape. Until 'maxTapes' is reached, nOutputTapes is always equal to nOutputRuns, so it doesn't change the logic, but it seems more logical to compare # of tapes with # of tapes. Also, currently maxTapes is never modified after the merging begins, but written this way, the code would still work if it was. (Although the nOutputRuns == nOutputTapes assertion would need to be removed and using nOutputRuns % nOutputTapes to distribute the runs evenly across the tapes wouldn't do a good job anymore). Similarly in mergeruns(), change to USEMEM(state->tape_buffer_mem) to account for the memory used for tape buffers. It's equal to availMem currently, but tape_buffer_mem is more direct and future-proof. For example, if we changed the logic to only allocate half of the remaining memory to tape buffers, USEMEM(state->tape_buffer_mem) would still be correct. Coverity complained about these. Hopefully this patch helps it to understand the logic better. Thanks to Tom Lane for initial analysis.
* Add replication command READ_REPLICATION_SLOTMichael Paquier2021-10-25
| | | | | | | | | | | | | | | | | The command is supported for physical slots for now, and returns the type of slot, its restart_lsn and its restart_tli. This will be useful for an upcoming patch related to pg_receivewal, to allow the tool to be able to stream from the position of a slot, rather than the last WAL position flushed by the backend (as reported by IDENTIFY_SYSTEM) if the archive directory is found as empty, which would be an advantage in the case of switching to a different archive locations with the same slot used to avoid holes in WAL segment archives. Author: Ronan Dunklau Reviewed-by: Kyotaro Horiguchi, Michael Paquier, Bharath Rupireddy Discussion: https://postgr.es/m/18708360.4lzOvYHigE@aivenronan
* Fix CREATE INDEX CONCURRENTLY for the newest prepared transactions.Noah Misch2021-10-23
| | | | | | | | | | | | | | | | | | | The purpose of commit 8a54e12a38d1545d249f1402f66c8cde2837d97c was to fix this, and it sufficed when the PREPARE TRANSACTION completed before the CIC looked for lock conflicts. Otherwise, things still broke. As before, in a cluster having used CIC while having enabled prepared transactions, queries that use the resulting index can silently fail to find rows. It may be necessary to reindex to recover from past occurrences; REINDEX CONCURRENTLY suffices. Fix this for future index builds by making CIC wait for arbitrarily-recent prepared transactions and for ordinary transactions that may yet PREPARE TRANSACTION. As part of that, have PREPARE TRANSACTION transfer locks to its dummy PGPROC before it calls ProcArrayClearTransaction(). Back-patch to 9.6 (all supported versions). Andrey Borodin, reviewed (in earlier versions) by Andres Freund. Discussion: https://postgr.es/m/01824242-AA92-4FE9-9BA7-AEBAFFEA3D0C@yandex-team.ru
* Avoid race in RelationBuildDesc() affecting CREATE INDEX CONCURRENTLY.Noah Misch2021-10-23
| | | | | | | | | | | | | | | | | CIC and REINDEX CONCURRENTLY assume backends see their catalog changes no later than each backend's next transaction start. That failed to hold when a backend absorbed a relevant invalidation in the middle of running RelationBuildDesc() on the CIC index. Queries that use the resulting index can silently fail to find rows. Fix this for future index builds by making RelationBuildDesc() loop until it finishes without accepting a relevant invalidation. It may be necessary to reindex to recover from past occurrences; REINDEX CONCURRENTLY suffices. Back-patch to 9.6 (all supported versions). Noah Misch and Andrey Borodin, reviewed (in earlier versions) by Andres Freund. Discussion: https://postgr.es/m/20210730022548.GA1940096@gust.leadboat.com
* Remove unused wait events.Amit Kapila2021-10-21
| | | | | | | | | Commit 464824323e introduced the wait events which were neither used by that commit nor by follow-up commits for that work. Author: Masahiro Ikeda Backpatch-through: 14, where it was introduced Discussion: https://postgr.es/m/ff077840-3ab2-04dd-bbe4-4f5dfd2ad481@oss.nttdata.com
* Fix corruption of pg_shdepend when copying deps from template databaseMichael Paquier2021-10-21
| | | | | | | | | | | | | | | | Using for a new database a template database with shared dependencies that need to be copied over was causing a corruption of pg_shdepend because of an off-by-one computation error of the index number used for the values inserted with a slot. Issue introduced by e3931d0. Monitoring the rest of the code, there are no similar mistakes. Reported-by: Sven Klemm Author: Aleksander Alekseev Reviewed-by: Daniel Gustafsson, Michael Paquier Discussion: https://postgr.es/m/CAJ7c6TP0AowkUgNL6zcAK-s5HYsVHVBRWfu69FRubPpfwZGM9A@mail.gmail.com Backpatch-through: 14
* Ensure correct lock level is used in ALTER ... RENAMEAlvaro Herrera2021-10-19
| | | | | | | | | | | | | | | | Commit 1b5d797cd4f7 intended to relax the lock level used to rename indexes, but inadvertently allowed *any* relation to be renamed with a lowered lock level, as long as the command is spelled ALTER INDEX. That's undesirable for other relation types, so retry the operation with the higher lock if the relation turns out not to be an index. After this fix, ALTER INDEX <sometable> RENAME will require access exclusive lock, which it didn't before. Author: Nathan Bossart <bossartn@amazon.com> Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Reported-by: Onder Kalaci <onderk@microsoft.com> Discussion: https://postgr.es/m/PH0PR21MB1328189E2821CDEC646F8178D8AE9@PH0PR21MB1328.namprd21.prod.outlook.com
* Fix assignment to array of domain over composite.Tom Lane2021-10-19
| | | | | | | | | | | | | | | An update such as "UPDATE ... SET fld[n].subfld = whatever" failed if the array elements were domains rather than plain composites. That's because isAssignmentIndirectionExpr() failed to cope with the CoerceToDomain node that would appear in the expression tree in this case. The result would typically be a crash, and even if we accidentally didn't crash, we'd not correctly preserve other fields of the same array element. Per report from Onder Kalaci. Back-patch to v11 where arrays of domains came in. Discussion: https://postgr.es/m/PH0PR21MB132823A46AA36F0685B7A29AD8BD9@PH0PR21MB1328.namprd21.prod.outlook.com
* Remove bogus assertion in transformExpressionList().Tom Lane2021-10-19
| | | | | | | | | | | | | | | | | I think when I added this assertion (in commit 8f889b108), I was only thinking of the use of transformExpressionList at top level of INSERT and VALUES. But it's also called by transformRowExpr(), which can certainly occur in an UPDATE targetlist, so it's inappropriate to suppose that p_multiassign_exprs must be empty. Besides, since the input is not expected to contain ResTargets, there's no reason it should contain MultiAssignRefs either. Hence this code need not be concerned about the state of p_multiassign_exprs, and we should just drop the assertion. Per bug #17236 from ocean_li_996. It's been wrong for years, so back-patch to all supported branches. Discussion: https://postgr.es/m/17236-3210de9bcba1d7ca@postgresql.org
* Block ALTER INDEX/TABLE index_name ALTER COLUMN colname SET (options)Michael Paquier2021-10-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The grammar of this command run on indexes with column names has always been authorized by the parser, and it has never been documented. Since 911e702, it is possible to define opclass parameters as of CREATE INDEX, which actually broke the old case of ALTER INDEX/TABLE where relation-level parameters n_distinct and n_distinct_inherited could be defined for an index (see 76a47c0 and its thread where this point has been touched, still remained unused). Attempting to do that in v13~ would cause the index to become unusable, as there is a new dedicated code path to load opclass parameters instead of the relation-level ones previously available. Note that it is possible to fix things with a manual catalog update to bring the relation back online. This commit disables this command for now as the use of column names for indexes does not make sense anyway, particularly when it comes to index expressions where names are automatically computed. One way to properly support this case properly in the future would be to use column numbers when it comes to indexes, in the same way as ALTER INDEX .. ALTER COLUMN .. SET STATISTICS. Partitioned indexes were already blocked, but not indexes. Some tests are added for both cases. There was some code in ANALYZE to enforce n_distinct to be used for an index expression if the parameter was defined, but just remove it for now until/if there is support for this (note that index-level parameters never had support in pg_dump either, previously), so this was just dead code. Reported-by: Matthijs van der Vleuten Author: Nathan Bossart, Michael Paquier Reviewed-by: Vik Fearing, Dilip Kumar Discussion: https://postgr.es/m/17220-15d684c6c2171a83@postgresql.org Backpatch-through: 13
* Invalidate partitions of table being attached/detachedAlvaro Herrera2021-10-18
| | | | | | | | | | | | | | | | Failing to do that, any direct inserts/updates of those partitions would fail to enforce the correct constraint, that is, one that considers the new partition constraint of their parent table. Backpatch to 10. Reported by: Hou Zhijie <houzj.fnst@fujitsu.com> Author: Amit Langote <amitlangote09@gmail.com> Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Nitin Jadhav <nitinjadhavpostgres@gmail.com> Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com> Discussion: https://postgr.es/m/OS3PR01MB5718DA1C4609A25186D1FBF194089%40OS3PR01MB5718.jpnprd01.prod.outlook.com
* Fix parallel sort, broken by the balanced merge patch.Heikki Linnakangas2021-10-18
| | | | | | | | | | | The code for initializing the tapes on each merge iteration was skipped in a parallel worker. I put the !WORKER(state) check in wrong place while rebasing the patch. That caused failures in the index build in 'multiple-row-versions' isolation test, in multiple buildfarm members. On my laptop it was easier to reproduce by building an index on a larger table, so that you got a parallel sort more reliably.
* Fix duplicate typedef LogicalTape.Heikki Linnakangas2021-10-18
| | | | To make buildfarm member locust happy.
* Fix format modifier used in elog.Heikki Linnakangas2021-10-18
| | | | | | | | The previous commit 65014000b3 changed the variable passed to elog from an int64 to a size_t variable, but neglected to change the modifier in the format string accordingly. Per failure on buildfarm member lapwing.
* Replace polyphase merge algorithm with a simple balanced k-way merge.Heikki Linnakangas2021-10-18
| | | | | | | | | | | | | | The advantage of polyphase merge is that it can reuse the input tapes as output tapes efficiently, but that is irrelevant on modern hardware, when we can easily emulate any number of tape drives. The number of input tapes we can/should use during merging is limited by work_mem, but output tapes that we are not currently writing to only cost a little bit of memory, so there is no need to skimp on them. This makes sorts that need multiple merge passes faster. Discussion: https://www.postgresql.org/message-id/420a0ec7-602c-d406-1e75-1ef7ddc58d83%40iki.fi Reviewed-by: Peter Geoghegan, Zhihong Yu, John Naylor
* Refactor LogicalTapeSet/LogicalTape interface.Heikki Linnakangas2021-10-18
| | | | | | | | | | | | All the tape functions, like LogicalTapeRead and LogicalTapeWrite, now take a LogicalTape as argument, instead of LogicalTapeSet+tape number. You can create any number of LogicalTapes in a single LogicalTapeSet, and you don't need to decide the number upfront, when you create the tape set. This makes the tape management in hash agg spilling in nodeAgg.c simpler. Discussion: https://www.postgresql.org/message-id/420a0ec7-602c-d406-1e75-1ef7ddc58d83%40iki.fi Reviewed-by: Peter Geoghegan, Zhihong Yu, John Naylor
* Reset properly snapshot export state during transaction abortMichael Paquier2021-10-18
| | | | | | | | | | | | | | | | | | | | | | | | During a replication slot creation, an ERROR generated in the same transaction as the one creating a to-be-exported snapshot would have left the backend in an inconsistent state, as the associated static export snapshot state was not being reset on transaction abort, but only on the follow-up command received by the WAL sender that created this snapshot on replication slot creation. This would trigger inconsistency failures if this session tried to export again a snapshot, like during the creation of a replication slot. Note that a snapshot export cannot happen in a transaction block, so there is no need to worry resetting this state for subtransaction aborts. Also, this inconsistent state would very unlikely show up to users. For example, one case where this could happen is an out-of-memory error when building the initial snapshot to-be-exported. Dilip found this problem while poking at a different patch, that caused an error in this code path for reasons unrelated to HEAD. Author: Dilip Kumar Reviewed-by: Michael Paquier, Zhihong Yu Discussion: https://postgr.es/m/CAFiTN-s0zA1Kj0ozGHwkYkHwa5U0zUE94RSc_g81WrpcETB5=w@mail.gmail.com Backpatch-through: 9.6
* Remove obsolete nbtree deduplication comments.Peter Geoghegan2021-10-15
| | | | Follow up to commit 2903f140.
* shm_mq: Update mq_bytes_written less often.Robert Haas2021-10-14
| | | | | | | | | | | | | | Do not update shm_mq's mq_bytes_written until we have written an amount of data greater than 1/4th of the ring size, unless the caller of shm_mq_send(v) requests a flush at the end of the message. This reduces the number of calls to SetLatch(), and also the number of CPU cache misses, considerably, and thus makes shm_mq significantly faster. Dilip Kumar, reviewed by Zhihong Yu and Tomas Vondra. Some minor cosmetic changes by me. Discussion: http://postgr.es/m/CAFiTN-tVXqn_OG7tHNeSkBbN+iiCZTiQ83uakax43y1sQb2OBA@mail.gmail.com