aboutsummaryrefslogtreecommitdiff
path: root/src/backend/access
Commit message (Collapse)AuthorAge
* Send NOTIFY signals during CommitTransaction.Tom Lane2021-09-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Formerly, we sent signals for outgoing NOTIFY messages within ProcessCompletedNotifies, which was also responsible for sending relevant ones of those messages to our connected client. It therefore had to run during the main-loop processing that occurs just before going idle. This arrangement had two big disadvantages: * Now that procedures allow intra-command COMMITs, it would be useful to send NOTIFYs to other sessions immediately at COMMIT (though, for reasons of wire-protocol stability, we still shouldn't forward them to our client until end of command). * Background processes such as replication workers would not send NOTIFYs at all, since they never execute the client communication loop. We've had requests to allow triggers running in replication workers to send NOTIFYs, so that's a problem. To fix these things, move transmission of outgoing NOTIFY signals into AtCommit_Notify, where it will happen during CommitTransaction. Also move the possible call of asyncQueueAdvanceTail there, to ensure we don't bloat the async SLRU if a background worker sends many NOTIFYs with no one listening. We can also drop the call of asyncQueueReadAllNotifications, allowing ProcessCompletedNotifies to go away entirely. That's because commit 790026972 added a call of ProcessNotifyInterrupt adjacent to PostgresMain's call of ProcessCompletedNotifies, and that does its own call of asyncQueueReadAllNotifications, meaning that we were uselessly doing two such calls (inside two separate transactions) whenever inbound notify signals coincided with an outbound notify. We need only set notifyInterruptPending to ensure that ProcessNotifyInterrupt runs, and we're done. The existing documentation suggests that custom background workers should call ProcessCompletedNotifies if they want to send NOTIFY messages. To avoid an ABI break in the back branches, reduce it to an empty routine rather than removing it entirely. Removal will occur in v15. Although the problems mentioned above have existed for awhile, I don't feel comfortable back-patching this any further than v13. There was quite a bit of churn in adjacent code between 12 and 13. At minimum we'd have to also backpatch 51004c717, and a good deal of other adjustment would also be needed, so the benefit-to-risk ratio doesn't look attractive. Per bug #15293 from Michael Powers (and similar gripes from others). Artur Zakirov and Tom Lane Discussion: https://postgr.es/m/153243441449.1404.2274116228506175596@wrigleys.postgresql.org
* Clean up some code using "(expr) ? true : false"Michael Paquier2021-09-08
| | | | | | | | | | All the code paths simplified here were already using a boolean or used an expression that led to zero or one, making the extra bits unnecessary. Author: Justin Pryzby Reviewed-by: Tom Lane, Michael Paquier, Peter Smith Discussion: https://postgr.es/m/20210428182936.GE27406@telsasoft.com
* Further portability tweaks for float4/float8 hash functions.Tom Lane2021-09-04
| | | | | | | | | | Attempting to make hashfloat4() look as much as possible like hashfloat8(), I'd figured I could replace NaNs with get_float4_nan() before widening to float8. However, results from protosciurus and topminnow show that on some platforms that produces a different bit-pattern from get_float8_nan(), breaking the intent of ce773f230. Rearrange so that we use the result of get_float8_nan() for all NaN cases. As before, back-patch.
* Revert "Avoid creating archive status ".ready" files too early"Alvaro Herrera2021-09-04
| | | | | | | | | | This reverts commit 515e3d84a0b5 and equivalent commits in back branches. This solution to the problem has a number of problems, so we'll try again with a different approach. Per note from Andres Freund Discussion: https://postgr.es/m/20210831042949.52eqp5xwbxgrfank@alap3.anarazel.de
* Fix float4/float8 hash functions to produce uniform results for NaNs.Tom Lane2021-09-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The IEEE 754 standard allows a wide variety of bit patterns for NaNs, of which at least two ("NaN" and "-NaN") are pretty easy to produce from SQL on most machines. This is problematic because our btree comparison functions deem all NaNs to be equal, but our float hash functions know nothing about NaNs and will happily produce varying hash codes for them. That causes unexpected results from queries that hash a column containing different NaN values. It could also produce unexpected lookup failures when using a hash index on a float column, i.e. "WHERE x = 'NaN'" will not find all the rows it should. To fix, special-case NaN in the float hash functions, not too much unlike the existing special case that forces zero and minus zero to hash the same. I arranged for the most vanilla sort of NaN (that coming from the C99 NAN constant) to still have the same hash code as before, to reduce the risk to existing hash indexes. I dithered about whether to back-patch this into stable branches, but ultimately decided to do so. It's a clear improvement for queries that hash internally. If there is anybody who has -NaN in a hash index, they'd be well advised to re-index after applying this patch ... but the misbehavior if they don't will not be much worse than the misbehavior they had before. Per bug #17172 from Ma Liangzhu. Discussion: https://postgr.es/m/17172-7505bea9e04e230f@postgresql.org
* Fix incorrect format placeholdersPeter Eisentraut2021-09-01
|
* VACUUM VERBOSE: Don't report "pages removed".Peter Geoghegan2021-08-31
| | | | | | | | | | | | | | It doesn't make any sense to report this information, since VACUUM VERBOSE reports on heap relation truncation directly. This was an oversight in commit 7ab96cf6, which made VACUUM VERBOSE output a little more consistent with nearby autovacuum-specific log output. Adjust comments that describe how this is supposed to work in passing. Also bring truncation-related VACUUM VERBOSE output in line with the convention established for VACUUM VERBOSE output by commit f4f4a649. Author: Peter Geoghegan <pg@bowt.ie> Backpatch: 14-, where VACUUM VERBOSE's output changed.
* Remove obsolete nbtree relation extension comment.Peter Geoghegan2021-08-31
| | | | | | Commit 0d1fe9f7 improved the approach that vacuumlazy.c takes when it encounters an empty heap page. It no acquires the relation extension lock.
* vacuumlazy.c: Correct prune state comment.Peter Geoghegan2021-08-31
| | | | Oversight in commit 7ab96cf6b3.
* Remove unneeded old_rel_pages VACUUM state field.Peter Geoghegan2021-08-31
| | | | | | | | | | The field hasn't been used since commit 3d351d91, which redefined pg_class.reltuples to be -1 before the first VACUUM or ANALYZE. Also rename a local variable of the same name ("old_rel_pages"). This is used by relation truncation to represent the original relation size at the start of the ongoing VACUUM operation. Rename it to orig_rel_pages, since that's a lot clearer. (This name matches similar nearby code.)
* Report tuple address in data-corruption error messageAlvaro Herrera2021-08-30
| | | | | | | | | | | Most data-corruption reports mention the location of the problem, but this one failed to. Add it. Backpatch all the way back. In 12 and older, also assign the ERRCODE_DATA_CORRUPTED error code as was done in commit fd6ec93bf890 for 13 and later. Discussion: https://postgr.es/m/202108191637.oqyzrdtnheir@alvherre.pgsql
* Count SP-GiST index scans in pg_stat statistics.Tom Lane2021-08-27
| | | | | | | | | | | | | | | | | | | Somehow, spgist overlooked the need to call pgstat_count_index_scan(). Hence, pg_stat_all_indexes.idx_scan and equivalent columns never became nonzero for an SP-GiST index, although the related per-tuple counters worked fine. This fix works a bit differently from other index AMs, in that the counter increment occurs in spgrescan not spggettuple/spggetbitmap. It looks like this won't make the user-visible semantics noticeably different, so I won't go to the trouble of introducing an is-this- the-first-call flag just to make the counter bumps happen in the same places. Per bug #17163 from Christian Quest. Back-patch to all supported versions. Discussion: https://postgr.es/m/17163-b8c5cc88322a5e92@postgresql.org
* track_io_timing logging: Don't special case 0 ms.Peter Geoghegan2021-08-27
| | | | | | | | | | | | | | | Adjust track_io_timing related logging code added by commit 94d13d474d. Make it consistent with other nearby autovacuum and autoanalyze logging code by removing logic that suppressed zero millisecond outputs. log_autovacuum_min_duration log output now reliably shows "read:" and "write:" millisecond-based values in its report (when track_io_timing is enabled). Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Stephen Frost <sfrost@snowman.net> Discussion: https://postgr.es/m/CAH2-WznW0FNxSVQMSRazAMYNfZ6DR_gr5WE78hc6E1CBkkJpzw@mail.gmail.com Backpatch: 14-, where the track_io_timing logging was introduced.
* Reorder log_autovacuum_min_duration log output.Peter Geoghegan2021-08-27
| | | | | | | | | | This order seems more natural. It starts with details that are particular to heap and index data structures, and ends with system-level costs incurred during the autovacuum worker's VACUUM/ANALYZE operation. Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-WzkzxK6ahA9xxsOftRtBX_R0swuHZsvo4QUbak1Bz7hb7Q@mail.gmail.com Backpatch: 14-, which enhanced the log output in various ways.
* vacuumlazy.c: Remove unnecessary parentheses.Peter Geoghegan2021-08-27
| | | | | | This was arguably a minor oversight in commit b4af70cb, which cleaned up the function signatures of functions that modify IndexBulkDeleteResult variables.
* Fix broken snapshot handling in parallel workers.Robert Haas2021-08-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | Pengchengliu reported an assertion failure in a parallel woker while performing a parallel scan using an overflowed snapshot. The proximate cause is that TransactionXmin was set to an incorrect value. The underlying cause is incorrect snapshot handling in parallel.c. In particular, InitializeParallelDSM() was unconditionally calling GetTransactionSnapshot(), because I (rhaas) mistakenly thought that was always retrieving an existing snapshot whereas, at isolation levels less than REPEATABLE READ, it's actually taking a new one. So instead do this only at higher isolation levels where there actually is a single snapshot for the whole transaction. By itself, this is not a sufficient fix, because we still need to guarantee that TransactionXmin gets set properly in the workers. The easiest way to do that seems to be to install the leader's active snapshot as the transaction snapshot if the leader did not serialize a transaction snapshot. This doesn't affect the results of future GetTrasnactionSnapshot() calls since those have to take a new snapshot anyway; what we care about is the side effect of setting TransactionXmin. Report by Pengchengliu. Patch by Greg Nancarrow, except for some comment text which I supplied. Discussion: https://postgr.es/m/002f01d748ac$eaa781a0$bff684e0$@tju.edu.cn
* Avoid creating archive status ".ready" files too earlyAlvaro Herrera2021-08-23
| | | | | | | | | | | | | | | | | | | | | | | WAL records may span multiple segments, but XLogWrite() does not wait for the entire record to be written out to disk before creating archive status files. Instead, as soon as the last WAL page of the segment is written, the archive status file is created, and the archiver may process it. If PostgreSQL crashes before it is able to write and flush the rest of the record (in the next WAL segment), the wrong version of the first segment file lingers in the archive, which causes operations such as point-in-time restores to fail. To fix this, keep track of records that span across segments and ensure that segments are only marked ready-for-archival once such records have been completely written to disk. This has always been wrong, so backpatch all the way back. Author: Nathan Bossart <bossartn@amazon.com> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Ryo Matsumura <matsumura.ryo@fujitsu.com> Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru> Discussion: https://postgr.es/m/CBDDFA01-6E40-46BB-9F98-9340F4379505@amazon.com
* Revert analyze support for partitioned tablesAlvaro Herrera2021-08-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts the following commits: 1b5617eb844cd2470a334c1d2eec66cf9b39c41a Describe (auto-)analyze behavior for partitioned tables 0e69f705cc1a3df273b38c9883fb5765991e04fe Set pg_class.reltuples for partitioned tables 41badeaba8beee7648ebe7923a41c04f1f3cb302 Document ANALYZE storage parameters for partitioned tables 0827e8af70f4653ba17ed773f123a60eadd9f9c9 autovacuum: handle analyze for partitioned tables There are efficiency issues in this code when handling databases with large numbers of partitions, and it doesn't look like there isn't any trivial way to handle those. There are some other issues as well. It's now too late in the cycle for nontrivial fixes, so we'll have to let Postgres 14 users continue to manually deal with ANALYZE their partitioned tables, and hopefully we can fix the issues for Postgres 15. I kept [most of] be280cdad298 ("Don't reset relhasindex for partitioned tables on ANALYZE") because while we added it due to 0827e8af70f4, it is a good bugfix in its own right, since it affects manual analyze as well as autovacuum-induced analyze, and there's no reason to revert it. I retained the addition of relkind 'p' to tables included by pg_stat_user_tables, because reverting that would require a catversion bump. Also, in pg14 only, I keep a struct member that was added to PgStat_TabStatEntry to avoid breaking compatibility with existing stat files. Backpatch to 14. Discussion: https://postgr.es/m/20210722205458.f2bug3z6qzxzpx2s@alap3.anarazel.de
* Emit namespace in the post-copy errmsgDaniel Gustafsson2021-08-16
| | | | | | | | | | | | | | During a VACUUM or CLUSTER command, the initial output emits a fully qualified relation path with namespace. The post-action errmsg only emitted the relation name however, which may lead to hard to parse output when using multiple jobs with vacuumdb as the output from different jobs may be interleaved. Include the full path in the post-action errmsg to be consistent with the initial errmsg. Author: Mike Fiedler <miketheman@gmail.com> Reviewed-by: Corey Huinker <corey.huinker@gmail.com> Discussion: https://postgr.es/m/CAMerE0oz+8G-aORZL_BJcPxnBqewZAvND4bSUysjz+r-oT1BxQ@mail.gmail.com
* Refresh apply delay on reload of recovery_min_apply_delay at recoveryMichael Paquier2021-08-16
| | | | | | | | | | | | | | | | | This commit ensures that the wait interval in the replay delay loop waiting for an amount of time defined by recovery_min_apply_delay is correctly handled on reload, recalculating the delay if this GUC value is updated, based on the timestamp of the commit record being replayed. The previous behavior would be problematic for example with replay still waiting even if the delay got reduced or just cancelled. If the apply delay was increased to a larger value, the wait would have just respected the old value set, finishing earlier. Author: Soumyadeep Chakraborty, Ashwin Agrawal Reviewed-by: Kyotaro Horiguchi, Michael Paquier Discussion: https://postgr.es/m/CAE-ML+93zfr-HLN8OuxF0BjpWJ17O5dv1eMvSE5jsj9jpnAXZA@mail.gmail.com Backpatch-through: 9.6
* Fix grammar mistake in hash index READMEJohn Naylor2021-08-12
| | | | | | Dilip Kumar Discussion: https://www.postgresql.org/message-id/CAFiTN-tjZbuY6vy7kZZ6xO%2BD4mVcO5wOPB5KiwJ3AHhpytd8fg%40mail.gmail.com
* Avoid unnecessary shared invalidations in ROLLBACK PREPAREDMichael Paquier2021-08-12
| | | | | | | | | | The performance gain is minimal, but this makes the logic more consistent with AtEOXact_Inval(). No other invalidation is needed in this case as PREPARE takes already care of sending any local ones. Author: Liu Huailing Reviewed-by: Tom Lane, Michael Paquier Discussion: https://postgr.es/m/OSZPR01MB6215AA84D71EF2B3D354CF86BE139@OSZPR01MB6215.jpnprd01.prod.outlook.com
* Remove some unnecessary casts in format argumentsPeter Eisentraut2021-08-08
| | | | | | We can use %zd or %zu directly, no need to cast to int. Conversely, some code was casting away from int when it could be using %d directly.
* Message style improvementsPeter Eisentraut2021-08-07
|
* Make parallel worker shutdown complete entirely via before_shmem_exit().Andres Freund2021-08-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a step toward storing stats in dynamic shared memory. As dynamic shared memory segments are detached from just after before_shmem_exit() callbacks are processed, but before on_shmem_exit() callbacks are, no stats can be collected after before_shmem_exit() callbacks have been processed. Parallel worker shutdown can cause stats to be emitted during DSM detach callbacks, e.g. for SharedFileSet (which closes its files, which can causes fd.c to emit stats about temporary files). Therefore parallel worker shutdown needs to complete during the processing of before_shmem_exit callbacks. One might think this problem could instead be solved by carefully ordering the attaching to DSM segments, so that the pgstats segments get detached from later than the parallel query ones. That turns out to not work because the stats hash might need to grow which can cause new segments to be allocated, which then will be detached from earlier. There are two code changes: First, call ParallelWorkerShutdown() via before_shmem_exit. That's a good idea on its own, because other shutdown callbacks like ShutdownPostgres and ShutdownAuxiliaryProcess are called via before_*. Second, explicitly detach from the parallel query DSM segment, thereby ensuring all stats are emitted during ParallelWorkerShutdown(). There are nicer solutions to these problems, but it's not obvious which of those solutions is the correct one. As the shared memory stats work already is a huge amount of work... Author: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/20210405092914.mmxqe7j56lsjfsej@alap3.anarazel.de Discussion: https://postgr.es/m/20210803023612.iziacxk5syn2r4ut@alap3.anarazel.de
* pgstat: split reporting/fetching of bgwriter and checkpointer stats.Andres Freund2021-08-04
| | | | | | | | | | | | | | | | These have been unrelated since bgwriter and checkpointer were split into two processes in 806a2aee379. As there several pending patches (shared memory stats, extending the set of tracked IO / buffer statistics) that are made a bit more awkward by the grouping, split them. Done separately to make reviewing easier. This does *not* change the contents of pg_stat_bgwriter or move fields out of bgwriter/checkpointer stats that arguably do not belong in either. However pgstat_fetch_global() was renamed and split into pgstat_fetch_stat_checkpointer() and pgstat_fetch_stat_bgwriter(). Author: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/20210405092914.mmxqe7j56lsjfsej@alap3.anarazel.de
* Make vacuum_index_cleanup reloption RELOPT_TYPE_ENUM.Peter Geoghegan2021-08-03
| | | | | | | | | | Oversight in commit 3499df0d, which generalized the reloption as a way of giving users a way to consistently avoid VACUUM's index bypass optimization. Per off-list report from Nikolay Shaplov. Backpatch: 14-, where index cleanup reloption was extended.
* Further simplify a bit of logic in StartupXLOG().Thomas Munro2021-08-03
| | | | | | | | Commit 7ff23c6d277d1d90478a51f0dd81414d343f3850 left us with two identical cases. Collapse them. Author: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CA%2BhUKGJ8NRsqgkZEnsnRc2MFROBV-jCnacbYvtpptK2A9YYp9Q%40mail.gmail.com
* Run checkpointer and bgwriter in crash recovery.Thomas Munro2021-08-02
| | | | | | | | | | | | | Start up the checkpointer and bgwriter during crash recovery (except in --single mode), as we do for replication. This wasn't done back in commit cdd46c76 out of caution. Now it seems like a better idea to make the environment as similar as possible in both cases. There may also be some performance advantages. Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Aleksander Alekseev <aleksander@timescale.com> Tested-by: Jakub Wartak <Jakub.Wartak@tomtom.com> Discussion: https://postgr.es/m/CA%2BhUKGJ8NRsqgkZEnsnRc2MFROBV-jCnacbYvtpptK2A9YYp9Q%40mail.gmail.com
* Move InRecovery and standbyState global vars to xlogutils.c.Heikki Linnakangas2021-07-31
| | | | | | | | | | They are used in code that runs both during normal operation and during WAL replay, and needs to behave differently during replay. Move them to xlogutils.c, because that's where we have other helper functions used by redo routines. Reviewed-by: Andres Freund Discussion: https://www.postgresql.org/message-id/b3b71061-4919-e882-4857-27e370ab134a%40iki.fi
* Extract code to describe recovery stop reason to a function.Heikki Linnakangas2021-07-31
| | | | | | | StartupXLOG() is very long, this makes it a little bit more readable. Reviewed-by: Andres Freund Discussion: https://www.postgresql.org/message-id/b3b71061-4919-e882-4857-27e370ab134a%40iki.fi
* Remove unnecessary 'restoredFromArchive' global variable.Heikki Linnakangas2021-07-31
| | | | | | | | It might've been useful for debugging purposes, but meh. There's 'readSource' which does almost the same thing. Reviewed-by: Andres Freund Discussion: https://www.postgresql.org/message-id/b3b71061-4919-e882-4857-27e370ab134a%40iki.fi
* Don't use O_SYNC or similar when opening signal file to fsync it.Heikki Linnakangas2021-07-31
| | | | | | | | | No need to use get_sync_bit() when we're calling pg_fsync() on the file. We're not writing to the files, so it doesn't make any difference in practice, but seems less surprising this way. Reviewed-by: Andres Freund Discussion: https://www.postgresql.org/message-id/b3b71061-4919-e882-4857-27e370ab134a%40iki.fi
* Remove unnecessary call to ReadCheckpointRecord().Robert Haas2021-07-30
| | | | | | | | | | | | | | | | It should always be the case that the last checkpoint record is still readable, because otherwise, a crash would leave us in a situation from which we can't recover. Therefore the test removed by this patch should always succeed. For it to fail, either there has to be a serious bug in the code someplace, or the user has to be manually modifying pg_wal while crash recovery is running. If it's the first one, we should fix the bug. If it's the second one, they should stop, or anyway they're doing so at their own risk. In neither case does a full checkpoint instead of an end-of-recovery record seem like a clear winner. Furthermore, rarely-taken code paths are particularly vulnerable to bugs, so let's simplify by getting rid of this one. Discussion: http://postgr.es/m/CA+TgmoYmw==TOJ6EzYb_vcjyS09NkzrVKSyBKUUyo1zBEaJASA@mail.gmail.com
* Update obsolete comment that still referred to CheckpointLockHeikki Linnakangas2021-07-30
| | | | | | CheckpointLock was removed in commit d18e75664a, and commit ce197e91d0 updated a leftover comment in CreateCheckPoint, but there was another copy of it in CreateRestartPoint still.
* Close yet another race condition in replication slot test codeAlvaro Herrera2021-07-29
| | | | | | | | | | | | | | | Buildfarm shows that this test has a further failure mode when a checkpoint starts earlier than expected, so we detect a "checkpoint completed" line that's not the one we want. Change the config to try and prevent this. Per buildfarm While at it, update one comment that was forgotten in commit d18e75664a2f. Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Discussion: https://postgr.es/m/20210729.162038.534808353849568395.horikyota.ntt@gmail.com
* Update minimum recovery point on truncation during WAL replay of abort record.Fujii Masao2021-07-29
| | | | | | | | | | | | | | | | If a file is truncated, we must update minRecoveryPoint. Once a file is truncated, there's no going back; it would not be safe to stop recovery at a point earlier than that anymore. Commit 7bffc9b7bf changed xact_redo_commit() so that it updates minRecoveryPoint on truncation, but forgot to change xact_redo_abort(). Back-patch to all supported versions. Reported-by: mengjuan.cmj@alibaba-inc.com Author: Fujii Masao Reviewed-by: Heikki Linnakangas Discussion: https://postgr.es/m/b029fce3-4fac-4265-968e-16f36ff4d075.mengjuan.cmj@alibaba-inc.com
* Make XLOG_FPI_FOR_HINT records honor full_page_writes setting.Fujii Masao2021-07-21
| | | | | | | | | | | | | | | | Commit 2c03216d83 changed XLOG_FPI_FOR_HINT records so that they always included full-page images even when full_page_writes was disabled. However, in this setting, they don't need to do that because hint bit updates don't need to be protected from torn writes. Therefore, this commit makes XLOG_FPI_FOR_HINT records honor full_page_writes setting. That is, XLOG_FPI_FOR_HINT records may include no full-page images if full_page_writes is disabled, and WAL replay of them does nothing. Reported-by: Zhang Wenjie Author: Kyotaro Horiguchi Reviewed-by: Fujii Masao Discussion: https://postgr.es/m/tencent_60F11973A111EED97A8596FFECC4A91ED405@qq.com
* Advance old-segment horizon properly after slot invalidationAlvaro Herrera2021-07-16
| | | | | | | | | | | | | | | | When some slots are invalidated due to the max_slot_wal_keep_size limit, the old segment horizon should move forward to stay within the limit. However, in commit c6550776394e we forgot to call KeepLogSeg again to recompute the horizon after invalidating replication slots. In cases where other slots remained, the limits would be recomputed eventually for other reasons, but if all slots were invalidated, the limits would not move at all afterwards. Repair. Backpatch to 13 where the feature was introduced. Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reported-by: Marcin Krupowicz <mk@071.ovh> Discussion: https://postgr.es/m/17103-004130e8f27782c9@postgresql.org
* Replace explicit PIN entries in pg_depend with an OID range test.Tom Lane2021-07-15
| | | | | | | | | | | | | | | | As of v14, pg_depend contains almost 7000 "pin" entries recording the OIDs of built-in objects. This is a fair amount of bloat for every database, and it adds time to pg_depend lookups as well as initdb. We can get rid of all of those entries in favor of an OID range check, i.e. "OIDs below FirstUnpinnedObjectId are pinned". (template1 and the public schema are exceptions. Those exceptions are now wired into IsPinnedObject() instead of initdb's code for filling pg_depend, but it's the same amount of cruft either way.) The contents of pg_shdepend are modified likewise. Discussion: https://postgr.es/m/3737988.1618451008@sss.pgh.pa.us
* Add support for prepared transactions to built-in logical replication.Amit Kapila2021-07-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To add support for streaming transactions at prepare time into the built-in logical replication, we need to do the following things: * Modify the output plugin (pgoutput) to implement the new two-phase API callbacks, by leveraging the extended replication protocol. * Modify the replication apply worker, to properly handle two-phase transactions by replaying them on prepare. * Add a new SUBSCRIPTION option "two_phase" to allow users to enable two-phase transactions. We enable the two_phase once the initial data sync is over. We however must explicitly disable replication of two-phase transactions during replication slot creation, even if the plugin supports it. We don't need to replicate the changes accumulated during this phase, and moreover, we don't have a replication connection open so we don't know where to send the data anyway. The streaming option is not allowed with this new two_phase option. This can be done as a separate patch. We don't allow to toggle two_phase option of a subscription because it can lead to an inconsistent replica. For the same reason, we don't allow to refresh the publication once the two_phase is enabled for a subscription unless copy_data option is false. Author: Peter Smith, Ajin Cherian and Amit Kapila based on previous work by Nikhil Sontakke and Stas Kelvich Reviewed-by: Amit Kapila, Sawada Masahiko, Vignesh C, Dilip Kumar, Takamichi Osumi, Greg Nancarrow Tested-By: Haiying Tang Discussion: https://postgr.es/m/02DA5F5E-CECE-4D9C-8B4B-418077E2C010@postgrespro.ru Discussion: https://postgr.es/m/CAA4eK1+opiV4aFTmWWUF9h_32=HfPOW9vZASHarT0UA5oBrtGw@mail.gmail.com
* Replace RelationOpenSmgr() with RelationGetSmgr().Tom Lane2021-07-12
| | | | | | | | | | | | | | | | | | | | The idea behind this patch is to design out bugs like the one fixed by commit 9d523119f. Previously, once one did RelationOpenSmgr(rel), it was considered okay to access rel->rd_smgr directly for some not-very-clear interval. But since that pointer will be cleared by relcache flushes, we had bugs arising from overreliance on a previous RelationOpenSmgr call still being effective. Now, very little code except that in rel.h and relcache.c should ever touch the rd_smgr field directly. The normal coding rule is to use RelationGetSmgr(rel) and not expect the result to be valid for longer than one smgr function call. There are a couple of places where using the function every single time seemed like overkill, but they are now annotated with large warning comments. Amul Sul, after an idea of mine. Discussion: https://postgr.es/m/CANiYTQsU7yMFpQYnv=BrcRVqK_3U3mtAzAsJCaqtzsDHfsUbdQ@mail.gmail.com
* Remove dead assignment to local variable.Heikki Linnakangas2021-07-12
| | | | | | | | | | | | This should have been removed in commit 7e30c186da, which split the loop into two. Only the first loop uses the 'from' variable; updating it in the second loop is bogus. It was never read after the first loop, so this was harmless and surely optimized away by the compiler, but let's be tidy. Backpatch to all supported versions. Author: Ranier Vilela Discussion: https://www.postgresql.org/message-id/CAEudQAoWq%2BAL3BnELHu7gms2GN07k-np6yLbukGaxJ1vY-zeiQ%40mail.gmail.com
* Add forgotten LSN_FORMAT_ARGS() in xlogreader.cMichael Paquier2021-07-09
| | | | | These should have been part of 4035cd5, that introduced LZ4 support for wal_compression.
* Use WaitLatch() instead of pg_usleep() at the end of backupsMichael Paquier2021-07-06
| | | | | | | | | | This concerns pg_stop_backup() and BASE_BACKUP, when waiting for the WAL segments required for a backup to be archived. This simplifies a bit the handling of the wait event used in this code path. Author: Bharath Rupireddy Reviewed-by: Michael Paquier, Stephen Frost Discussion: https://postgr.es/m/CALj2ACU4AdPCq6NLfcA-ZGwX7pPCK5FgEj-CAU0xCKzkASSy_A@mail.gmail.com
* Use InvalidBucket instead of -1 where appropriatePeter Eisentraut2021-07-02
| | | | | Reported-by: Ranier Vilela <ranier.vf@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAEudQAp%3DZwKjrP4L%2BCzqV7SmWiaQidPPRqj4tqdjDG4KBx5yrg%40mail.gmail.com
* Use WaitLatch() instead of pg_usleep() at end-of-vacuum truncationMichael Paquier2021-07-02
| | | | | | | | | | | | This has the advantage to make a process more responsive when the postmaster dies, even if the wait time was rather limited as there was only a 50ms timeout here. Another advantage of this change is for monitoring, as we gain a new wait event for the end-of-vacuum truncation. Author: Bharath Rupireddy Reviewed-by: Aleksander Alekseev, Thomas Munro, Michael Paquier Discussion: https://postgr.es/m/CALj2ACU4AdPCq6NLfcA-ZGwX7pPCK5FgEj-CAU0xCKzkASSy_A@mail.gmail.com
* Fix incorrect PITR message for transaction ROLLBACK PREPAREDMichael Paquier2021-06-30
| | | | | | | | | | | Reaching PITR on such a transaction would cause the generation of a LOG message mentioning a transaction committed, not aborted. Oversight in 4f1b890. Author: Simon Riggs Discussion: https://postgr.es/m/CANbhV-GJ6KijeCgdOrxqMCQ+C8QiK657EMhCy4csjrPcEUFv_Q@mail.gmail.com Backpatch-through: 9.6
* Fix compilation warning in xloginsert.cMichael Paquier2021-06-29
| | | | | | | | This is reproducible with gcc using at least -O0. The last checks validating the compression of a block could not be reached with this variable not set, but let's be clean. Oversight in 4035cd5, per buildfarm member lapwing.
* Add support for LZ4 with compression of full-page writes in WALMichael Paquier2021-06-29
| | | | | | | | | | | | | | | | | | | | | | The logic is implemented so as there can be a choice in the compression used when building a WAL record, and an extra per-record bit is used to track down if a block is compressed with PGLZ, LZ4 or nothing. wal_compression, the existing parameter, is changed to an enum with support for the following backward-compatible values: - "off", the default, to not use compression. - "pglz" or "on", to compress FPWs with PGLZ. - "lz4", the new mode, to compress FPWs with LZ4. Benchmarking has showed that LZ4 outclasses easily PGLZ. ZSTD would be also an interesting choice, but going just with LZ4 for now makes the patch minimalistic as toast compression is already able to use LZ4, so there is no need to worry about any build-related needs for this implementation. Author: Andrey Borodin, Justin Pryzby Reviewed-by: Dilip Kumar, Michael Paquier Discussion: https://postgr.es/m/3037310D-ECB7-4BF1-AF20-01C10BB33A33@yandex-team.ru