aboutsummaryrefslogtreecommitdiff
path: root/src/backend/access/transam/xlog.c
Commit message (Collapse)AuthorAge
...
* Make XLOG_FPI_FOR_HINT records honor full_page_writes setting.Fujii Masao2021-07-21
| | | | | | | | | | | | | | | | Commit 2c03216d83 changed XLOG_FPI_FOR_HINT records so that they always included full-page images even when full_page_writes was disabled. However, in this setting, they don't need to do that because hint bit updates don't need to be protected from torn writes. Therefore, this commit makes XLOG_FPI_FOR_HINT records honor full_page_writes setting. That is, XLOG_FPI_FOR_HINT records may include no full-page images if full_page_writes is disabled, and WAL replay of them does nothing. Reported-by: Zhang Wenjie Author: Kyotaro Horiguchi Reviewed-by: Fujii Masao Discussion: https://postgr.es/m/tencent_60F11973A111EED97A8596FFECC4A91ED405@qq.com
* Advance old-segment horizon properly after slot invalidationAlvaro Herrera2021-07-16
| | | | | | | | | | | | | | | | When some slots are invalidated due to the max_slot_wal_keep_size limit, the old segment horizon should move forward to stay within the limit. However, in commit c6550776394e we forgot to call KeepLogSeg again to recompute the horizon after invalidating replication slots. In cases where other slots remained, the limits would be recomputed eventually for other reasons, but if all slots were invalidated, the limits would not move at all afterwards. Repair. Backpatch to 13 where the feature was introduced. Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reported-by: Marcin Krupowicz <mk@071.ovh> Discussion: https://postgr.es/m/17103-004130e8f27782c9@postgresql.org
* Replace explicit PIN entries in pg_depend with an OID range test.Tom Lane2021-07-15
| | | | | | | | | | | | | | | | As of v14, pg_depend contains almost 7000 "pin" entries recording the OIDs of built-in objects. This is a fair amount of bloat for every database, and it adds time to pg_depend lookups as well as initdb. We can get rid of all of those entries in favor of an OID range check, i.e. "OIDs below FirstUnpinnedObjectId are pinned". (template1 and the public schema are exceptions. Those exceptions are now wired into IsPinnedObject() instead of initdb's code for filling pg_depend, but it's the same amount of cruft either way.) The contents of pg_shdepend are modified likewise. Discussion: https://postgr.es/m/3737988.1618451008@sss.pgh.pa.us
* Use WaitLatch() instead of pg_usleep() at the end of backupsMichael Paquier2021-07-06
| | | | | | | | | | This concerns pg_stop_backup() and BASE_BACKUP, when waiting for the WAL segments required for a backup to be archived. This simplifies a bit the handling of the wait event used in this code path. Author: Bharath Rupireddy Reviewed-by: Michael Paquier, Stephen Frost Discussion: https://postgr.es/m/CALj2ACU4AdPCq6NLfcA-ZGwX7pPCK5FgEj-CAU0xCKzkASSy_A@mail.gmail.com
* Fix incorrect PITR message for transaction ROLLBACK PREPAREDMichael Paquier2021-06-30
| | | | | | | | | | | Reaching PITR on such a transaction would cause the generation of a LOG message mentioning a transaction committed, not aborted. Oversight in 4f1b890. Author: Simon Riggs Discussion: https://postgr.es/m/CANbhV-GJ6KijeCgdOrxqMCQ+C8QiK657EMhCy4csjrPcEUFv_Q@mail.gmail.com Backpatch-through: 9.6
* Add support for LZ4 with compression of full-page writes in WALMichael Paquier2021-06-29
| | | | | | | | | | | | | | | | | | | | | | The logic is implemented so as there can be a choice in the compression used when building a WAL record, and an extra per-record bit is used to track down if a block is compressed with PGLZ, LZ4 or nothing. wal_compression, the existing parameter, is changed to an enum with support for the following backward-compatible values: - "off", the default, to not use compression. - "pglz" or "on", to compress FPWs with PGLZ. - "lz4", the new mode, to compress FPWs with LZ4. Benchmarking has showed that LZ4 outclasses easily PGLZ. ZSTD would be also an interesting choice, but going just with LZ4 for now makes the patch minimalistic as toast compression is already able to use LZ4, so there is no need to worry about any build-related needs for this implementation. Author: Andrey Borodin, Justin Pryzby Reviewed-by: Dilip Kumar, Michael Paquier Discussion: https://postgr.es/m/3037310D-ECB7-4BF1-AF20-01C10BB33A33@yandex-team.ru
* Skip WAL recycling and preallocation during archive recovery.Noah Misch2021-06-28
| | | | | | | | | | | | The previous commit addressed the chief consequences of a race condition between InstallXLogFileSegment() and KeepFileRestoredFromArchive(). Fix three lesser consequences. A spurious durable_rename_excl() LOG message remained possible. KeepFileRestoredFromArchive() wasted the proceeds of WAL recycling and preallocation. Finally, XLogFileInitInternal() could return a descriptor for a file that KeepFileRestoredFromArchive() had already unlinked. That felt like a recipe for future bugs. Discussion: https://postgr.es/m/20210202151416.GB3304930@rfd.leadboat.com
* Don't ERROR on PreallocXlogFiles() race condition.Noah Misch2021-06-28
| | | | | | | | | | | | Before a restartpoint finishes PreallocXlogFiles(), a startup process KeepFileRestoredFromArchive() call can unlink the preallocated segment. If a CHECKPOINT sql command had elicited the restartpoint experiencing the race condition, that sql command failed. Moreover, the restartpoint omitted its log_checkpoints message and some inessential resource reclamation. Prevent the ERROR by skipping open() of the segment. Since these consequences are so minor, no back-patch. Discussion: https://postgr.es/m/20210202151416.GB3304930@rfd.leadboat.com
* Remove XLogFileInit() ability to unlink a pre-existing file.Noah Misch2021-06-28
| | | | | | | | Only initdb used it. initdb refuses to operate on a non-empty directory and generally does not cope with pre-existing files of other kinds. Hence, use the opportunity to simplify. Discussion: https://postgr.es/m/20210202151416.GB3304930@rfd.leadboat.com
* In XLogFileInit(), fix *use_existent postcondition to suit callers.Noah Misch2021-06-28
| | | | | | | | Infrequently, the mismatch caused log_checkpoints messages and TRACE_POSTGRESQL_CHECKPOINT_DONE() to witness an "added" count too high by one. Since that consequence is so minor, no back-patch. Discussion: https://postgr.es/m/20210202151416.GB3304930@rfd.leadboat.com
* Remove XLogFileInit() ability to skip ControlFileLock.Noah Misch2021-06-28
| | | | | | Cold paths, initdb and end-of-recovery, used it. Don't optimize them. Discussion: https://postgr.es/m/20210202151416.GB3304930@rfd.leadboat.com
* Fix outdated comment that talked about seek position of WAL file.Heikki Linnakangas2021-06-16
| | | | | | | | Since commit c24dcd0cfd, we have been using pg_pread() to read the WAL file, which doesn't change the seek position (unless we fall back to the implementation in src/port/pread.c). Update comment accordingly. Backpatch-through: 12, where we started to use pg_pread()
* Fix corner case failure of new standby to follow new primary.Robert Haas2021-06-09
| | | | | | | | | | | | | | | | | | | | | | | | | This only happens if (1) the new standby has no WAL available locally, (2) the new standby is starting from the old timeline, (3) the promotion happened in the WAL segment from which the new standby is starting, (4) the timeline history file for the new timeline is available from the archive but the WAL files for are not (i.e. this is a race), (5) the WAL files for the new timeline are available via streaming, and (6) recovery_target_timeline='latest'. Commit ee994272ca50f70b53074f0febaec97e28f83c4e introduced this logic and was an improvement over the previous code, but it mishandled this case. If recovery_target_timeline='latest' and restore_command is set, validateRecoveryParameters() can change recoveryTargetTLI to be different from receiveTLI. If streaming is then tried afterward, expectedTLEs gets initialized with the history of the wrong timeline. It's supposed to be a list of entries explaining how to get to the target timeline, but in this case it ends up with a list of entries explaining how to get to the new standby's original timeline, which isn't right. Dilip Kumar and Robert Haas, reviewed by Kyotaro Horiguchi. Discussion: http://postgr.es/m/CAFiTN-sE-jr=LB8jQuxeqikd-Ux+jHiXyh4YDiZMPedgQKup0g@mail.gmail.com
* Make standby promotion reset the recovery pause state to 'not paused'.Fujii Masao2021-05-19
| | | | | | | | | | | | | | If a promotion is triggered while recovery is paused, the paused state ends and promotion continues. But previously in that case pg_get_wal_replay_pause_state() returned 'paused' wrongly while a promotion was ongoing. This commit changes a standby promotion so that it marks the recovery pause state as 'not paused' when it's triggered, to fix the issue. Author: Fujii Masao Reviewed-by: Dilip Kumar, Kyotaro Horiguchi Discussion: https://postgr.es/m/f706876c-4894-0ba5-6f4d-79803eeea21b@oss.nttdata.com
* Initial pgindent and pgperltidy run for v14.Tom Lane2021-05-12
| | | | | | | | Also "make reformat-dat-files". The only change worthy of note is that pgindent messed up the formatting of launcher.c's struct LogicalRepWorkerId, which led me to notice that that struct wasn't used at all anymore, so I just took it out.
* Revert recovery prefetching feature.Thomas Munro2021-05-10
| | | | | | | | | | | | | | | | | | This set of commits has some bugs with known fixes, but at this late stage in the release cycle it seems best to revert and resubmit next time, along with some new automated test coverage for this whole area. Commits reverted: dc88460c: Doc: Review for "Optionally prefetch referenced data in recovery." 1d257577: Optionally prefetch referenced data in recovery. f003d9f8: Add circular WAL decoding buffer. 323cbe7c: Remove read_page callback from XLogReader. Remove the new GUC group WAL_RECOVERY recently added by a55a9847, as the corresponding section of config.sgml is now reverted. Discussion: https://postgr.es/m/CAOuzzgrn7iKnFRsB4MHp3UisEQAGgZMbk_ViTN4HV4-Ksq8zCg%40mail.gmail.com
* Optionally prefetch referenced data in recovery.Thomas Munro2021-04-08
| | | | | | | | | | | | | | | | | | | | | | | | Introduce a new GUC recovery_prefetch, disabled by default. When enabled, look ahead in the WAL and try to initiate asynchronous reading of referenced data blocks that are not yet cached in our buffer pool. For now, this is done with posix_fadvise(), which has several caveats. Better mechanisms will follow in later work on the I/O subsystem. The GUC maintenance_io_concurrency is used to limit the number of concurrent I/Os we allow ourselves to initiate, based on pessimistic heuristics used to infer that I/Os have begun and completed. The GUC wal_decode_buffer_size is used to limit the maximum distance we are prepared to read ahead in the WAL to find uncached blocks. Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com> (parts) Reviewed-by: Andres Freund <andres@anarazel.de> (parts) Reviewed-by: Tomas Vondra <tomas.vondra@2ndquadrant.com> (parts) Tested-by: Tomas Vondra <tomas.vondra@2ndquadrant.com> Tested-by: Jakub Wartak <Jakub.Wartak@tomtom.com> Tested-by: Dmitry Dolgov <9erthalion6@gmail.com> Tested-by: Sait Talha Nisanci <Sait.Nisanci@microsoft.com> Discussion: https://postgr.es/m/CA%2BhUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq%3DAovOddfHpA%40mail.gmail.com
* Add circular WAL decoding buffer.Thomas Munro2021-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | Teach xlogreader.c to decode its output into a circular buffer, to support optimizations based on looking ahead. * XLogReadRecord() works as before, consuming records one by one, and allowing them to be examined via the traditional XLogRecGetXXX() macros. * An alternative new interface XLogNextRecord() is added that returns pointers to DecodedXLogRecord structs that can be examined directly. * XLogReadAhead() provides a second cursor that lets you see further ahead, as long as data is available and there is enough space in the decoding buffer. This returns DecodedXLogRecord pointers to the caller, but also adds them to a queue of records that will later be consumed by XLogNextRecord()/XLogReadRecord(). The buffer's size is controlled with wal_decode_buffer_size. The buffer could potentially be placed into shared memory, for future projects. Large records that don't fit in the circular buffer are called "oversized" and allocated separately with palloc(). Discussion: https://postgr.es/m/CA+hUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq=AovOddfHpA@mail.gmail.com
* Remove read_page callback from XLogReader.Thomas Munro2021-04-08
| | | | | | | | | | | | | | | | | | | | | Previously, the XLogReader module would fetch new input data using a callback function. Redesign the interface so that it tells the caller to insert more data with a special return value instead. This API suits later patches for prefetching, encryption and maybe other future projects that would otherwise require continually extending the callback interface. As incidental cleanup work, move global variables readOff, readLen and readSegNo inside XlogReaderState. Author: Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> Author: Heikki Linnakangas <hlinnaka@iki.fi> (parts of earlier version) Reviewed-by: Antonin Houska <ah@cybertec.at> Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com> Reviewed-by: Takashi Menjo <takashi.menjo@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://postgr.es/m/20190418.210257.43726183.horiguchi.kyotaro%40lab.ntt.co.jp
* Stop archive recovery if WAL generated with wal_level=minimal is found.Fujii Masao2021-04-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously if hot standby was enabled, archive recovery exited with an error when it found WAL generated with wal_level=minimal. But if hot standby was disabled, it just reported a warning and continued in that case. Which could lead to data loss or errors during normal operation. A warning was emitted, but users could easily miss that and not notice this serious situation until they encountered the actual errors. To improve this situation, this commit changes archive recovery so that it exits with FATAL error when it finds WAL generated with wal_level=minimal whatever the setting of hot standby. This enables users to notice the serious situation soon. The FATAL error is thrown if archive recovery starts from a base backup taken before wal_level is changed to minimal. When archive recovery exits with the error, if users have a base backup taken after setting wal_level to higher than minimal, they can recover the database by starting archive recovery from that newer backup. But note that if such backup doesn't exist, there is no easy way to complete archive recovery, which may make the database server unstartable and users may lose whole database. The commit adds the note about this risk into the document. Even in the case of unstartable database server, previously by just disabling hot standby users could avoid the error during archive recovery, forcibly start up the server and salvage data from it. But note that this commit makes this procedure unavailable at all. Author: Takamichi Osumi Reviewed-by: Laurenz Albe, Kyotaro Horiguchi, David Steele, Fujii Masao Discussion: https://postgr.es/m/OSBPR01MB4888CBE1DA08818FD2D90ED8EDF90@OSBPR01MB4888.jpnprd01.prod.outlook.com
* Code review for server's handling of "tablespace map" files.Tom Lane2021-03-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While looking at Robert Foggia's report, I noticed a passel of other issues in the same area: * The scheme for backslash-quoting newlines in pathnames is just wrong; it will misbehave if the last ordinary character in a pathname is a backslash. I'm not sure why we're bothering to allow newlines in tablespace paths, but if we're going to do it we should do it without introducing other problems. Hence, backslashes themselves have to be backslashed too. * The author hadn't read the sscanf man page very carefully, because this code would drop any leading whitespace from the path. (I doubt that a tablespace path with leading whitespace could happen in practice; but if we're bothering to allow newlines in the path, it sure seems like leading whitespace is little less implausible.) Using sscanf for the task of finding the first space is overkill anyway. * While I'm not 100% sure what the rationale for escaping both \r and \n is, if the idea is to allow Windows newlines in the file then this code failed, because it'd throw an error if it saw \r followed by \n. * There's no cross-check for an incomplete final line in the map file, which would be a likely apparent symptom of the improper-escaping bug. On the generation end, aside from the escaping issue we have: * If needtblspcmapfile is true then do_pg_start_backup will pass back escaped strings in tablespaceinfo->path values, which no caller wants or is prepared to deal with. I'm not sure if there's a live bug from that, but it looks like there might be (given the dubious assumption that anyone actually has newlines in their tablespace paths). * It's not being very paranoid about the possibility of random stuff in the pg_tblspc directory. IMO we should ignore anything without an OID-like name. The escaping rule change doesn't seem back-patchable: it'll require doubling of backslashes in the tablespace_map file, which is basically a basebackup format change. The odds of that causing trouble are considerably more than the odds of the existing bug causing trouble. The rest of this seems somewhat unlikely to cause problems too, so no back-patch.
* Prevent buffer overrun in read_tablespace_map().Tom Lane2021-03-17
| | | | | | | | | | | | | | | | Robert Foggia of Trustwave reported that read_tablespace_map() fails to prevent an overrun of its on-stack input buffer. Since the tablespace map file is presumed trustworthy, this does not seem like an interesting security vulnerability, but still we should fix it just in the name of robustness. While here, document that pg_basebackup's --tablespace-mapping option doesn't work with tar-format output, because it doesn't. To make it work, we'd have to modify the tablespace_map file within the tarball sent by the server, which might be possible but I'm not volunteering. (Less-painful solutions would require changing the basebackup protocol so that the source server could adjust the map. That's not very appetizing either.)
* Add condition variable for recovery resume.Thomas Munro2021-03-12
| | | | | | | | | Replace a sleep loop with a CV, to get a fast reaction time when recovery is resumed or the postmaster exits via standard infrastructure. Unfortunately we still need to wake up every second to perform extra polling during the recovery pause loop. Discussion: https://postgr.es/m/CA%2BhUKGK1607VmtrDUHQXrsooU%3Dap4g4R2yaoByWOOA3m8xevUQ%40mail.gmail.com
* Be clear about whether a recovery pause has taken effect.Robert Haas2021-03-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, the code and documentation seem to have essentially assumed than a call to pg_wal_replay_pause() would take place immediately, but that's not the case, because we only check for a pause in certain places. This means that a tool that uses this function and then wants to do something else afterward that is dependent on the pause having taken effect doesn't know how long it needs to wait to be sure that no more WAL is going to be replayed. To avoid that, add a new function pg_get_wal_replay_pause_state() which returns either 'not paused', 'paused requested', or 'paused'. After calling pg_wal_replay_pause() the status will immediate change from 'not paused' to 'pause requested'; when the startup process has noticed this, the status will change to 'pause'. For backward compatibility, pg_is_wal_replay_paused() still exists and returns the same thing as before: true if a pause has been requested, whether or not it has taken effect yet; and false if not. The documentation is updated to clarify. To improve the changes that a pause request is quickly confirmed effective, adjust things so that WaitForWALToBecomeAvailable will swiftly reach a call to recoveryPausesHere() when a pause request is made. Dilip Kumar, reviewed by Simon Riggs, Kyotaro Horiguchi, Yugo Nagata, Masahiko Sawada, and Bharath Rupireddy. Discussion: http://postgr.es/m/CAFiTN-vcLLWEm8Zr%3DYK83rgYrT9pbC8VJCfa1kY9vL3AUPfu6g%40mail.gmail.com
* Track total amounts of times spent writing and syncing WAL data to disk.Fujii Masao2021-03-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit adds new GUC track_wal_io_timing. When this is enabled, the total amounts of time XLogWrite writes and issue_xlog_fsync syncs WAL data to disk are counted in pg_stat_wal. This information would be useful to check how much WAL write and sync affect the performance. Enabling track_wal_io_timing will make the server query the operating system for the current time every time WAL is written or synced, which may cause significant overhead on some platforms. To avoid such additional overhead in the server with track_io_timing enabled, this commit introduces track_wal_io_timing as a separate parameter from track_io_timing. Note that WAL write and sync activity by walreceiver has not been tracked yet. This commit makes the server also track the numbers of times XLogWrite writes and issue_xlog_fsync syncs WAL data to disk, in pg_stat_wal, regardless of the setting of track_wal_io_timing. This counters can be used to calculate the WAL write and sync time per request, for example. Bump PGSTAT_FILE_FORMAT_ID. Bump catalog version. Author: Masahiro Ikeda Reviewed-By: Japin Li, Hayato Kuroda, Masahiko Sawada, David Johnston, Fujii Masao Discussion: https://postgr.es/m/0509ad67b585a5b86a83d445dfa75392@oss.nttdata.com
* Simplify printing of LSNsPeter Eisentraut2021-02-23
| | | | | | | | | | Add a macro LSN_FORMAT_ARGS for use in printf-style printing of LSNs. Convert all applicable code to use it. Reviewed-by: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://www.postgresql.org/message-id/flat/CAExHW5ub5NaTELZ3hJUCE6amuvqAtsSxc7O+uK7y4t9Rrk23cw@mail.gmail.com
* Use errmsg_internal for debug messagesPeter Eisentraut2021-02-17
| | | | | | An inconsistent set of debug-level messages was not using errmsg_internal(), thus uselessly exposing the messages to translation work. Fix those.
* Clarify some comments around SharedRecoveryState in xlog.cMichael Paquier2021-02-06
| | | | | | | | | SharedRecoveryState has been switched from a boolean to an enum as of commit 4e87c48, but some comments still referred to it as a boolean. Author: Amul Sul Reviewed-by: Dilip Kumar, Kyotaro Horiguchi Discussion: https://postgr.es/m/CAAJ_b97Hf+1SXnm8jySpO+Fhm+-VKFAAce1T_cupUYtnE3Nxig
* Retire pg_standby.Thomas Munro2021-01-29
| | | | | | | | | | | | | | pg_standby was useful more than a decade ago, but now it is obsolete. It has been proposed that we retire it many times. Now seems like a good time to finally do it, because "waiting restore commands" are incompatible with a proposed recovery prefetching feature. Discussion: https://postgr.es/m/20201029024412.GP5380%40telsasoft.com Author: Justin Pryzby <pryzby@telsasoft.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
* Move StartupCLOG() calls to just after we initialize ShmemVariableCache.Robert Haas2021-01-27
| | | | | | | | | | | | | Previously, the hot_standby=off code path did this at end of recovery, while the hot_standby=on code path did it at the beginning of recovery. It's better to do this in only one place because (a) it's simpler, (b) StartupCLOG() is trivial so trying to postpone the work isn't useful, and (c) this will make it possible to simplify some other logic. Patch by me, reviewed by Heikki Linnakangas. Discussion: http://postgr.es/m/CA+TgmoZYig9+AQodhF5sRXuKkJ=RgFDugLr3XX_dz_F-p=TwTg@mail.gmail.com
* Remove CheckpointLock.Robert Haas2021-01-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Up until now, we've held this lock when performing a checkpoint or restartpoint, but commit 076a055acf3c55314de267c62b03191586d79cf6 back in 2004 and commit 7e48b77b1cebb9a43f9fdd6b17128a0ba36132f9 from 2009, taken together, have removed all need for this. In the present code, there's only ever one process entitled to attempt a checkpoint: either the checkpointer, during normal operation, or the postmaster, during single-user operation. So, we don't need the lock. One possible concern in making this change is that it means that a substantial amount of code where HOLD_INTERRUPTS() was previously in effect due to the preceding LWLockAcquire() will now be running without that. This could mean that ProcessInterrupts() gets called in places from which it didn't before. However, this seems unlikely to do very much, because the checkpointer doesn't have any signal mapped to die(), so it's not clear how, for example, ProcDiePending = true could happen in the first place. Similarly with ClientConnectionLost and recovery conflicts. Also, if there are any such problems, we might want to fix them rather than reverting this, since running lots of code with interrupt handling suspended is generally bad. Patch by me, per an inquiry by Amul Sul. Review by Tom Lane and Michael Paquier. Discussion: http://postgr.es/m/CAAJ_b97XnBBfYeSREDJorFsyoD1sHgqnNuCi=02mNQBUMnA=FA@mail.gmail.com
* Pause recovery for insufficient parameter settingsPeter Eisentraut2021-01-18
| | | | | | | | | | | | | | | | | | When certain parameters are changed on a physical replication primary, this is communicated to standbys using the XLOG_PARAMETER_CHANGE WAL record. The standby then checks whether its own settings are at least as big as the ones on the primary. If not, the standby shuts down with a fatal error. This patch changes this behavior for hot standbys to pause recovery at that point instead. That allows read traffic on the standby to continue while database administrators figure out next steps. When recovery is unpaused, the server shuts down (as before). The idea is to fix the parameters while recovery is paused and then restart when there is a maintenance window. Reviewed-by: Sergei Kornilov <sk@zsrv.org> Discussion: https://www.postgresql.org/message-id/flat/4ad69a4c-cc9b-0dfe-0352-8b1b0cd36c7b@2ndquadrant.com
* Fix O(N^2) stat() calls when recycling WAL segmentsMichael Paquier2021-01-15
| | | | | | | | | | | | | | | | | | The counter tracking the last segment number recycled was getting initialized when recycling one single segment, while it should be used across a full cycle of segments recycled to prevent useless checks related to entries already recycled. This performance issue has been introduced by b2a5545, and it was first implemented in 61b86142. No backpatch is done per the lack of field complaints. Reported-by: Andres Freund, Thomas Munro Author: Michael Paquier Reviewed-By: Andres Freund Discussion: https://postgr.es/m/20170621211016.eln6cxxp3jrv7m4m@alap3.anarazel.de Discussion: https://postgr.es/m/CA+hUKG+DRiF9z1_MU4fWq+RfJMxP7zjoptfcmuCFPeO4JM2iVg@mail.gmail.com
* Use vectored I/O to fill new WAL segments.Thomas Munro2021-01-11
| | | | | | | | | | | | | Instead of making many block-sized write() calls to fill a new WAL file with zeroes, make a smaller number of pwritev() calls (or various emulations). The actual number depends on the OS's IOV_MAX, which PG_IOV_MAX currently caps at 32. That means we'll write 256kB per call on typical systems. We may want to tune the number later with more experience. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CA%2BhUKGJA%2Bu-220VONeoREBXJ9P3S94Y7J%2BkqCnTYmahvZJwM%3Dg%40mail.gmail.com
* Update copyright for 2021Bruce Momjian2021-01-02
| | | | Backpatch-through: 9.5
* Revert "Add key management system" (978f869b99) & later commitsBruce Momjian2020-12-27
| | | | | | | | | | The patch needs test cases, reorganization, and cfbot testing. Technically reverts commits 5c31afc49d..e35b2bad1a (exclusive/inclusive) and 08db7c63f3..ccbe34139b. Reported-by: Tom Lane, Michael Paquier Discussion: https://postgr.es/m/E1ktAAG-0002V2-VB@gemulon.postgresql.org
* Add key management systemBruce Momjian2020-12-25
| | | | | | | | | | | | | | | This adds a key management system that stores (currently) two data encryption keys of length 128, 192, or 256 bits. The data keys are AES256 encrypted using a key encryption key, and validated via GCM cipher mode. A command to obtain the key encryption key must be specified at initdb time, and will be run at every database server start. New parameters allow a file descriptor open to the terminal to be passed. pg_upgrade support has also been added. Discussion: https://postgr.es/m/CA+fd4k7q5o6Nc_AaX6BcYM9yqTbC6_pnH-6nSD=54Zp6NBQTCQ@mail.gmail.com Discussion: https://postgr.es/m/20201202213814.GG20285@momjian.us Author: Masahiko Sawada, me, Stephen Frost
* Fix typos and grammar in docs and commentsMichael Paquier2020-12-24
| | | | | | | | This fixes several areas of the documentation and some comments in matters of style, grammar, or even format. Author: Justin Pryzby Discussion: https://postgr.es/m/20201222041153.GK30237@telsasoft.com
* Revert "Get rid of the dedicated latch for signaling the startup process".Fujii Masao2020-12-17
| | | | | | | | | | | | | | | | | Revert ac22929a26, as well as the followup fix 113d3591b8. Because it broke the assumption that the startup process waiting for the recovery conflict on buffer pin should be waken up only by buffer unpin or the timeout enabled in ResolveRecoveryConflictWithBufferPin(). It caused, for example, SIGHUP signal handler or walreceiver process to wake that startup process up unnecessarily frequently. Additionally, add the comments about why that dedicated latch that the reverted patch tried to get rid of should not be removed. Thanks to Kyotaro Horiguchi for the discussion. Author: Fujii Masao Discussion: https://postgr.es/m/d8c0c608-021b-3c73-fffd-3240829ee986@oss.nttdata.com
* Add some checkpoint/restartpoint status to ps displayMichael Paquier2020-12-14
| | | | | | | | | | | | | | | | | This is done for end-of-recovery and shutdown checkpoints/restartpoints (end-of-recovery restartpoints don't exist) rather than all types of checkpoints, in cases where it may not be possible to rely on pg_stat_activity to get a status from the startup or checkpointer processes. For example, at the end of a crash recovery, this is useful to know if a checkpoint is running in the startup process, while previously the ps display may only show some information about "recovering" something, that can be confusing while a checkpoint runs. Author: Justin Pryzby Reviewed-by: Nathan Bossart, Kirk Jamison, Fujii Masao, Michael Paquier Discussion: https://postgr.es/m/20200818225238.GP17022@telsasoft.com
* Convert elog(LOG) calls to ereport() where appropriatePeter Eisentraut2020-12-04
| | | | | | | | | | | User-visible log messages should go through ereport(), so they are subject to translation. Many remaining elog(LOG) calls are really debugging calls. Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Noah Misch <noah@leadboat.com> Discussion: https://www.postgresql.org/message-id/flat/92d6f545-5102-65d8-3c87-489f71ea0a37%40enterprisedb.com
* Replace a macro by a functionPeter Eisentraut2020-11-20
| | | | | | Using a macro is ugly and not justified here. Discussion: https://www.postgresql.org/message-id/flat/4ad69a4c-cc9b-0dfe-0352-8b1b0cd36c7b@2ndquadrant.com
* Make the standby server promptly handle interrupt signals.Fujii Masao2020-11-16
| | | | | | | | | | | | | | | | This commit changes the startup process in the standby server so that it handles the interrupt signals after waiting for wal_retrieve_retry_interval on the latch and resetting it, before entering another wait on the latch. This change causes the standby server to promptly handle interrupt signals. Otherwise, previously, there was the case where the standby needs to wait extra five seconds to shutdown when the shutdown request arrived while the startup process was waiting for wal_retrieve_retry_interval on the latch. Author: Fujii Masao, but implementation idea is from Soumyadeep Chakraborty Reviewed-by: Soumyadeep Chakraborty Discussion: https://postgr.es/m/9d7e6ab0-8a53-ddb9-63cd-289bcb25fe0e@oss.nttdata.com
* Fix and simplify some usages of TimestampDifference().Tom Lane2020-11-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce TimestampDifferenceMilliseconds() to simplify callers that would rather have the difference in milliseconds, instead of the select()-oriented seconds-and-microseconds format. This gets rid of at least one integer division per call, and it eliminates some apparently-easy-to-mess-up arithmetic. Two of these call sites were in fact wrong: * pg_prewarm's autoprewarm_main() forgot to multiply the seconds by 1000, thus ending up with a delay 1000X shorter than intended. That doesn't quite make it a busy-wait, but close. * postgres_fdw's pgfdw_get_cleanup_result() thought it needed to compute microseconds not milliseconds, thus ending up with a delay 1000X longer than intended. Somebody along the way had noticed this problem but misdiagnosed the cause, and imposed an ad-hoc 60-second limit rather than fixing the units. This was relatively harmless in context, because we don't care that much about exactly how long this delay is; still, it's wrong. There are a few more callers of TimestampDifference() that don't have a direct need for seconds-and-microseconds, but can't use TimestampDifferenceMilliseconds() either because they do need microsecond precision or because they might possibly deal with intervals long enough to overflow 32-bit milliseconds. It might be worth inventing another API to improve that, but that seems outside the scope of this patch; so those callers are untouched here. Given the fact that we are fixing some bugs, and the likelihood that future patches might want to back-patch code that uses this new API, back-patch to all supported branches. Alexey Kondratov and Tom Lane Discussion: https://postgr.es/m/3b1c053a21c07c1ed5e00be3b2b855ef@postgrespro.ru
* Fix segmentation fault that commit ac22929a26 caused.Fujii Masao2020-11-04
| | | | | | | | | | | | | | | | | Commit ac22929a26 changed recoveryWakeupLatch so that it's reset to NULL at the end of recovery. This change could cause a segmentation fault in the buildfarm member 'elver'. Previously the latch was reset to NULL after calling ShutdownWalRcv(). But there could be a window between ShutdownWalRcv() and the actual exit of walreceiver. If walreceiver set the latch during that window, the segmentation fault could happen. To fix the issue, this commit changes walreceiver so that it sets the latch only when the latch has not been reset to NULL yet. Author: Fujii Masao Discussion: https://postgr.es/m/5c1f8a85-747c-7bf9-241e-dd467d8a3586@iki.fi
* Get rid of the dedicated latch for signaling the startup process.Fujii Masao2020-11-04
| | | | | | | | | | | | | | This commit gets rid of the dedicated latch for signaling the startup process in favor of using its procLatch, since that comports better with possible generic signal handlers using that latch. Commit 1e53fe0e70 changed background processes so that they use standard SIGHUP handler. Like that, this commit also makes the startup process use standard SIGHUP handler to simplify the code. Author: Fujii Masao Reviewed-by: Bharath Rupireddy, Michael Paquier Discussion: https://postgr.es/m/CALj2ACXPorUqePswDtOeM_s82v9RW32E1fYmOPZ5NuE+TWKj_A@mail.gmail.com
* Fix compilation warning in xlog.cMichael Paquier2020-10-06
| | | | | | | Oversight in 9d0bd95. Reported-by: Andres Freund Discussion: https://postgr.es/m/20201006023802.qqfi6m5bw5y77zql@alap3.anarazel.de
* Add pg_stat_wal statistics view.Fujii Masao2020-10-02
| | | | | | | | | | | | | | | | | | | | This view shows the statistics about WAL activity. Currently it has only two columns: wal_buffers_full and stats_reset. wal_buffers_full column indicates the number of times WAL data was written to the disk because WAL buffers got full. This information is useful when tuning wal_buffers. stats_reset column indicates the time at which these statistics were last reset. pg_stat_wal view is also the basic infrastructure to expose other various statistics about WAL activity later. Bump PGSTAT_FILE_FORMAT_ID due to the change in pgstat format. Bump catalog version. Author: Masahiro Ikeda Reviewed-by: Takayuki Tsunakawa, Kyotaro Horiguchi, Amit Kapila, Fujii Masao Discussion: https://postgr.es/m/188bd3f2d2233cf97753b5ced02bb050@oss.nttdata.com
* Add block information in error context of WAL REDO apply loopMichael Paquier2020-10-02
| | | | | | | | | | | | | | | | Providing this information can be useful for example when diagnosing problems related to recovery conflicts or for recovery issues without having to go through the output generated by pg_waldump to get some information about the blocks a WAL record works on. The block information is printed in the same format as pg_waldump. This already existed in xlog.c for debugging purposes with -DWAL_DEBUG, so adding the block information in the callback has required just a small refactoring. Author: Bertrand Drouvot Reviewed-by: Michael Paquier, Masahiko Sawada Discussion: https://postgr.es/m/c31e2cba-efda-762c-f4ad-5c25e5dac3d0@amazon.com
* Defer flushing of SLRU files.Thomas Munro2020-09-25
| | | | | | | | | | | | | | | | | | | | | | | | Previously, we called fsync() after writing out individual pg_xact, pg_multixact and pg_commit_ts pages due to cache pressure, leading to regular I/O stalls in user backends and recovery. Collapse requests for the same file into a single system call as part of the next checkpoint, as we already did for relation files, using the infrastructure developed by commit 3eb77eba. This can cause a significant improvement to recovery performance, especially when it's otherwise CPU-bound. Hoist ProcessSyncRequests() up into CheckPointGuts() to make it clearer that it applies to all the SLRU mini-buffer-pools as well as the main buffer pool. Rearrange things so that data collected in CheckpointStats includes SLRU activity. Also remove the Shutdown{CLOG,CommitTS,SUBTRANS,MultiXact}() functions, because they were redundant after the shutdown checkpoint that immediately precedes them. (I'm not sure if they were ever needed, but they aren't now.) Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (parts) Tested-by: Jakub Wartak <Jakub.Wartak@tomtom.com> Discussion: https://postgr.es/m/CA+hUKGLJ=84YT+NvhkEEDAuUtVHMfQ9i-N7k_o50JmQ6Rpj_OQ@mail.gmail.com