aboutsummaryrefslogtreecommitdiff
path: root/src/backend/access/transam/xlogreader.c
Commit message (Collapse)AuthorAge
* Adjust assertion in XLogDecodeNextRecord.Robert Haas2022-08-18
| | | | | | | | | | | | | | | | | As written, if you use XLogBeginRead() to position an xlogreader at the beginning of a WAL page and then try to read WAL, this assertion will fail. However, the header comment for XLogBeginRead() claims that positioning an xlogreader at the beginning of a page is valid, and the code here is perfectly able to cope with it. It's only the assertion that causes trouble. So relax it. This is formally a bug in all supported branches, but as it doesn't seem to have any consequences for current uses of the xlogreader facility, no back-patch, at least for now. Dilip Kumar and Robert Haas Discussion: http://postgr.es/m/CA+TgmoaJSs2_7WHW2GzFYe9+zfPtxBKvT3GW47+x=ptUE=cULw@mail.gmail.com
* Remove dead pread and pwrite replacement code.Thomas Munro2022-08-05
| | | | | | | | | | | | | | | | | | | | | | | pread() and pwrite() are in SUSv2, and all targeted Unix systems have them. Previously, we defined pg_pread and pg_pwrite to emulate these function with lseek() on old Unixen. The names with a pg_ prefix were a reminder of a portability hazard: they might change the current file position. That hazard is gone, so we can drop the prefixes. Since the remaining replacement code is Windows-only, move it into src/port/win32p{read,write}.c, and move the declarations into src/include/port/win32_port.h. No need for vestigial HAVE_PREAD, HAVE_PWRITE macros as they were only used for declarations in port.h which have now moved into win32_port.h. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Greg Stark <stark@mit.edu> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CA+hUKGJ3LHeP9w5Fgzdr4G8AnEtJ=z=p6hGDEm4qYGEUX5B6fQ@mail.gmail.com
* Change internal RelFileNode references to RelFileNumber or RelFileLocator.Robert Haas2022-07-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have been using the term RelFileNode to refer to either (1) the integer that is used to name the sequence of files for a certain relation within the directory set aside for that tablespace/database combination; or (2) that value plus the OIDs of the tablespace and database; or occasionally (3) the whole series of files created for a relation based on those values. Using the same name for more than one thing is confusing. Replace RelFileNode with RelFileNumber when we're talking about just the single number, i.e. (1) from above, and with RelFileLocator when we're talking about all the things that are needed to locate a relation's files on disk, i.e. (2) from above. In the places where we refer to (3) as a relfilenode, instead refer to "relation storage". Since there is a ton of SQL code in the world that knows about pg_class.relfilenode, don't change the name of that column, or of other SQL-facing things that derive their name from it. On the other hand, do adjust closely-related internal terminology. For example, the structure member names dbNode and spcNode appear to be derived from the fact that the structure itself was called RelFileNode, so change those to dbOid and spcOid. Likewise, various variables with names like rnode and relnode get renamed appropriately, according to how they're being used in context. Hopefully, this is clearer than before. It is also preparation for future patches that intend to widen the relfilenumber fields from its current width of 32 bits. Variables that store a relfilenumber are now declared as type RelFileNumber rather than type Oid; right now, these are the same, but that can now more easily be changed. Dilip Kumar, per an idea from me. Reviewed also by Andres Freund. I fixed some whitespace issues, changed a couple of words in a comment, and made one other minor correction. Discussion: http://postgr.es/m/CA+TgmoamOtXbVAQf9hWFzonUo6bhhjS6toZQd7HZ-pmojtAmag@mail.gmail.com Discussion: http://postgr.es/m/CA+Tgmobp7+7kmi4gkq7Y+4AM9fTvL+O1oQ4-5gFTT+6Ng-dQ=g@mail.gmail.com Discussion: http://postgr.es/m/CAFiTN-vTe79M8uDH1yprOU64MNFE+R3ODRuA+JWf27JbhY4hJw@mail.gmail.com
* Make XLogRecGetBlockTag() throw error if there's no such block.Tom Lane2022-04-11
| | | | | | | | | | | | | | | | | | | | | | | | All but a few existing callers assume without checking that this function succeeds. While it probably will, that's a poor excuse for not checking. Let's make it return void and instead throw an error if it doesn't find the block reference. Callers that actually need to handle the no-such-block case must now use the underlying function XLogRecGetBlockTagExtended. In addition to being a bit less error-prone, this should also serve to suppress some Coverity complaints about XLogRecGetBlockRefInfo. While at it, clean up some inconsistency about use of the XLogRecHasBlockRef macro: make XLogRecGetBlockTagExtended use that instead of open-coding the same condition, and avoid calling XLogRecHasBlockRef twice in relevant code paths. (That is, calling XLogRecHasBlockRef followed by XLogRecGetBlockTag is now deprecated: use XLogRecGetBlockTagExtended instead.) Patch HEAD only; this doesn't seem to have enough value to consider a back-branch API break. Discussion: https://postgr.es/m/425039.1649701221@sss.pgh.pa.us
* Fix various typos and spelling mistakes in code commentsDavid Rowley2022-04-11
| | | | | Author: Justin Pryzby Discussion: https://postgr.es/m/20220411020336.GB26620@telsasoft.com
* Add contrib/pg_walinspect.Jeff Davis2022-04-08
| | | | | | | | | Provides similar functionality to pg_waldump, but from a SQL interface rather than a separate utility. Author: Bharath Rupireddy Reviewed-by: Greg Stark, Kyotaro Horiguchi, Andres Freund, Ashutosh Sharma, Nitin Jadhav, RKN Sai Krishna Discussion: https://postgr.es/m/CALj2ACUGUYXsEQdKhEdsBzhGEyF3xggvLdD8C0VT72TNEfOiog%40mail.gmail.com
* Prefetch data referenced by the WAL, take II.Thomas Munro2022-04-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce a new GUC recovery_prefetch. When enabled, look ahead in the WAL and try to initiate asynchronous reading of referenced data blocks that are not yet cached in our buffer pool. For now, this is done with posix_fadvise(), which has several caveats. Since not all OSes have that system call, "try" is provided so that it can be enabled where available. Better mechanisms for asynchronous I/O are possible in later work. Set to "try" for now for test coverage. Default setting to be finalized before release. The GUC wal_decode_buffer_size limits the distance we can look ahead in bytes of decoded data. The existing GUC maintenance_io_concurrency is used to limit the number of concurrent I/Os allowed, based on pessimistic heuristics used to infer that I/Os have begun and completed. We'll also not look more than maintenance_io_concurrency * 4 block references ahead. Reviewed-by: Julien Rouhaud <rjuju123@gmail.com> Reviewed-by: Tomas Vondra <tomas.vondra@2ndquadrant.com> Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com> (earlier version) Reviewed-by: Andres Freund <andres@anarazel.de> (earlier version) Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> (earlier version) Tested-by: Tomas Vondra <tomas.vondra@2ndquadrant.com> (earlier version) Tested-by: Jakub Wartak <Jakub.Wartak@tomtom.com> (earlier version) Tested-by: Dmitry Dolgov <9erthalion6@gmail.com> (earlier version) Tested-by: Sait Talha Nisanci <Sait.Nisanci@microsoft.com> (earlier version) Discussion: https://postgr.es/m/CA%2BhUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq%3DAovOddfHpA%40mail.gmail.com
* Fix warning introduced in 5c279a6d350.Jeff Davis2022-04-07
| | | | | | | | | Change two macros to be static inline functions instead to keep the data type consistent. This avoids a "comparison is always true" warning that was occurring with -Wtype-limits. In the process, change the names to look less like macros. Discussion: https://postgr.es/m/20220407063505.njnnrmbn4sxqfsts@alap3.anarazel.de
* Custom WAL Resource Managers.Jeff Davis2022-04-06
| | | | | | | | | | | | | Allow extensions to specify a new custom resource manager (rmgr), which allows specialized WAL. This is meant to be used by a Table Access Method or Index Access Method. Prior to this commit, only Generic WAL was available, which offers support for recovery and physical replication but not logical replication. Reviewed-by: Julien Rouhaud, Bharath Rupireddy, Andres Freund Discussion: https://postgr.es/m/ed1fb2e22d15d3563ae0eb610f7b61bb15999c0a.camel%40j-davis.com
* Add circular WAL decoding buffer, take II.Thomas Munro2022-03-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Teach xlogreader.c to decode the WAL into a circular buffer. This will support optimizations based on looking ahead, to follow in a later commit. * XLogReadRecord() works as before, decoding records one by one, and allowing them to be examined via the traditional XLogRecGetXXX() macros and certain traditional members like xlogreader->ReadRecPtr. * An alternative new interface XLogReadAhead()/XLogNextRecord() is added that returns pointers to DecodedXLogRecord objects so that it's now possible to look ahead in the WAL stream while replaying. * In order to be able to use the new interface effectively while streaming data, support is added for the page_read() callback to respond to a new nonblocking mode with XLREAD_WOULDBLOCK instead of waiting for more data to arrive. No direct user of the new interface is included in this commit, though XLogReadRecord() uses it internally. Existing code doesn't need to change, except in a few places where it was accessing reader internals directly and now needs to go through accessor macros. Reviewed-by: Julien Rouhaud <rjuju123@gmail.com> Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com> Reviewed-by: Andres Freund <andres@anarazel.de> (earlier versions) Discussion: https://postgr.es/m/CA+hUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq=AovOddfHpA@mail.gmail.com
* Add support for zstd with compression of full-page writes in WALMichael Paquier2022-03-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | wal_compression gains a new value, "zstd", to allow the compression of full-page images using the compression method of the same name. Compression is done using the default level recommended by the library, as of ZSTD_CLEVEL_DEFAULT = 3. Some benchmarking has shown that it could make sense to use a level lower for the FPI compression, like 1 or 2, as the compression rate did not change much with a bit less CPU consumed, but any tests done would only cover few scenarios so it is hard to come to a clear conclusion. Anyway, there is no reason to not use the default level instead, which is the level recommended by the library so it should be fine for most cases. zstd outclasses easily pglz, and is better than LZ4 where one wants to have more compression at the cost of extra CPU but both are good enough in their own scenarios, so the choice between one or the other of these comes to a study of the workload patterns and the schema involved, mainly. This commit relies heavily on 4035cd5, that reshaped the code creating and restoring full-page writes to be aware of the compression type, making this integration straight-forward. This patch borrows some early work from Andrey Borodin, though the patch got a complete rewrite. Author: Justin Pryzby Discussion: https://postgr.es/m/20220222231948.GJ9008@telsasoft.com
* Update copyright for 2022Bruce Momjian2022-01-07
| | | | Backpatch-through: 10
* Fix determination of broken LSN in OVERWRITTEN_CONTRECORDAlvaro Herrera2021-11-26
| | | | | | | | | | | | | | In commit ff9f111bce24 I mixed up inconsistent definitions of the LSN of the first record in a page, when the previous record ends exactly at the page boundary. The correct LSN is adjusted to skip the WAL page header; I failed to use that when setting XLogReaderState->overwrittenRecPtr, so at WAL replay time VerifyOverwriteContrecord would refuse to let replay continue past that record. Backpatch to 10. 9.6 also contains this bug, but it's no longer being maintained. Discussion: https://postgr.es/m/45597.1637694259@sss.pgh.pa.us
* Fix typos in commentsDaniel Gustafsson2021-10-27
| | | | | Author: Peter Smith <smithpb2250@gmail.com> Discussion: https://postgr.es/m/CAHut+PsN_gmKu-KfeEb9NDARoTPbs4AN4PPu=6LZXFZRJ13SEw@mail.gmail.com
* Fix WAL replay in presence of an incomplete recordAlvaro Herrera2021-09-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Physical replication always ships WAL segment files to replicas once they are complete. This is a problem if one WAL record is split across a segment boundary and the primary server crashes before writing down the segment with the next portion of the WAL record: WAL writing after crash recovery would happily resume at the point where the broken record started, overwriting that record ... but any standby or backup may have already received a copy of that segment, and they are not rewinding. This causes standbys to stop following the primary after the latter crashes: LOG: invalid contrecord length 7262 at A8/D9FFFBC8 because the standby is still trying to read the continuation record (contrecord) for the original long WAL record, but it is not there and it will never be. A workaround is to stop the replica, delete the WAL file, and restart it -- at which point a fresh copy is brought over from the primary. But that's pretty labor intensive, and I bet many users would just give up and re-clone the standby instead. A fix for this problem was already attempted in commit 515e3d84a0b5, but it only addressed the case for the scenario of WAL archiving, so streaming replication would still be a problem (as well as other things such as taking a filesystem-level backup while the server is down after having crashed), and it had performance scalability problems too; so it had to be reverted. This commit fixes the problem using an approach suggested by Andres Freund, whereby the initial portion(s) of the split-up WAL record are kept, and a special type of WAL record is written where the contrecord was lost, so that WAL replay in the replica knows to skip the broken parts. With this approach, we can continue to stream/archive segment files as soon as they are complete, and replay of the broken records will proceed across the crash point without a hitch. Because a new type of WAL record is added, users should be careful to upgrade standbys first, primaries later. Otherwise they risk the standby being unable to start if the primary happens to write such a record. A new TAP test that exercises this is added, but the portability of it is yet to be seen. This has been wrong since the introduction of physical replication, so backpatch all the way back. In stable branches, keep the new XLogReaderState members at the end of the struct, to avoid an ABI break. Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Nathan Bossart <bossartn@amazon.com> Discussion: https://postgr.es/m/202108232252.dh7uxf6oxwcy@alvherre.pgsql
* Add forgotten LSN_FORMAT_ARGS() in xlogreader.cMichael Paquier2021-07-09
| | | | | These should have been part of 4035cd5, that introduced LZ4 support for wal_compression.
* Add support for LZ4 with compression of full-page writes in WALMichael Paquier2021-06-29
| | | | | | | | | | | | | | | | | | | | | | The logic is implemented so as there can be a choice in the compression used when building a WAL record, and an extra per-record bit is used to track down if a block is compressed with PGLZ, LZ4 or nothing. wal_compression, the existing parameter, is changed to an enum with support for the following backward-compatible values: - "off", the default, to not use compression. - "pglz" or "on", to compress FPWs with PGLZ. - "lz4", the new mode, to compress FPWs with LZ4. Benchmarking has showed that LZ4 outclasses easily PGLZ. ZSTD would be also an interesting choice, but going just with LZ4 for now makes the patch minimalistic as toast compression is already able to use LZ4, so there is no need to worry about any build-related needs for this implementation. Author: Andrey Borodin, Justin Pryzby Reviewed-by: Dilip Kumar, Michael Paquier Discussion: https://postgr.es/m/3037310D-ECB7-4BF1-AF20-01C10BB33A33@yandex-team.ru
* Revert recovery prefetching feature.Thomas Munro2021-05-10
| | | | | | | | | | | | | | | | | | This set of commits has some bugs with known fixes, but at this late stage in the release cycle it seems best to revert and resubmit next time, along with some new automated test coverage for this whole area. Commits reverted: dc88460c: Doc: Review for "Optionally prefetch referenced data in recovery." 1d257577: Optionally prefetch referenced data in recovery. f003d9f8: Add circular WAL decoding buffer. 323cbe7c: Remove read_page callback from XLogReader. Remove the new GUC group WAL_RECOVERY recently added by a55a9847, as the corresponding section of config.sgml is now reverted. Discussion: https://postgr.es/m/CAOuzzgrn7iKnFRsB4MHp3UisEQAGgZMbk_ViTN4HV4-Ksq8zCg%40mail.gmail.com
* Add some forgotten LSN_FORMAT_ARGS() in xlogreader.cMichael Paquier2021-04-24
| | | | | | | | | | 6f6f284 has introduced a specific macro to make printf()-ing of LSNs easier. This takes care of what looks like the remaining code paths that did not get the call. Author: Michael Paquier Reviewed-by: Kyotaro Horiguchi, Tom Lane Discussion: https://postgr.es/m/YIJS9x6K8ruizN7j@paquier.xyz
* Optionally prefetch referenced data in recovery.Thomas Munro2021-04-08
| | | | | | | | | | | | | | | | | | | | | | | | Introduce a new GUC recovery_prefetch, disabled by default. When enabled, look ahead in the WAL and try to initiate asynchronous reading of referenced data blocks that are not yet cached in our buffer pool. For now, this is done with posix_fadvise(), which has several caveats. Better mechanisms will follow in later work on the I/O subsystem. The GUC maintenance_io_concurrency is used to limit the number of concurrent I/Os we allow ourselves to initiate, based on pessimistic heuristics used to infer that I/Os have begun and completed. The GUC wal_decode_buffer_size is used to limit the maximum distance we are prepared to read ahead in the WAL to find uncached blocks. Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com> (parts) Reviewed-by: Andres Freund <andres@anarazel.de> (parts) Reviewed-by: Tomas Vondra <tomas.vondra@2ndquadrant.com> (parts) Tested-by: Tomas Vondra <tomas.vondra@2ndquadrant.com> Tested-by: Jakub Wartak <Jakub.Wartak@tomtom.com> Tested-by: Dmitry Dolgov <9erthalion6@gmail.com> Tested-by: Sait Talha Nisanci <Sait.Nisanci@microsoft.com> Discussion: https://postgr.es/m/CA%2BhUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq%3DAovOddfHpA%40mail.gmail.com
* Add circular WAL decoding buffer.Thomas Munro2021-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | Teach xlogreader.c to decode its output into a circular buffer, to support optimizations based on looking ahead. * XLogReadRecord() works as before, consuming records one by one, and allowing them to be examined via the traditional XLogRecGetXXX() macros. * An alternative new interface XLogNextRecord() is added that returns pointers to DecodedXLogRecord structs that can be examined directly. * XLogReadAhead() provides a second cursor that lets you see further ahead, as long as data is available and there is enough space in the decoding buffer. This returns DecodedXLogRecord pointers to the caller, but also adds them to a queue of records that will later be consumed by XLogNextRecord()/XLogReadRecord(). The buffer's size is controlled with wal_decode_buffer_size. The buffer could potentially be placed into shared memory, for future projects. Large records that don't fit in the circular buffer are called "oversized" and allocated separately with palloc(). Discussion: https://postgr.es/m/CA+hUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq=AovOddfHpA@mail.gmail.com
* Remove read_page callback from XLogReader.Thomas Munro2021-04-08
| | | | | | | | | | | | | | | | | | | | | Previously, the XLogReader module would fetch new input data using a callback function. Redesign the interface so that it tells the caller to insert more data with a special return value instead. This API suits later patches for prefetching, encryption and maybe other future projects that would otherwise require continually extending the callback interface. As incidental cleanup work, move global variables readOff, readLen and readSegNo inside XlogReaderState. Author: Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> Author: Heikki Linnakangas <hlinnaka@iki.fi> (parts of earlier version) Reviewed-by: Antonin Houska <ah@cybertec.at> Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com> Reviewed-by: Takashi Menjo <takashi.menjo@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://postgr.es/m/20190418.210257.43726183.horiguchi.kyotaro%40lab.ntt.co.jp
* Simplify printing of LSNsPeter Eisentraut2021-02-23
| | | | | | | | | | Add a macro LSN_FORMAT_ARGS for use in printf-style printing of LSNs. Convert all applicable code to use it. Reviewed-by: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://www.postgresql.org/message-id/flat/CAExHW5ub5NaTELZ3hJUCE6amuvqAtsSxc7O+uK7y4t9Rrk23cw@mail.gmail.com
* Fix thinko in commentAlvaro Herrera2021-01-12
| | | | | | | | This comment has been wrong since its introduction in commit 2c03216d8311. Author: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAD21AoAzz6qipFJBbGEaHmyWxvvNDp8httbwLR9tUQWaTjUs2Q@mail.gmail.com
* Update copyright for 2021Bruce Momjian2021-01-02
| | | | Backpatch-through: 9.5
* Report expected contrecord length on mismatchAlvaro Herrera2020-09-04
| | | | | | | | When reading a WAL record fails to find continuation record(s) of the proper length, report what it expects, for clarity. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/20200903212152.GA15319@alvherre.pgsql
* Rename VariableCacheData.nextFullXid to nextXid.Andres Freund2020-08-11
| | | | | | | | | Including Full in variable names duplicates the type information and leads to overly long names. As FullTransactionId cannot accidentally be casted to TransactionId that does not seem necessary. Author: Andres Freund Discussion: https://postgr.es/m/20200724011143.jccsyvsvymuiqfxu@alap3.anarazel.de
* Immediately WAL-log subtransaction and top-level XID association.Amit Kapila2020-07-20
| | | | | | | | | | | | | | | | | | | | The logical decoding infrastructure needs to know which top-level transaction the subxact belongs to, in order to decode all the changes. Until now that might be delayed until commit, due to the caching (GPROC_MAX_CACHED_SUBXIDS), preventing features requiring incremental decoding. So we also write the assignment info into WAL immediately, as part of the next WAL record (to minimize overhead) only when wal_level=logical. We can not remove the existing XLOG_XACT_ASSIGNMENT WAL as that is required for avoiding overflow in the hot standby snapshot. Bump XLOG_PAGE_MAGIC, since this introduces XLR_BLOCK_ID_TOPLEVEL_XID. Author: Tomas Vondra, Dilip Kumar, Amit Kapila Reviewed-by: Amit Kapila Tested-by: Neha Sharma and Mahendra Singh Thalor Discussion: https://postgr.es/m/688b0b7f-2f6c-d827-c27b-216a8e3ea700@2ndquadrant.com
* Fix crash in WAL sender when starting physical replicationMichael Paquier2020-06-08
| | | | | | | | | | | | | | | | | | | | | Since database connections can be used with WAL senders in 9.4, it is possible to use physical replication. This commit fixes a crash when starting physical replication with a WAL sender using a database connection, caused by the refactoring done in 850196b. There have been discussions about forbidding the use of physical replication in a database connection, but this is left for later, taking care only of the crash new to 13. While on it, add a test to check for a failure when attempting logical replication if the WAL sender does not have a database connection. This part is extracted from a larger patch by Kyotaro Horiguchi. Reported-by: Vladimir Sitnikov Author: Michael Paquier, Kyotaro Horiguchi Reviewed-by: Kyotaro Horiguchi, Álvaro Herrera Discussion: https://postgr.es/m/CAB=Je-GOWMj1PTPkeUhjqQp-4W3=nW-pXe2Hjax6rJFffB5_Aw@mail.gmail.com Backpatch-through: 13
* Initial pgindent and pgperltidy run for v13.Tom Lane2020-05-14
| | | | | | | | | | | Includes some manual cleanup of places that pgindent messed up, most of which weren't per project style anyway. Notably, it seems some people didn't absorb the style rules of commit c9d297751, because there were a bunch of new occurrences of function calls with a newline just after the left paren, all with faulty expectations about how the rest of the call would get indented.
* Adjust walsender usage of xlogreader, simplify APIsAlvaro Herrera2020-05-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Have both physical and logical walsender share a 'xlogreader' state struct for tracking state. This replaces the existing globals sendSeg and sendCxt. * Change WALRead not to receive XLogReaderState->seg and ->segcxt as separate arguments anymore; just use the ones from 'state'. This is made possible by the above change. * have the XLogReader segment_open contract require the callbacks to install the file descriptor in the state struct themselves instead of returning it. xlogreader was already ignoring any possible failed return from the callbacks, relying solely on them never returning. (This point is not altogether excellent, as it means the callbacks have to know more of XLogReaderState; but to really improve on that we would have to pass back error info from the callbacks to xlogreader. And the complexity would not be saved but instead just transferred to the callers of WALRead, which would have to learn how to throw errors from the open_segment callback in addition of, as currently, from pg_pread.) * segment_open no longer receives the 'segcxt' as a separate argument, since it's part of the XLogReaderState argument. Per comments from Kyotaro Horiguchi. Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/20200511203336.GA9913@alvherre.pgsql
* Rework XLogReader callback systemAlvaro Herrera2020-05-08
| | | | | | | | | | | | | | | | | | | Code review for 0dc8ead46363, prompted by a bug closed by 91c40548d5f7. XLogReader's system for opening and closing segments had gotten too complicated, with callbacks being passed at both the XLogReaderAllocate level (read_page) as well as at the WALRead level (segment_open). This was confusing and hard to follow, so restructure things so that these callbacks are passed together at XLogReaderAllocate time, and add another callback to the set (segment_close) to make it a coherent whole. Also, ensure XLogReaderState is an argument to all the callbacks, so that they can grab at the ->private data if necessary. Document the whole arrangement more clearly. Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Discussion: https://postgr.es/m/20200422175754.GA19858@alvherre.pgsql
* Fix XLogReader FD leak that makes backends unusable after 2PC usage.Andres Freund2020-04-07
| | | | | | | | | | | | | | | | | | | | | | | | Before the fix every 2PC commit/abort leaked a file descriptor. As the files are opened using BasicOpenFile(), that quickly leads to the backend running out of file descriptors. Once enough 2PC abort/commit have caused enough FDs to leak, any IO in the backend will fail with "Too many open files", as BasicOpenFilePerm() will have triggered all open files known to fd.c to be closed. The leak causing the problem at hand is a consequence of 0dc8ead46, but is only exascerbated by it. Previously most XLogPageReadCB callbacks used static variables to cache one open file, but after the commit the cache is private to each XLogReader instance. There never was infrastructure to close FDs at the time of XLogReaderFree, but the way XLogReader was used limited the leak to one FD. This commit just closes the during XLogReaderFree() if the FD is stored in XLogReaderState.seg.ws_segno. This may not be the way to solve this medium/long term, but at least unbreaks 2PC. Discussion: https://postgr.es/m/20200406025651.fpzdb5yyb7qyhqko@alap3.anarazel.de
* Allow walreceiver configuration to change on reloadAlvaro Herrera2020-03-27
| | | | | | | | | | | | | | | The parameters primary_conninfo, primary_slot_name and wal_receiver_create_temp_slot can now be changed with a simple "reload" signal, no longer requiring a server restart. This is achieved by signalling the walreceiver process to terminate and having it start again with the new values. Thanks to Andres Freund, Kyotaro Horiguchi, Fujii Masao for discussion. Author: Sergei Kornilov <sk@zsrv.org> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/19513901543181143@sas1-19a94364928d.qloud-c.yandex.net
* Refactor XLogReadRecord(), adding XLogBeginRead() function.Heikki Linnakangas2020-01-26
| | | | | | | | | | | | | | | | | | | | The signature of XLogReadRecord() required the caller to pass the starting WAL position as argument, or InvalidXLogRecPtr to continue reading at the end of previous record. That's slightly awkward to the callers, as most of them don't want to randomly jump around in the WAL stream, but start reading at one position and then read everything from that point onwards. Remove the 'RecPtr' argument and add a new function XLogBeginRead() to specify the starting position instead. That's more convenient for the callers. Also, xlogreader holds state that is reset when you change the starting position, so having a separate function for doing that feels like a more natural fit. This changes XLogFindNextRecord() function so that it doesn't reset the xlogreader's state to what it was before the call anymore. Instead, it positions the xlogreader to the found record, like XLogBeginRead(). Reviewed-by: Kyotaro Horiguchi, Alvaro Herrera Discussion: https://www.postgresql.org/message-id/5382a7a3-debe-be31-c860-cb810c08f366%40iki.fi
* Update copyrights for 2020Bruce Momjian2020-01-01
| | | | Backpatch-through: update all files in master, backpatch legal files through 9.4
* Refactor WAL file-reading code into WALRead()Alvaro Herrera2019-11-25
| | | | | | | | | | | | | | | | | | | | | XLogReader, walsender and pg_waldump all had their own routines to read data from WAL files to memory, with slightly different approaches according to the particular conditions of each environment. There's a lot of commonality, so we can refactor that into a single routine WALRead in XLogReader, and move the differences to a separate (simpler) callback that just opens the next WAL-segment. This results in a clearer (ahem) code flow. The error reporting needs are covered by filling in a new error-info struct, WALReadError, and it's the caller's responsibility to act on it. The backend has WALReadRaiseError() to do so. We no longer ever need to seek in this interface; switch to using pg_pread(). Author: Antonin Houska, with contributions from Álvaro Herrera Reviewed-by: Michaël Paquier, Kyotaro Horiguchi Discussion: https://postgr.es/m/14984.1554998742@spoje.net
* Make the order of the header file includes consistent in backend modules.Amit Kapila2019-11-12
| | | | | | | | | | | Similar to commits 7e735035f2 and dddf4cdc33, this commit makes the order of header file inclusion consistent for backend modules. In the passing, removed a couple of duplicate inclusions. Author: Vignesh C Reviewed-by: Kuntal Ghosh and Amit Kapila Discussion: https://postgr.es/m/CALDaNm2Sznv8RR6Ex-iJO6xAdsxgWhCoETkaYX=+9DW3q0QCfA@mail.gmail.com
* Fix assertion failure when running pgbench -s.Fujii Masao2019-11-07
| | | | | | | | | | | | | | | | | | If there is the WAL page that the continuation WAL record just fits within (i.e., the continuation record ends just at the end of the page) and the LSN in such page is specified with -s option, previously pg_waldump caused an assertion failure. The cause of this assertion failure was that XLogFindNextRecord() that pg_waldump -s calls mistakenly handled such special WAL page. This commit changes XLogFindNextRecord() so that it can handle such WAL page correctly. Back-patch to all supported versions. Author: Andrey Lepikhov Reviewed-by: Fujii Masao, Michael Paquier Discussion: https://postgr.es/m/99303554-5dd5-06e6-f943-b3005ccd6edd@postgrespro.ru
* Fix comment in xlogreader.cMichael Paquier2019-09-26
| | | | | | | | | This has been introduced by 709d003, that has moved readSegNo, readOff and readPageTLI into a new structure called WALOpenSegment initialized separately. Author: Kyotaro Horiguchi Discussion: https://postgr.es/m/20190926.110809.248342687.horikyota.ntt@gmail.com
* Rework WAL-reading supporting structsAlvaro Herrera2019-09-24
| | | | | | | | | | | | | | The state-tracking of WAL reading in various places was pretty messy, mostly because the ancient physical-replication WAL reading code wasn't using the XLogReader abstraction. This led to some untidy code. Make it prettier by creating two additional supporting structs, WALSegmentContext and WALOpenSegment which keep track of WAL-reading state. This makes code cleaner, as well as supports more future cleanup. Author: Antonin Houska Reviewed-by: Álvaro Herrera and (older versions) Robert Haas Discussion: https://postgr.es/m/14984.1554998742@spoje.net
* Make XLogReaderInvalReadState staticAlvaro Herrera2019-09-03
| | | | | | | | | | | | | This function is only used by xlogreader.c itself, so there's no need to export it. It was introduced by commit 3b02ea4f0780 with the apparent intention that it could be used externally, but I couldn't find any external code calling it. I (Álvaro) couldn't resist the urge to sort nearby function prototypes properly while at it. Author: Antonin Houska Discussion: https://postgr.es/m/14984.1554998742@spoje.net
* Fix inconsistencies and typos in the tree, take 10Michael Paquier2019-08-13
| | | | | | | | | This addresses some issues with unnecessary code comments, fixes various typos in docs and comments, and removes some orphaned structures and definitions. Author: Alexander Lakhin Discussion: https://postgr.es/m/9aabc775-5494-b372-8bcb-4dfc0bd37c68@gmail.com
* Remove superfluous semicolon.Andres Freund2019-07-30
| | | | Author: Andres Freund
* Provide XLogRecGetFullXid().Thomas Munro2019-07-15
| | | | | | | | | | | | | | | In order to be able to work with FullTransactionId values during replay without increasing the size of the WAL, infer the epoch. In general we can't do that safely, but during replay we can because we know that nextFullXid can't advance concurrently. Prevent frontend code from seeing this new function, due to the above restriction. Perhaps in future it will be possible to extract the value entirely from independent WAL records, and then this restriction can be lifted. Author: Thomas Munro, based on earlier code from Andres Freund Discussion: https://postgr.es/m/CA%2BhUKG%2BmLmuDjMi6o1dxkKvGRL56Y2Rz%2BiXAcrZV03G9ZuFQ8Q%40mail.gmail.com
* Unwind some workarounds for lack of portable int64 format specifierPeter Eisentraut2019-07-04
| | | | | | | | | | | | | | Because there is no portable int64/uint64 format specifier and we can't stick macros like INT64_FORMAT into the middle of a translatable string, we have been using various workarounds that put the number to be printed into a string buffer first. Now that we always use our own sprintf(), we can rely on %lld and %llu to work, so we can use those. This patch undoes this workaround in a few places where it was egregiously verbose. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/CAH2-Wz%3DWbNxc5ob5NJ9yqo2RMJ0q4HXDS30GVCobeCvC9A1L9A%40mail.gmail.com
* Phase 2 pgindent run for v12.Tom Lane2019-05-22
| | | | | | | | | Switch to 2.1 version of pg_bsd_indent. This formats multiline function declarations "correctly", that is with additional lines of parameter declarations indented to match where the first line's left parenthesis is. Discussion: https://postgr.es/m/CAEepm=0P3FeTXRcU5B2W3jv3PgRVZ-kGUXLGfd42FFhUROO3ug@mail.gmail.com
* Add support for partial TOAST decompressionStephen Frost2019-04-02
| | | | | | | | | | | | | | | | When asked for a slice of a TOAST entry, decompress enough to return the slice instead of decompressing the entire object. For use cases where the slice is at, or near, the beginning of the entry, this avoids a lot of unnecessary decompression work. This changes the signature of pglz_decompress() by adding a boolean to indicate if it's ok for the call to finish before consuming all of the source or destination buffers. Author: Paul Ramsey Reviewed-By: Rafia Sabih, Darafei Praliaskouski, Regina Obe Discussion: https://postgr.es/m/CACowWR07EDm7Y4m2kbhN_jnys%3DBBf9A6768RyQdKm_%3DNpkcaWg%40mail.gmail.com
* Avoid some unnecessary block reads in WAL readerMichael Paquier2019-02-18
| | | | | | | | | | | | | | | | When reading a new page internally and depending on the way the WAL reader facility gets used by plugins, the current implementation of the WAL reader may finish by reading a block multiple times while it is not actually necessary as the requested data length may be equal to what has been already read. This can happen for any size, but is more likely to happen at the end of a page. This can cause performance penalties in plugins which rely on the block reads to be purely sequential, zlib not liking backward reads for example. The new behavior also shaves some cycles when doing recovery. Author: Arthur Zakirov Reviewed-by: Andrey Lepikhov, Michael Paquier, Grigory Smolkin Discussion: https://postgr.es/m/2ddf4a32-517e-d6f4-d992-4a63b6035bfd@postgrespro.ru
* Update copyright for 2019Bruce Momjian2019-01-02
| | | | Backpatch-through: certain files through 9.4