aboutsummaryrefslogtreecommitdiff
path: root/src/backend/storage/file
Commit message (Collapse)AuthorAge
...
* Don't leak descriptors into subprograms.Thomas Munro2023-03-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Open long-lived data and WAL file descriptors with O_CLOEXEC. This flag was introduced by SUSv4 (POSIX.1-2008), and by now all of our target Unix systems have it. Our open() implementation for Windows already had that behavior, so provide a dummy O_CLOEXEC flag on that platform. For now, callers of open() and the "thin" wrappers in fd.c that deal in raw descriptors need to pass in O_CLOEXEC explicitly if desired. This commit does that for WAL files, and automatically for everything accessed via VFDs including SMgrRelation and BufFile. (With more discussion we might decide to turn it on automatically for the thin open()-wrappers too to avoid risk of missing places that need it, but these are typically used for short-lived descriptors where we don't expect to fork/exec, and it's remotely possible that extensions could be using these APIs and passing descriptors to subprograms deliberately, so that hasn't been done here.) Do the same for sockets and the postmaster pipe with FD_CLOEXEC. (Later commits might use modern interfaces to remove these extra fcntl() calls and more where possible, but we'll need them as a fallback for a couple of systems, so do it that way in this initial commit.) With this change, subprograms executed for archiving, copying etc will no longer have access to the server's descriptors, other than the ones that we decide to pass down. Reviewed-by: Andres Freund <andres@anarazel.de> (earlier version) Discussion: https://postgr.es/m/CA%2BhUKGKb6FsAdQWcRL35KJsftv%2B9zXqQbzwkfRf1i0J2e57%2BhQ%40mail.gmail.com
* Remove obsolete coding for early macOS.Thomas Munro2023-02-22
| | | | | | | | | | | | | | | Commits 04cad8f7 and 0c088568 supported old macOS systems that didn't define O_CLOEXEC or O_DSYNC yet, but those arrived in macOS releases 10.7 and 10.6 (respectively), which themselves reached EOL around a decade ago. We've already made use of other POSIX features that early macOS vintages can't compile (for example commits 623cc673, d2e15083). A later commit will use O_CLOEXEC on POSIX systems so it would be strange to pretend here that it's optional, and we might as well give O_DSYNC the same treatment since the reference is also guarded by a test for a macOS-specific macro, and we know that current Macs have it. Discussion: https://postgr.es/m/CA%2BhUKGKb6FsAdQWcRL35KJsftv%2B9zXqQbzwkfRf1i0J2e57%2BhQ%40mail.gmail.com
* Zero initialize uses of instr_time about to trigger compiler warningsAndres Freund2023-01-20
| | | | | | | | | These are all not necessary from a correctness POV. However, in the near future instr_time will be simplified to an int64, at which point gcc would otherwise start to warn about the changed places. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/20230116023639.rn36vf6ajqmfciua@awork3.anarazel.de
* Constify the arguments of copydir.h functionsMichael Paquier2023-01-18
| | | | | | | | | | This makes sure that the internal logic of these functions does not attempt to change the value of the arguments constified, and it removes one unconstify() in basic_archive.c. Author: Nathan Bossart Reviewed-by: Andrew Dunstan, Peter Eisentraut Discussion: https://postgr.es/m/20230114231126.GA2580330@nathanxps13
* Add BufFileRead variants with short read and EOF detectionPeter Eisentraut2023-01-16
| | | | | | | | | | | | | | | Most callers of BufFileRead() want to check whether they read the full specified length. Checking this at every call site is very tedious. This patch provides additional variants BufFileReadExact() and BufFileReadMaybeEOF() that include the length checks. I considered changing BufFileRead() itself, but this function is also used in extensions, and so changing the behavior like this would create a lot of problems there. The new names are analogous to the existing LogicalTapeReadExact(). Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/f3501945-c591-8cc3-5ef0-b72a2e0eaa9c@enterprisedb.com
* Fix pg_truncate() on Windows.Thomas Munro2023-01-06
| | | | | | | | | | | | | Commit 57faaf376 added pg_truncate(const char *path, off_t length), but "length" was ignored under WIN32 and the file was unconditionally truncated to 0. There was no live bug, since the only caller passes 0. Fix, and back-patch to 14 where the function arrived. Author: Justin Pryzby <pryzby@telsasoft.com> Discussion: https://postgr.es/m/20230106031652.GR3109%40telsasoft.com
* Update copyright for 2023Bruce Momjian2023-01-02
| | | | Backpatch-through: 11
* Add const to BufFileWritePeter Eisentraut2022-12-30
| | | | | | | | Make data buffer argument to BufFileWrite a const pointer and bubble this up to various callers and related APIs. This makes the APIs clearer and more consistent. Discussion: https://www.postgresql.org/message-id/flat/11dda853-bb5b-59ba-a746-e168b1ce4bdb%40enterprisedb.com
* Remove unnecessary castsPeter Eisentraut2022-12-30
| | | | | | | | Some code carefully cast all data buffer arguments for data write and read function calls to void *, even though the respective arguments are already void *. Remove this unnecessary clutter. Discussion: https://www.postgresql.org/message-id/flat/11dda853-bb5b-59ba-a746-e168b1ce4bdb%40enterprisedb.com
* Add copyright notices to meson filesAndrew Dunstan2022-12-20
| | | | Discussion: https://postgr.es/m/222b43a5-2fb3-2c1b-9cd0-375d376c8246@dunslane.net
* Update types in File APIPeter Eisentraut2022-12-08
| | | | | | | | | | | | | Make the argument types of the File API match stdio better: - Change the data buffer to void *, from char *. - Change FileWrite() data buffer to const on top of that. - Change amounts to size_t, from int. In passing, change the FilePrefetch() amount argument from int to off_t, to match the underlying posix_fadvise(). Discussion: https://www.postgresql.org/message-id/flat/11dda853-bb5b-59ba-a746-e168b1ce4bdb%40enterprisedb.com
* Remove unneeded includes of <sys/stat.h>Michael Paquier2022-11-05
| | | | | | | | | | Since bfb9dfd, none of the files updated in this commit have any stat() calls, so these inclusions are not necessary, for the same reasons as 233cf6e. Per discussion with John Naylor. Discussion: https://postgr.es/m/CAFBsxsGGGX7KD6RxbNoSJzuSc8Gz3hOxcfhTOMLB_hJcm68dKQ@mail.gmail.com
* Move pg_pwritev_with_retry() to src/common/file_utils.cMichael Paquier2022-10-27
| | | | | | | | | | | | | This commit moves pg_pwritev_with_retry(), a convenience wrapper of pg_writev() able to handle partial writes, to common/file_utils.c so that the frontend code is able to use it. A first use-case targetted for this routine is pg_basebackup and pg_receivewal, for the zero-padding of a newly-initialized WAL segment. This is used currently in the backend when the GUC wal_init_zero is enabled (default). Author: Bharath Rupireddy Reviewed-by: Nathan Bossart, Thomas Munro Discussion: https://postgr.es/m/CALj2ACUq7nAb7=bJNbK3yYmp-SZhJcXFR_pLk8un6XgDzDF3OA@mail.gmail.com
* Restore pg_pread and friends.Thomas Munro2022-09-29
| | | | | | | | | | | | | | | | | | | | Commits cf112c12 and a0dc8271 were a little too hasty in getting rid of the pg_ prefixes where we use pread(), pwrite() and vectored variants. We dropped support for ancient Unixes where we needed to use lseek() to implement replacements for those, but it turns out that Windows also changes the current position even when you pass in an offset to ReadFile() and WriteFile() if the file handle is synchronous, despite its documentation saying otherwise. Switching to asynchronous file handles would fix that, but have other complications. For now let's just put back the pg_ prefix and add some comments to highlight the non-standard side-effect, which we can now describe as Windows-only. Reported-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Discussion: https://postgr.es/m/20220923202439.GA1156054%40nathanxps13
* Revert 56-bit relfilenode change and follow-up commits.Robert Haas2022-09-28
| | | | | | | | There are still some alignment-related failures in the buildfarm, which might or might not be able to be fixed quickly, but I've also just realized that it increased the size of many WAL records by 4 bytes because a block reference contains a RelFileLocator. The effect of that hasn't been studied or discussed, so revert for now.
* Increase width of RelFileNumbers from 32 bits to 56 bits.Robert Haas2022-09-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | RelFileNumbers are now assigned using a separate counter, instead of being assigned from the OID counter. This counter never wraps around: if all 2^56 possible RelFileNumbers are used, an internal error occurs. As the cluster is limited to 2^64 total bytes of WAL, this limitation should not cause a problem in practice. If the counter were 64 bits wide rather than 56 bits wide, we would need to increase the width of the BufferTag, which might adversely impact buffer lookup performance. Also, this lets us use bigint for pg_class.relfilenode and other places where these values are exposed at the SQL level without worrying about overflow. This should remove the need to keep "tombstone" files around until the next checkpoint when relations are removed. We do that to keep RelFileNumbers from being recycled, but now that won't happen anyway. However, this patch doesn't actually change anything in this area; it just makes it possible for a future patch to do so. Dilip Kumar, based on an idea from Andres Freund, who also reviewed some earlier versions of the patch. Further review and some wordsmithing by me. Also reviewed at various points by Ashutosh Sharma, Vignesh C, Amul Sul, Álvaro Herrera, and Tom Lane. Discussion: http://postgr.es/m/CA+Tgmobp7+7kmi4gkq7Y+4AM9fTvL+O1oQ4-5gFTT+6Ng-dQ=g@mail.gmail.com
* meson: Add initial version of meson based build systemAndres Freund2022-09-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Autoconf is showing its age, fewer and fewer contributors know how to wrangle it. Recursive make has a lot of hard to resolve dependency issues and slow incremental rebuilds. Our home-grown MSVC build system is hard to maintain for developers not using Windows and runs tests serially. While these and other issues could individually be addressed with incremental improvements, together they seem best addressed by moving to a more modern build system. After evaluating different build system choices, we chose to use meson, to a good degree based on the adoption by other open source projects. We decided that it's more realistic to commit a relatively early version of the new build system and mature it in tree. This commit adds an initial version of a meson based build system. It supports building postgres on at least AIX, FreeBSD, Linux, macOS, NetBSD, OpenBSD, Solaris and Windows (however only gcc is supported on aix, solaris). For Windows/MSVC postgres can now be built with ninja (faster, particularly for incremental builds) and msbuild (supporting the visual studio GUI, but building slower). Several aspects (e.g. Windows rc file generation, PGXS compatibility, LLVM bitcode generation, documentation adjustments) are done in subsequent commits requiring further review. Other aspects (e.g. not installing test-only extensions) are not yet addressed. When building on Windows with msbuild, builds are slower when using a visual studio version older than 2019, because those versions do not support MultiToolTask, required by meson for intra-target parallelism. The plan is to remove the MSVC specific build system in src/tools/msvc soon after reaching feature parity. However, we're not planning to remove the autoconf/make build system in the near future. Likely we're going to keep at least the parts required for PGXS to keep working around until all supported versions build with meson. Some initial help for postgres developers is at https://wiki.postgresql.org/wiki/Meson With contributions from Thomas Munro, John Naylor, Stone Tickle and others. Author: Andres Freund <andres@anarazel.de> Author: Nazir Bilal Yavuz <byavuz81@gmail.com> Author: Peter Eisentraut <peter@eisentraut.org> Reviewed-By: Peter Eisentraut <peter.eisentraut@enterprisedb.com> Discussion: https://postgr.es/m/20211012083721.hvixq4pnh2pixr3j@alap3.anarazel.de
* Improve some error messages.Amit Kapila2022-09-21
| | | | | | | | It is not our usual style to use "we" in the error messages. Author: Kyotaro Horiguchi Reviewed-By: Amit Kapila Discussion: https://postgr.es/m/20220914.111507.13049297635620898.horikyota.ntt@gmail.com
* Harmonize parameter names in storage and AM code.Peter Geoghegan2022-09-19
| | | | | | | | | | | | | | | Make sure that function declarations use names that exactly match the corresponding names from function definitions in storage, catalog, access method, executor, and logical replication code, as well as in miscellaneous utility/library code. Like other recent commits that cleaned up function parameter names, this commit was written with help from clang-tidy. Later commits will do the same for other parts of the codebase. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/CAH2-WznJt9CMM9KJTMjJh_zbL5hD9oX44qdJ4aqZtjFi-zA3Tg@mail.gmail.com
* Expand the use of get_dirent_type(), shaving a few calls to stat()/lstat()Michael Paquier2022-09-02
| | | | | | | | | | | | | | | | | | | Several backend-side loops scanning one or more directories with ReadDir() (WAL segment recycle/removal in xlog.c, backend-side directory copy, temporary file removal, configuration file parsing, some logical decoding logic and some pgtz stuff) already know the type of the entry being scanned thanks to the dirent structure associated to the entry, on platforms where we know about DT_REG, DT_DIR and DT_LNK to make the difference between a regular file, a directory and a symbolic link. Relying on the direct structure of an entry saves a few system calls to stat() and lstat() in the loops updated here, shaving some code while on it. The logic of the code remains the same, calling stat() or lstat() depending on if it is necessary to look through symlinks. Authors: Nathan Bossart, Bharath Rupireddy Reviewed-by: Andres Freund, Thomas Munro, Michael Paquier Discussion: https://postgr.es/m/CALj2ACV8n-J-f=yiLUOx2=HrQGPSOZM3nWzyQQvLPcccPXxEdg@mail.gmail.com
* Clean up inconsistent use of fflush().Tom Lane2022-08-29
| | | | | | | | | | | | | | | | | | | | | | More than twenty years ago (79fcde48b), we hacked the postmaster to avoid a core-dump on systems that didn't support fflush(NULL). We've mostly, though not completely, hewed to that rule ever since. But such systems are surely gone in the wild, so in the spirit of cleaning out no-longer-needed portability hacks let's get rid of multiple per-file fflush() calls in favor of using fflush(NULL). Also, we were fairly inconsistent about whether to fflush() before popen() and system() calls. While we've received no bug reports about that, it seems likely that at least some of these call sites are at risk of odd behavior, such as error messages appearing in an unexpected order. Rather than expend a lot of brain cells figuring out which places are at hazard, let's just establish a uniform coding rule that we should fflush(NULL) before these calls. A no-op fflush() is surely of trivial cost compared to launching a sub-process via a shell; while if it's not a no-op then we likely need it. Discussion: https://postgr.es/m/2923412.1661722825@sss.pgh.pa.us
* Remove configure probe for sys/resource.h and refactor.Thomas Munro2022-08-14
| | | | | | | | | | | | <sys/resource.h> is in SUSv2 and is on all targeted Unix systems. We have a replacement for getrusage() on Windows, so let's just move its declarations into src/include/port/win32/sys/resource.h so that we can use a standard-looking #include. Also remove an obsolete reference to CLK_TCK. Also rename src/port/getrusage.c to win32getrusage.c, following the convention for Windows-only fallback code. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CA%2BhUKG%2BL_3brvh%3D8e0BW_VfX9h7MtwgN%3DnFHP5o7X2oZucY9dg%40mail.gmail.com
* Replace pgwin32_is_junction() with lstat().Thomas Munro2022-08-06
| | | | | | | | | | | | | | | | | | | Now that lstat() reports junction points with S_IFLNK/S_ISLINK(), and unlink() can unlink them, there is no need for conditional code for Windows in a few places. That was expressed by testing for WIN32 or S_ISLNK, which we can now constant-fold. The coding around pgwin32_is_junction() was a bit suspect anyway, as we never checked for errors, and we also know that errors can be spuriously reported because of transient sharing violations on this OS. The lstat()-based code has handling for that. This also reverts 4fc6b6ee on master only. That was done because lstat() didn't previously work for symlinks (junction points), but now it does. Tested-by: Andrew Dunstan <andrew@dunslane.net> Discussion: https://postgr.es/m/CA%2BhUKGLfOOeyZpm5ByVcAt7x5Pn-%3DxGRNCvgiUPVVzjFLtnY0w%40mail.gmail.com
* Remove configure probe for fdatasync.Thomas Munro2022-08-05
| | | | | | | | | | | | | | | | | fdatasync() is in SUSv2, and all targeted Unix systems have it. We have a replacement function for Windows. We retain the probe for the function declaration, which allows us to supply the mysteriously missing declaration for macOS, and also for Windows. No need to keep a HAVE_FDATASYNC macro around. Also rename src/port/fdatasync.c to win32fdatasync.c since it's only for Windows. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CA+hUKGJ3LHeP9w5Fgzdr4G8AnEtJ=z=p6hGDEm4qYGEUX5B6fQ@mail.gmail.com Discussion: https://postgr.es/m/CA%2BhUKGJZJVO%3DiX%2Beb-PXi2_XS9ZRqnn_4URh0NUQOwt6-_51xQ%40mail.gmail.com
* Simplify replacement code for preadv and pwritev.Thomas Munro2022-08-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | preadv() and pwritev() are not standardized by POSIX, but appeared in NetBSD in 1999 and were adopted by at least OpenBSD, FreeBSD, DragonFlyBSD, Linux, AIX, illumos and macOS. We don't use them much yet, but an active proposal uses them heavily. In 15, we had two replacement implementations for other OSes: one based on lseek() + -v function if available for true vector I/O, and the other based on a loop over p- function. The former would be an obstacle to hypothetical future multi-threaded code sharing file descriptors, while the latter would not, since commit cf112c12. Furthermore, the number of targeted systems that could benefit from the former's potential upside has dwindled to just one niche OS, since macOS added the functions and we de-supported HP-UX. That doesn't seem like a good trade-off. Therefore, drop the lseek()-based variant, and also the pg_ prefix now that the file position portability hazard is gone. At the time of writing, the only systems in our build farm that lack native preadv/pwritev and thus use fallback code are: * Solaris (but not illumos) * macOS before release 11.0 * Windows With this commit, the above systems will now use the *same* fallback code, the version that loops over pread()/pwrite(). Windows already used that (though a later proposal may include true vector I/O for Windows), so this decision really only affects Solaris, until it gets around to adding these system calls. Also remove some useless includes while here. Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CA+hUKGJ3LHeP9w5Fgzdr4G8AnEtJ=z=p6hGDEm4qYGEUX5B6fQ@mail.gmail.com
* Remove dead pread and pwrite replacement code.Thomas Munro2022-08-05
| | | | | | | | | | | | | | | | | | | | | | | pread() and pwrite() are in SUSv2, and all targeted Unix systems have them. Previously, we defined pg_pread and pg_pwrite to emulate these function with lseek() on old Unixen. The names with a pg_ prefix were a reminder of a portability hazard: they might change the current file position. That hazard is gone, so we can drop the prefixes. Since the remaining replacement code is Windows-only, move it into src/port/win32p{read,write}.c, and move the declarations into src/include/port/win32_port.h. No need for vestigial HAVE_PREAD, HAVE_PWRITE macros as they were only used for declarations in port.h which have now moved into win32_port.h. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Greg Stark <stark@mit.edu> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CA+hUKGJ3LHeP9w5Fgzdr4G8AnEtJ=z=p6hGDEm4qYGEUX5B6fQ@mail.gmail.com
* Remove configure probe and related tests for getrlimit.Thomas Munro2022-08-05
| | | | | | | | | | | | | | | | | | getrlimit() is in SUSv2 and all targeted systems have it. Windows doesn't have it. We could just use #ifndef WIN32, but for a little more explanation about why we're making things conditional, let's retain the HAVE_GETRLIMIT macro. It's defined in port.h for Unix systems. On systems that have it, it's not necessary to test for RLIMIT_CORE, RLIMIT_STACK or RLIMIT_NOFILE macros, since SUSv2 requires those and all targeted systems have them. Also remove references to a pre-historic alternative spelling of RLIMIT_NOFILE, and coding that seemed to believe that Cygwin didn't have it. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CA+hUKGJ3LHeP9w5Fgzdr4G8AnEtJ=z=p6hGDEm4qYGEUX5B6fQ@mail.gmail.com
* Clean up some residual confusion between OIDs and RelFileNumbers.Robert Haas2022-07-28
| | | | | | | | | | | | Commit b0a55e43299c4ea2a9a8c757f9c26352407d0ccc missed a few places where we are referring to the number used as a part of the relation filename as an "OID". We now want to call that a "RelFileNumber". Some of these places actually made it sound like the OID in question is pg_class.oid rather than pg_class.relfilenode, which is especially good to clean up. Dilip Kumar with some editing by me.
* Remove durable_rename_excl()Michael Paquier2022-07-05
| | | | | | | | | | | | A previous commit replaced all the calls to this function with durable_rename() as of dac1ff3, making it used nowhere in the tree. Using it in extension code is also risky based on the issues described in this previous commit, so let's remove it. This makes possible the removal of HAVE_WORKING_LINK. Author: Nathan Bossart Reviewed-by: Robert Haas, Kyotaro Horiguchi, Michael Paquier Discussion: https://postgr.es/m/20220407182954.GA1231544@nathanxps13
* Revert recent changes with durable_rename_excl()Michael Paquier2022-04-28
| | | | | | | | | | | | | | | This reverts commits 2c902bb and ccfbd92. Per buildfarm members kestrel, rorqual and calliphoridae, the assertions checking that a TLI history file should not exist when created by a WAL receiver have been failing, and switching to durable_rename() over durable_rename_excl() would cause the newest TLI history file to overwrite the existing one. We need to think harder about such cases, so revert the new logic for now. Note that all the failures have been reported in the test 025_stuck_on_old_timeline. Discussion: https://postgr.es/m/511362.1651116498@sss.pgh.pa.us
* Remove durable_rename_excl()Michael Paquier2022-04-28
| | | | | | | | | | ccfbd92 has replaced all existing in-core callers of this function in favor of durable_rename(). durable_rename_excl() is by nature unsafe on crashes happening at the wrong time, so just remove it. Author: Nathan Bossart Reviewed-by: Robert Haas, Kyotaro Horiguchi, Michael Paquier Discussion: https://postgr.es/m/20220407182954.GA1231544@nathanxps13
* Add missing spaces after single-line commentsDavid Rowley2022-04-14
| | | | | | | | | | | | | Only 1 of 3 of these changes appear to be handled by pgindent. That change is new to v15. The remaining two appear to be left alone by pgindent. The exact reason for that is not 100% clear to me. It seems related to the fact that it's a line that contains *only* a single line comment and no actual code. It does not seem worth investigating this in too much detail. In any case, these do not conform to our usual practices, so fix them. Author: Justin Pryzby Discussion: https://postgr.es/m/20220411020336.GB26620@telsasoft.com
* Track I/O timing for temporary file blocks in EXPLAIN (BUFFERS)Michael Paquier2022-04-08
| | | | | | | | | | | | | | | | | | | | Previously, the output of EXPLAIN (BUFFERS) option showed only the I/O timing spent reading and writing shared and local buffers. This commit adds on top of that the I/O timing for temporary buffers in the output of EXPLAIN (for spilled external sorts, hashes, materialization. etc). This can be helpful for users in cases where the I/O related to temporary buffers is the bottleneck. Like its cousin, this information is available only when track_io_timing is enabled. Playing the patch, this is showing an extra overhead of up to 1% even when using gettimeofday() as implementation for interval timings, which is slightly within the usual range noise still that's measurable. Author: Masahiko Sawada Reviewed-by: Georgios Kokolatos, Melanie Plageman, Julien Rouhaud, Ranier Vilela Discussion: https://postgr.es/m/CAD21AoAJgotTeP83p6HiAGDhs_9Fw9pZ2J=_tYTsiO5Ob-V5GQ@mail.gmail.com
* Update copyright for 2022Bruce Momjian2022-01-07
| | | | Backpatch-through: 10
* Replace random(), pg_erand48(), etc with a better PRNG API and algorithm.Tom Lane2021-11-28
| | | | | | | | | | | | | | | | | | | Standardize on xoroshiro128** as our basic PRNG algorithm, eliminating a bunch of platform dependencies as well as fundamentally-obsolete PRNG code. In addition, this API replacement will ease replacing the algorithm again in future, should that become necessary. xoroshiro128** is a few percent slower than the drand48 family, but it can produce full-width 64-bit random values not only 48-bit, and it should be much more trustworthy. It's likely to be noticeably faster than the platform's random(), depending on which platform you are thinking about; and we can have non-global state vectors easily, unlike with random(). It is not cryptographically strong, but neither are the functions it replaces. Fabien Coelho, reviewed by Dean Rasheed, Aleksander Alekseev, and myself Discussion: https://postgr.es/m/alpine.DEB.2.22.394.2105241211230.165418@pseudo
* Report progress of startup operations that take a long time.Robert Haas2021-10-25
| | | | | | | | | | | | | | | | | | | | | | | | Users sometimes get concerned whe they start the server and it emits a few messages and then doesn't emit any more messages for a long time. Generally, what's happening is either that the system is taking a long time to apply WAL, or it's taking a long time to reset unlogged relations, or it's taking a long time to fsync the data directory, but it's not easy to tell which is the case. To fix that, add a new 'log_startup_progress_interval' setting, by default 10s. When an operation that is known to be potentially long-running takes more than this amount of time, we'll log a status update each time this interval elapses. To avoid undesirable log chatter, don't log anything about WAL replay when in standby mode. Nitin Jadhav and Robert Haas, reviewed by Amul Sul, Bharath Rupireddy, Justin Pryzby, Michael Paquier, and Álvaro Herrera. Discussion: https://postgr.es/m/CA+TgmoaHQrgDFOBwgY16XCoMtXxsrVGFB2jNCvb7-ubuEe1MGg@mail.gmail.com Discussion: https://postgr.es/m/CAMm1aWaHF7VE69572_OLQ+MgpT5RUiUDgF1x5RrtkJBLdpRj3Q@mail.gmail.com
* Clean up some code using "(expr) ? true : false"Michael Paquier2021-09-08
| | | | | | | | | | All the code paths simplified here were already using a boolean or used an expression that led to zero or one, making the extra bits unnecessary. Author: Justin Pryzby Reviewed-by: Tom Lane, Michael Paquier, Peter Smith Discussion: https://postgr.es/m/20210428182936.GE27406@telsasoft.com
* In count_usable_fds(), duplicate stderr not stdin.Tom Lane2021-09-02
| | | | | | | | | | | | | | | | | | | | We had a complaint that the postmaster fails to start if the invoking program closes stdin. That happens because count_usable_fds expects to be able to dup(0), and if it can't, we conclude there are no free FDs and go belly-up. So far as I can find, though, there is no other place in the server that touches stdin, and it's not unreasonable to expect that a daemon wouldn't use that file. As a simple improvement, let's dup FD 2 (stderr) instead. Unlike stdin, it *is* reasonable for us to expect that stderr be open; even if we are configured not to touch it, common libraries such as libc might try to write error messages there. Per gripe from Mario Emmenlauer. Given the lack of previous complaints, I'm not excited about pushing this into stable branches, but it seems OK to squeeze it into v14. Discussion: https://postgr.es/m/48bafc63-c30f-3962-2ded-f2e985d93e86@emmenlauer.de
* Optimize fileset usage in apply worker.Amit Kapila2021-09-02
| | | | | | | | | | | | | | | | Use one fileset for the entire worker lifetime instead of using separate filesets for each streaming transaction. Now, the changes/subxacts files for every streaming transaction will be created under the same fileset and the files will be deleted after the transaction is completed. This patch extends the BufFileOpenFileSet and BufFileDeleteFileSet APIs to allow users to specify whether to give an error on missing files. Author: Dilip Kumar, based on suggestion by Thomas Munro Reviewed-by: Hou Zhijie, Masahiko Sawada, Amit Kapila Discussion: https://postgr.es/m/E1mCC6U-0004Ik-Fs@gemulon.postgresql.org
* Refactor sharedfileset.c to separate out fileset implementation.Amit Kapila2021-08-30
| | | | | | | | | | | | Move fileset related implementation out of sharedfileset.c to allow its usage by backends that don't want to share filesets among different processes. After this split, fileset infrastructure is used by both sharedfileset.c and worker.c for the named temporary files that survive across transactions. Author: Dilip Kumar, based on suggestion by Andres Freund Reviewed-by: Hou Zhijie, Masahiko Sawada, Amit Kapila Discussion: https://postgr.es/m/E1mCC6U-0004Ik-Fs@gemulon.postgresql.org
* Move temporary file cleanup to before_shmem_exit().Andres Freund2021-08-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As reported by a few OSX buildfarm animals there exist at least one path where temporary files exist during AtProcExit_Files() processing. As temporary file cleanup causes pgstat reporting, the assertions added in ee3f8d3d3ae caused failures. This is not an OSX specific issue, we were just lucky that timing on OSX reliably triggered the problem. The known way to cause this is a FATAL error during perform_base_backup() with a MANIFEST used - adding an elog(FATAL) after InitializeBackupManifest() reliably reproduces the problem in isolation. The problem is that the temporary file created in InitializeBackupManifest() is not cleaned up via resource owner cleanup as WalSndResourceCleanup() currently is only used for non-FATAL errors. That then allows to reach AtProcExit_Files() with existing temporary files, causing the assertion failure. To fix this problem, move temporary file cleanup to a before_shmem_exit() hook and add assertions ensuring that no temporary files are created before / after temporary file management has been initialized / shut down. The cleanest way to do so seems to be to split fd.c initialization into two, one for plain file access and one for temporary file access. Right now there's no need to perform further fd.c cleanup during process exit, so I just renamed AtProcExit_Files() to BeforeShmemExit_Files(). Alternatively we could perform another pass through the files to check that no temporary files exist, but the added assertions seem to provide enough protection against that. It might turn out that the assertions added in ee3f8d3d3ae will cause too much noise - in that case we'll have to downgrade them to a WARNING, at least temporarily. This commit is not necessarily the best approach to address this issue, but it should resolve the buildfarm failures. We can revise later. Author: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/20210807190131.2bm24acbebl4wl6i@alap3.anarazel.de
* Don't use #if inside function-like macro arguments.Thomas Munro2021-07-20
| | | | | | | | No concrete problem reported, but in the past it's been known to cause problems on some compilers so let's avoid doing that. Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/234364.1626704007%40sss.pgh.pa.us
* Adjust commit 2dbe8905 for ancient macOS.Thomas Munro2021-07-19
| | | | | | | A couple of open flags used in an assertion didn't exist in macOS 10.4. Per build farm animal prairiedog. Also add O_EXCL while here (there are a few more standard flags but they're not relevant and likely to be missing).
* Support direct I/O on macOS.Thomas Munro2021-07-19
| | | | | | | | | | | | | | | | | | Macs don't understand O_DIRECT, but they can disable caching with a separate fcntl() call. Extend the file opening functions in fd.c to handle this for us if the caller passes in PG_O_DIRECT. For now, this affects only WAL data and even then only if you set: max_wal_senders=0 wal_level=minimal This is not expected to be very useful on its own, but later proposed patches will make greater use of direct I/O, and it'll be useful for testing if developers on Macs can see the effects. Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CA%2BhUKG%2BADiyyHe0cun2wfT%2BSVnFVqNYPxoO6J9zcZkVO7%2BNGig%40mail.gmail.com
* Message style improvementsPeter Eisentraut2021-06-28
|
* Initial pgindent and pgperltidy run for v14.Tom Lane2021-05-12
| | | | | | | | Also "make reformat-dat-files". The only change worthy of note is that pgindent messed up the formatting of launcher.c's struct LogicalRepWorkerId, which led me to notice that that struct wasn't used at all anymore, so I just took it out.
* Fix concurrency issues with WAL segment recycling on WindowsMichael Paquier2021-03-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit is mostly a revert of aaa3aed, that switched the routine doing the internal renaming of recycled WAL segments to use on Windows a combination of CreateHardLinkA() plus unlink() instead of rename(). As reported by several users of Postgres 13, this is causing concurrency issues when manipulating WAL segments, mostly in the shape of the following error: LOG: could not rename file "pg_wal/000000XX000000YY000000ZZ": Permission denied This moves back to a logic where a single rename() (well, pgrename() for Windows) is used. This issue has proved to be hard to hit when I tested it, facing it only once with an archive_command that was not able to do its work, so it is environment-sensitive. The reporters of this issue have been able to confirm that the situation improved once we switched back to a single rename(). In order to check things, I have provided to the reporters a patched build based on 13.2 with aaa3aed reverted, to test if the error goes away, and an unpatched build of 13.2 to test if the error still showed up (just to make sure that I did not mess up my build process). Extra thanks to Fujii Masao for pointing out what looked like the culprit commit, and to all the reporters for taking the time to test what I have sent them. Reported-by: Andrus, Guy Burgess, Yaroslav Pashinsky, Thomas Trenz Reviewed-by: Tom Lane, Andres Freund Discussion: https://postgr.es/m/3861ff1e-0923-7838-e826-094cc9bef737@hot.ee Discussion: https://postgr.es/m/16874-c3eecd319e36a2bf@postgresql.org Discussion: https://postgr.es/m/095ccf8d-7f58-d928-427c-b17ace23cae6@burgess.co.nz Discussion: https://postgr.es/m/16927-67c570d968c99567%40postgresql.org Discussion: https://postgr.es/m/YFBcRbnBiPdGZvfW@paquier.xyz Backpatch-through: 13
* Provide recovery_init_sync_method=syncfs.Thomas Munro2021-03-20
| | | | | | | | | | | | | | | | | | | | | | | | | Since commit 2ce439f3 we have opened every file in the data directory and called fsync() at the start of crash recovery. This can be very slow if there are many files, leading to field complaints of systems taking minutes or even hours to begin crash recovery. Provide an alternative method, for Linux only, where we call syncfs() on every possibly different filesystem under the data directory. This is equivalent, but avoids faulting in potentially many inodes from potentially slow storage. The new mode comes with some caveats, described in the documentation, so the default value for the new setting is "fsync", preserving the older behavior. Reported-by: Michael Brown <michael.brown@discourse.org> Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com> Reviewed-by: Paul Guo <guopa@vmware.com> Reviewed-by: Bruce Momjian <bruce@momjian.us> Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> Reviewed-by: David Steele <david@pgmasters.net> Discussion: https://postgr.es/m/11bc2bb7-ecb5-3ad0-b39f-df632734cd81%40discourse.org Discussion: https://postgr.es/m/CAEET0ZHGnbXmi8yF3ywsDZvb3m9CbdsGZgfTXscQ6agcbzcZAw%40mail.gmail.com
* Remove temporary files after backend crashTomas Vondra2021-03-18
| | | | | | | | | | | | | | | | | | After a crash of a backend using temporary files, the files used to be left behind, on the basis that it might be useful for debugging. But we don't have any reports of anyone actually doing that, and it means the disk usage may grow over time due to repeated backend failures (possibly even hitting ENOSPC). So this behavior is a bit unfortunate, and fixing it required either manual cleanup (deleting files, which is error-prone) or restart of the instance (i.e. service disruption). This implements automatic cleanup of temporary files, controled by a new GUC remove_temp_files_after_crash. By default the files are removed, but it can be disabled to restore the old behavior if needed. Author: Euler Taveira Reviewed-by: Tomas Vondra, Michael Paquier, Anastasia Lubennikova, Thomas Munro Discussion: https://postgr.es/m/CAH503wDKdYzyq7U-QJqGn%3DGm6XmoK%2B6_6xTJ-Yn5WSvoHLY1Ww%40mail.gmail.com
* Move our p{read,write}v replacements into their own files.Thomas Munro2021-01-14
| | | | | | | | | | | | | macOS's ranlib issued a warning about an empty pread.o file with the previous arrangement, on systems new enough to require no replacement functions. Let's go back to using configure's AC_REPLACE_FUNCS system to build and include each .o in the library only if it's needed, which requires moving the *v() functions to their own files. Also move the _with_retry() wrapper to a more permanent home. Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/1283127.1610554395%40sss.pgh.pa.us