aboutsummaryrefslogtreecommitdiff
path: root/src/backend/storage/buffer
Commit message (Collapse)AuthorAge
...
* Add VACUUM/ANALYZE BUFFER_USAGE_LIMIT optionDavid Rowley2023-04-07
| | | | | | | | | | | | | | | | | | | | | | | | Add new options to the VACUUM and ANALYZE commands called BUFFER_USAGE_LIMIT to allow users more control over how large to make the buffer access strategy that is used to limit the usage of buffers in shared buffers. Larger rings can allow VACUUM to run more quickly but have the drawback of VACUUM possibly evicting more buffers from shared buffers that might be useful for other queries running on the database. Here we also add a new GUC named vacuum_buffer_usage_limit which controls how large to make the access strategy when it's not specified in the VACUUM/ANALYZE command. This defaults to 256KB, which is the same size as the access strategy was prior to this change. This setting also controls how large to make the buffer access strategy for autovacuum. Per idea by Andres Freund. Author: Melanie Plageman Reviewed-by: David Rowley Reviewed-by: Andres Freund Reviewed-by: Justin Pryzby Reviewed-by: Bharath Rupireddy Discussion: https://postgr.es/m/20230111182720.ejifsclfwymw2reb@awork3.anarazel.de
* bufmgr: Introduce infrastructure for faster relation extensionAndres Freund2023-04-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The primary bottlenecks for relation extension are: 1) The extension lock is held while acquiring a victim buffer for the new page. Acquiring a victim buffer can require writing out the old page contents including possibly needing to flush WAL. 2) When extending via ReadBuffer() et al, we write a zero page during the extension, and then later write out the actual page contents. This can nearly double the write rate. 3) The existing bulk relation extension infrastructure in hio.c just amortized the cost of acquiring the relation extension lock, but none of the other costs. Unfortunately 1) cannot currently be addressed in a central manner as the callers to ReadBuffer() need to acquire the extension lock. To address that, this this commit moves the responsibility for acquiring the extension lock into bufmgr.c functions. That allows to acquire the relation extension lock for just the required time. This will also allow us to improve relation extension further, without changing callers. The reason we write all-zeroes pages during relation extension is that we hope to get ENOSPC errors earlier that way (largely works, except for CoW filesystems). It is easier to handle out-of-space errors gracefully if the page doesn't yet contain actual tuples. This commit addresses 2), by using the recently introduced smgrzeroextend(), which extends the relation, without dirtying the kernel page cache for all the extended pages. To address 3), this commit introduces a function to extend a relation by multiple blocks at a time. There are three new exposed functions: ExtendBufferedRel() for extending the relation by a single block, ExtendBufferedRelBy() to extend a relation by multiple blocks at once, and ExtendBufferedRelTo() for extending a relation up to a certain size. To avoid duplicating code between ReadBuffer(P_NEW) and the new functions, ReadBuffer(P_NEW) now implements relation extension with ExtendBufferedRel(), using a flag to tell ExtendBufferedRel() that the relation lock is already held. Note that this commit does not yet lead to a meaningful performance or scalability improvement - for that uses of ReadBuffer(P_NEW) will need to be converted to ExtendBuffered*(), which will be done in subsequent commits. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/20221029025420.eplyow6k7tgu6he3@awork3.anarazel.de
* bufmgr: Support multiple in-progress IOs by using resownerAndres Freund2023-04-05
| | | | | | | | | | | | | | | | | | A future patch will add support for extending relations by multiple blocks at once. To be concurrency safe, the buffers for those blocks need to be marked as BM_IO_IN_PROGRESS. Until now we only had infrastructure for recovering from an IO error for a single buffer. This commit extends that infrastructure to multiple buffers by using the resource owner infrastructure. This commit increases the size of the ResourceOwnerData struct, which appears to have a just about measurable overhead in very extreme workloads. Medium term we are planning to substantially shrink the size of ResourceOwnerData. Short term the increase is small enough to not worry about it for now. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/20221029025420.eplyow6k7tgu6he3@awork3.anarazel.de Discussion: https://postgr.es/m/20221029200025.w7bvlgvamjfo6z44@awork3.anarazel.de
* bufmgr: Acquire and clean victim buffer separatelyAndres Freund2023-04-05
| | | | | | | | | | | | | | | | | | | | | | | Previously we held buffer locks for two buffer mapping partitions at the same time to change the identity of buffers. Particularly for extending relations needing to hold the extension lock while acquiring a victim buffer is painful.But it also creates a bottleneck for workloads that just involve reads. Now we instead first acquire a victim buffer and write it out, if necessary. Then we remove that buffer from the old partition with just the old partition's partition lock held and insert it into the new partition with just that partition's lock held. By separating out the victim buffer acquisition, future commits will be able to change relation extensions to scale better. On my workstation, a micro-benchmark exercising buffered reads strenuously and under a lot of concurrency, sees a >2x improvement. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/20221029025420.eplyow6k7tgu6he3@awork3.anarazel.de
* bufmgr: Add Pin/UnpinLocalBuffer()Andres Freund2023-04-05
| | | | | | | | So far these were open-coded in quite a few places, without a good reason. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/20221029025420.eplyow6k7tgu6he3@awork3.anarazel.de
* bufmgr: Add some more error checking [infrastructure] around pinningAndres Freund2023-04-05
| | | | | | | | | | | This adds a few more assertions against a buffer being local in places we don't expect, and extracts the check for a buffer being pinned exactly once from LockBufferForCleanup() into its own function. Later commits will use this function. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: http://postgr.es/m/419312fd-9255-078c-c3e3-f0525f911d7f@iki.fi
* bufmgr: Remove buffer-write-dirty tracepointsAndres Freund2023-04-03
| | | | | | | | | The trace point was using the relfileno / fork / block for the to-be-read-in buffer. Some upcoming work would make that more expensive to provide. We still have buffer-flush-start/done, which can serve most tracing needs that buffer-write-dirty could serve. Discussion: https://postgr.es/m/f5164e7a-eef6-8972-75a3-8ac622ed0c6e@iki.fi
* Rename BufferAccessStrategyData.ring_size to nbuffersDavid Rowley2023-04-03
| | | | | | | | | | | | The new name better reflects what the field is - the size of the buffers[] array. ring_size sounded more like it is in units of bytes. An upcoming commit allows a BufferAccessStrategy of custom sizes, so it seems relevant to improve this beforehand. Author: Melanie Plageman Reviewed-by: David Rowley Discussion: https://postgr.es/m/CAAKRu_YefVbhg4rAxU2frYFdTap61UftH-kUNQZBvAs%2BYK81Zg%40mail.gmail.com
* Track shared buffer hits in pg_stat_ioAndres Freund2023-03-30
| | | | | | | | | | | | Among other things, this should make it easier to calculate a useful cache hit ratio by excluding buffer reads via buffer access strategies. As buffer access strategies reuse buffers (and thus evict the prior buffer contents), it is normal to see reads on repeated scans of the same data. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CAAKRu_beMa9Hzih40%3DXPYqhDVz6tsgUGTrhZXRo%3Dunp%2Bszb%3DUA%40mail.gmail.com
* bufmgr: Fix undefined behaviour with, unrealistically, large temp_buffersAndres Freund2023-03-30
| | | | | | | | | | | | | | | | | | Quoting Melanie: > Since if buffer is INT_MAX, then the -(buffer + 1) version invokes > undefined behavior while the -buffer - 1 version doesn't. All other places were already using the correct version. I (Andres), copied the code into more places in a patch. Melanie caught it in review, but to prevent more people from copying the bad code, fix it. Even if it is a theoretical issue. We really ought to wrap these accesses in a helper function... As this is a theoretical issue, don't backpatch. Reported-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/CAAKRu_aW2SX_LWtwHgfnqYpBrunMLfE9PD6-ioPpkh92XH0qpg@mail.gmail.com
* Remove empty function BufmgrCommit().Tom Lane2023-03-29
| | | | | | | | | | | | | | This function has been a no-op for over a decade. Even if bufmgr regains a need to be called during commit, it seems unlikely that the most appropriate call points would be precisely here, so it's not doing us much good as a placeholder either. Now, removing it probably doesn't save any noticeable number of cycles --- but the main call is inside the commit critical section, and the less work done there the better. Matthias van de Meent Discussion: https://postgr.es/m/CAEze2Wi1=tLKbxZnXzcD+8fYKyKqBtivVakLQC_mYBsP4Y8qVA@mail.gmail.com
* Fix memory leak and inefficiency in CREATE DATABASE ... STRATEGY WAL_LOGAndres Freund2023-03-22
| | | | | | | | | | | | | | | RelationCopyStorageUsingBuffer() did not free the strategies used to access the source / target relation. They memory was released at the end of the transaction, but when using a template database with a lot of relations, the temporary leak can become big prohibitively big. RelationCopyStorageUsingBuffer() acquired the buffer for the target relation with RBM_NORMAL, therefore requiring a read of a block guaranteed to be zero. Use RBM_ZERO_AND_LOCK instead. Reviewed-by: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/20230321070113.o2vqqxogjykwgfrr@awork3.anarazel.de Backpatch: 15-, where STRATEGY WAL_LOG was introduced
* Update types in smgr APIPeter Eisentraut2023-02-27
| | | | | | | | Change data buffer to void *, from char *, and add const where appropriate. This makes it match the File API (see also 2d4f1ba6cfc2f0a977f1c30bda9848041343e248) and stdio. Discussion: https://www.postgresql.org/message-id/flat/11dda853-bb5b-59ba-a746-e168b1ce4bdb%40enterprisedb.com
* Speedup and increase usability of set proc title functionsDavid Rowley2023-02-20
| | | | | | | | | | | | | | | | | | | | | | The setting of the process title could be seen on profiles of very fast-to-execute queries. In many locations where we call set_ps_display() we pass along a string constant, the length of which is known during compilation. Here we effectively rename set_ps_display() to set_ps_display_with_len() and then add a static inline function named set_ps_display() which calls strlen() on the given string. This allows the compiler to optimize away the strlen() call when dealing with call sites passing a string constant. We can then also use memcpy() instead of strlcpy() to copy the string into the destination buffer. That's significantly faster than strlcpy's byte-at-a-time way of copying. Here we also take measures to improve some code which was adjusting the process title to add a " waiting" suffix to it. Call sites which require this can now just call set_ps_display_suffix() to add or adjust the suffix and call set_ps_display_remove_suffix() to remove it again. Reviewed-by: Andres Freund Discussion: https://postgr.es/m/CAApHDvocBvvk-0gWNA2Gohe+sv9fMcv+fK_G+siBKJrgDG4O7g@mail.gmail.com
* pgstat: Track more detailed relation IO statisticsAndres Freund2023-02-09
| | | | | | | | | | | | | | | | | | | Commit 28e626bde00 introduced the infrastructure for tracking more detailed IO statistics. This commit adds the actual collection of the new IO statistics for relations and temporary relations. See aforementioned commit for goals and high-level design. The changes in this commit are fairly straight-forward. The bulk of the change is to passing sufficient information to the callsites of pgstat_count_io_op(). A somewhat unsightly detail is that it currently is hard to find a better place to count fsyncs than in md.c, whereas the other pgstat_count_io_op() calls are in bufmgr.c/localbuf.c. As the number of fsyncs is tied to md.c implementation details, it's not obvious there is a better answer. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/20200124195226.lth52iydq2n2uilq@alap3.anarazel.de
* Remove useless casts to (void *) in hash_search() callsPeter Eisentraut2023-02-06
| | | | | | | | | | | Some of these appear to be leftovers from when hash_search() took a char * argument (changed in 5999e78fc45dcb91784b64b6e9ae43f4e4f68ca2). Since after this there is some more horizontal space available, do some light reformatting where suitable. Reviewed-by: Corey Huinker <corey.huinker@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/fd9adf5d-b1aa-e82f-e4c7-263c30145807%40enterprisedb.com
* Zero initialize uses of instr_time about to trigger compiler warningsAndres Freund2023-01-20
| | | | | | | | | These are all not necessary from a correctness POV. However, in the near future instr_time will be simplified to an int64, at which point gcc would otherwise start to warn about the changed places. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/20230116023639.rn36vf6ajqmfciua@awork3.anarazel.de
* Manual cleanup and pgindent of pgstat and bufmgr related codeAndres Freund2023-01-13
| | | | | | This is in preparation for commiting a larger patch series in the area. Discussion: https://postgr.es/m/CAAKRu_bHwGEbzNxxy+MQDkrsgog6aO6iUvajJ4d6PD98gFU7+w@mail.gmail.com
* Update copyright for 2023Bruce Momjian2023-01-02
| | | | Backpatch-through: 11
* Add copyright notices to meson filesAndrew Dunstan2022-12-20
| | | | Discussion: https://postgr.es/m/222b43a5-2fb3-2c1b-9cd0-375d376c8246@dunslane.net
* Clean up some inconsistencies with GUC declarationsMichael Paquier2022-10-31
| | | | | | | | | | | | | | | | | | | | This is similar to 7d25958, and this commit takes care of all the remaining inconsistencies between the initial value used in the C variable associated to a GUC and its default value stored in the GUC tables (as of pg_settings.boot_val). Some of the initial values of the GUCs updated rely on a compile-time default. These are refactored so as the GUC table and its C declaration use the same values. This makes everything consistent with other places, backend_flush_after, bgwriter_flush_after, port, checkpoint_flush_after doing so already, for example. Extracted from a larger patch by Peter Smith. The spots updated in the modules are from me. Author: Peter Smith, Michael Paquier Reviewed-by: Nathan Bossart, Tom Lane, Justin Pryzby Discussion: https://postgr.es/m/CAHut+PtHE0XSfjjRQ6D4v7+dqzCw=d+1a64ujra4EX8aoc_Z+w@mail.gmail.com
* Remove useless argument from UnpinBuffer()Michael Paquier2022-09-30
| | | | | | | | | | | The last caller of UnpinBuffer() that did not want to adjust CurrentResourceOwner was removed in 2d115e4, and nothing has been introduced in bufmgr.c to do the same thing since. This simplifies 10 code paths. Author: Aleksander Alekseev Reviewed-by: Nathan Bossart, Zhang Mingli, Bharath Rupireddy Discussion: https://postgr.es/m/CAJ7c6TOmmFpb6ohurLhTC7hKNJWGzdwf8s4EAtAZxD48g-e6Jw@mail.gmail.com
* meson: Add initial version of meson based build systemAndres Freund2022-09-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Autoconf is showing its age, fewer and fewer contributors know how to wrangle it. Recursive make has a lot of hard to resolve dependency issues and slow incremental rebuilds. Our home-grown MSVC build system is hard to maintain for developers not using Windows and runs tests serially. While these and other issues could individually be addressed with incremental improvements, together they seem best addressed by moving to a more modern build system. After evaluating different build system choices, we chose to use meson, to a good degree based on the adoption by other open source projects. We decided that it's more realistic to commit a relatively early version of the new build system and mature it in tree. This commit adds an initial version of a meson based build system. It supports building postgres on at least AIX, FreeBSD, Linux, macOS, NetBSD, OpenBSD, Solaris and Windows (however only gcc is supported on aix, solaris). For Windows/MSVC postgres can now be built with ninja (faster, particularly for incremental builds) and msbuild (supporting the visual studio GUI, but building slower). Several aspects (e.g. Windows rc file generation, PGXS compatibility, LLVM bitcode generation, documentation adjustments) are done in subsequent commits requiring further review. Other aspects (e.g. not installing test-only extensions) are not yet addressed. When building on Windows with msbuild, builds are slower when using a visual studio version older than 2019, because those versions do not support MultiToolTask, required by meson for intra-target parallelism. The plan is to remove the MSVC specific build system in src/tools/msvc soon after reaching feature parity. However, we're not planning to remove the autoconf/make build system in the near future. Likely we're going to keep at least the parts required for PGXS to keep working around until all supported versions build with meson. Some initial help for postgres developers is at https://wiki.postgresql.org/wiki/Meson With contributions from Thomas Munro, John Naylor, Stone Tickle and others. Author: Andres Freund <andres@anarazel.de> Author: Nazir Bilal Yavuz <byavuz81@gmail.com> Author: Peter Eisentraut <peter@eisentraut.org> Reviewed-By: Peter Eisentraut <peter.eisentraut@enterprisedb.com> Discussion: https://postgr.es/m/20211012083721.hvixq4pnh2pixr3j@alap3.anarazel.de
* Harmonize parameter names in storage and AM code.Peter Geoghegan2022-09-19
| | | | | | | | | | | | | | | Make sure that function declarations use names that exactly match the corresponding names from function definitions in storage, catalog, access method, executor, and logical replication code, as well as in miscellaneous utility/library code. Like other recent commits that cleaned up function parameter names, this commit was written with help from clang-tidy. Later commits will do the same for other parts of the codebase. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/CAH2-WznJt9CMM9KJTMjJh_zbL5hD9oX44qdJ4aqZtjFi-zA3Tg@mail.gmail.com
* Split up guc.c for better build speed and ease of maintenance.Tom Lane2022-09-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | guc.c has grown to be one of our largest .c files, making it a bottleneck for compilation. It's also acquired a bunch of knowledge that'd be better kept elsewhere, because of our not very good habit of putting variable-specific check hooks here. Hence, split it up along these lines: * guc.c itself retains just the core GUC housekeeping mechanisms. * New file guc_funcs.c contains the SET/SHOW interfaces and some SQL-accessible functions for GUC manipulation. * New file guc_tables.c contains the data arrays that define the built-in GUC variables, along with some already-exported constant tables. * GUC check/assign/show hook functions are moved to the variable's home module, whenever that's clearly identifiable. A few hard- to-classify hooks ended up in commands/variable.c, which was already a home for miscellaneous GUC hook functions. To avoid cluttering a lot more header files with #include "guc.h", I also invented a new header file utils/guc_hooks.h and put all the GUC hook functions' declarations there, regardless of their originating module. That allowed removal of #include "guc.h" from some existing headers. The fallout from that (hopefully all caught here) demonstrates clearly why such inclusions are best minimized: there are a lot of files that, for example, were getting array.h at two or more levels of remove, despite not having any connection at all to GUCs in themselves. There is some very minor code beautification here, such as renaming a couple of inconsistently-named hook functions and improving some comments. But mostly this just moves code from point A to point B and deals with the ensuing needs for #include adjustments and exporting a few functions that previously weren't exported. Patch by me, per a suggestion from Andres Freund; thanks also to Michael Paquier for the idea to invent guc_funcs.c. Discussion: https://postgr.es/m/587607.1662836699@sss.pgh.pa.us
* Include RelFileLocator fields individually in BufferTag.Robert Haas2022-08-24
| | | | | | | | | | | | | | This is preparatory work for a project to increase the number of bits in a RelFileNumber from 32 to 56. Along the way, introduce static inline accessor functions for a couple of BufferTag fields. Dilip Kumar, reviewed by me. The overall patch series has also had review at various times from Andres Freund, Ashutosh Sharma, Hannu Krosing, Vignesh C, Álvaro Herrera, and Tom Lane. Discussion: http://postgr.es/m/CAFiTN-trubju5YbWAq-BSpZ90-Z6xCVBQE8BVqXqANOZAF1Znw@mail.gmail.com
* Further reduce warnings with -Wshadow=compatible-localDavid Rowley2022-08-24
| | | | | | | | | | | | | | | | | | | | | | | In a similar effort to f01592f91, here we're targetting fixing the warnings that -Wshadow=compatible-local produces that we can fix by moving a variable to an inner scope to stop that variable from being shadowed by another variable declared somewhere later in the function. All of the warnings being fixed here are changing the scope of variables which are being used as an iterator for a "for" loop. In each instance, the fix happens to be changing the for loop to use the C99 type initialization. Much of this code likely pre-dates our use of C99. Reducing the scope of the outer scoped variable seems like the safest way to fix these. Renaming seems more likely to risk patches using the wrong variable. Reducing the scope is more likely to result in a compilation failure after applying some future patch rather than introducing bugs with it. By my count, this takes the warning count from 129 down to 114. Author: Justin Pryzby Discussion: https://postgr.es/m/CAApHDvrwLGBP%2BYw9vriayyf%3DXR4uPWP5jr6cQhP9au_kaDUhbA%40mail.gmail.com
* When using the WAL-logged CREATE DATABASE strategy, bulk extend.Robert Haas2022-08-18
| | | | | | | | | | This should improve performance, and was suggested by Andres Freund. Back-patch to v15 to keep the code consistent across branches. Dilip Kumar Discussion: http://postgr.es/m/C3458199-FEDD-4356-865A-08DFAA5D4065@anarazel.de Discussion: http://postgr.es/m/CAFiTN-sJ0vVpJrZ=R5M+g7Tr8=NN4wKOtrqOcDEsfFfnZgivVA@mail.gmail.com
* Avoid using a fake relcache entry to own an SmgrRelation.Robert Haas2022-08-12
| | | | | | | | | | | | | | | | | | | | If an error occurs before we close the fake relcache entry, the the fake relcache entry will be destroyed by the SmgrRelation will survive until end of transaction. Its smgr_owner pointer ends up pointing to already-freed memory. The original reason for using a fake relcache entry here was to try to avoid reusing an SMgrRelation across a relevant invalidation. To avoid that problem, just call smgropen() again each time we need a reference to it. Hopefully someday we will come up with a more elegant approach, but accessing uninitialized memory is bad so let's do this for now. Dilip Kumar, reviewed by Andres Freund and Tom Lane. Report by Justin Pryzby. Discussion: http://postgr.es/m/20220802175043.GA13682@telsasoft.com Discussion: http://postgr.es/m/CAFiTN-vSFeE6_W9z698XNtFROOA_nSqUXWqLcG0emob_kJ+dEQ@mail.gmail.com
* Fix data-corruption hazard in WAL-logged CREATE DATABASE.Tom Lane2022-08-06
| | | | | | | | | | | | | | RelationCopyStorageUsingBuffer thought it could skip copying empty pages, but of course that does not work at all, because subsequent blocks will be out of place. Also fix it to acquire share lock on the source buffer. It *might* be safe to not do that, but it's not very certain, and I don't think this code deserves any benefit of the doubt. Dilip Kumar, per complaint from me Discussion: https://postgr.es/m/3679800.1659654066@sss.pgh.pa.us
* Convert macros to static inline functions (buf_internals.h)Robert Haas2022-07-27
| | | | | | Dilip Kumar, reviewed by Vignesh C, Ashutosh Sharma, and me. Discussion: http://postgr.es/m/CAFiTN-tYbM7D+2UGiNc2kAFMSQTa5FTeYvmg-Vj2HvPdVw2Gvg@mail.gmail.com
* Fix ReadRecentBuffer for local buffers.Heikki Linnakangas2022-07-25
| | | | | | | | | | | | | | | | | | | | | | It incorrectly used GetBufferDescriptor instead of GetLocalBufferDescriptor, causing it to not find the correct buffer in most cases, and performing an out-of-bounds memory read in the corner case that temp_buffers > shared_buffers. It also bumped the usage-count on the buffer, even if it was previously pinned. That won't lead to crashes or incorrect results, but it's different from what the shared-buffer case does, and different from the usual code in LocalBufferAlloc. Fix that too, and make the code ordering match LocalBufferAlloc() more closely, so that it's easier to verify that it's doing the same thing. Currently, ReadRecentBuffer() is only used with non-temp relations, in WAL redo, so the broken code is currently dead code. However, it could be used by extensions. Backpatch-through: 14 Discussion: https://www.postgresql.org/message-id/2d74b46f-27c9-fb31-7f99-327a87184cc0%40iki.fi Reviewed-by: Thomas Munro, Zhang Mingli, Richard Guo
* Rename some functions to mention Relation instead of RelFileLocator.Robert Haas2022-07-12
| | | | | | | | This is definitely shorter, and hopefully clearer. Kyotaro Horiguchi, reviewed by Dilip Kumar and by me Discussion: http://postgr.es/m/20220707.174436.1885393789789795413.horikyota.ntt@gmail.com
* Fix mistake in comment.Robert Haas2022-07-11
| | | | | | Kyotaro Horiguchi Discussion: http://postgr.es/m/20220708.145951.382076151410075693.horikyota.ntt@gmail.com
* Change internal RelFileNode references to RelFileNumber or RelFileLocator.Robert Haas2022-07-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have been using the term RelFileNode to refer to either (1) the integer that is used to name the sequence of files for a certain relation within the directory set aside for that tablespace/database combination; or (2) that value plus the OIDs of the tablespace and database; or occasionally (3) the whole series of files created for a relation based on those values. Using the same name for more than one thing is confusing. Replace RelFileNode with RelFileNumber when we're talking about just the single number, i.e. (1) from above, and with RelFileLocator when we're talking about all the things that are needed to locate a relation's files on disk, i.e. (2) from above. In the places where we refer to (3) as a relfilenode, instead refer to "relation storage". Since there is a ton of SQL code in the world that knows about pg_class.relfilenode, don't change the name of that column, or of other SQL-facing things that derive their name from it. On the other hand, do adjust closely-related internal terminology. For example, the structure member names dbNode and spcNode appear to be derived from the fact that the structure itself was called RelFileNode, so change those to dbOid and spcOid. Likewise, various variables with names like rnode and relnode get renamed appropriately, according to how they're being used in context. Hopefully, this is clearer than before. It is also preparation for future patches that intend to widen the relfilenumber fields from its current width of 32 bits. Variables that store a relfilenumber are now declared as type RelFileNumber rather than type Oid; right now, these are the same, but that can now more easily be changed. Dilip Kumar, per an idea from me. Reviewed also by Andres Freund. I fixed some whitespace issues, changed a couple of words in a comment, and made one other minor correction. Discussion: http://postgr.es/m/CA+TgmoamOtXbVAQf9hWFzonUo6bhhjS6toZQd7HZ-pmojtAmag@mail.gmail.com Discussion: http://postgr.es/m/CA+Tgmobp7+7kmi4gkq7Y+4AM9fTvL+O1oQ4-5gFTT+6Ng-dQ=g@mail.gmail.com Discussion: http://postgr.es/m/CAFiTN-vTe79M8uDH1yprOU64MNFE+R3ODRuA+JWf27JbhY4hJw@mail.gmail.com
* Pre-beta mechanical code beautification.Tom Lane2022-05-12
| | | | | Run pgindent, pgperltidy, and reformat-dat-files. I manually fixed a couple of comments that pgindent uglified.
* Remove extraneous blank lines before block-closing bracesAlvaro Herrera2022-04-13
| | | | | | | | | These are useless and distracting. We wouldn't have written the code with them to begin with, so there's no reason to keep them. Author: Justin Pryzby <pryzby@telsasoft.com> Discussion: https://postgr.es/m/20220411020336.GB26620@telsasoft.com Discussion: https://postgr.es/m/attachment/133167/0016-Extraneous-blank-lines.patch
* Rename delayChkpt to delayChkptFlags.Robert Haas2022-04-08
| | | | | | | | | | | | | | | | | | Before commit 412ad7a55639516f284cd0ef9757d6ae5c7abd43, delayChkpt was a Boolean. Now it's an integer. Extensions using it need to be appropriately updated, so let's rename the field to make sure that a hard compilation failure occurs. Replacing delayChkpt with delayChkptFlags made a few comments extend past 80 characters, so I reflowed them and changed some wording very slightly. The back-branches will need a different change to restore compatibility with existing minor releases; this is just for master. Per suggestion from Tom Lane. Discussion: http://postgr.es/m/a7880f4d-1d74-582a-ada7-dad168d046d1@enterprisedb.com
* Prefetch data referenced by the WAL, take II.Thomas Munro2022-04-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce a new GUC recovery_prefetch. When enabled, look ahead in the WAL and try to initiate asynchronous reading of referenced data blocks that are not yet cached in our buffer pool. For now, this is done with posix_fadvise(), which has several caveats. Since not all OSes have that system call, "try" is provided so that it can be enabled where available. Better mechanisms for asynchronous I/O are possible in later work. Set to "try" for now for test coverage. Default setting to be finalized before release. The GUC wal_decode_buffer_size limits the distance we can look ahead in bytes of decoded data. The existing GUC maintenance_io_concurrency is used to limit the number of concurrent I/Os allowed, based on pessimistic heuristics used to infer that I/Os have begun and completed. We'll also not look more than maintenance_io_concurrency * 4 block references ahead. Reviewed-by: Julien Rouhaud <rjuju123@gmail.com> Reviewed-by: Tomas Vondra <tomas.vondra@2ndquadrant.com> Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com> (earlier version) Reviewed-by: Andres Freund <andres@anarazel.de> (earlier version) Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> (earlier version) Tested-by: Tomas Vondra <tomas.vondra@2ndquadrant.com> (earlier version) Tested-by: Jakub Wartak <Jakub.Wartak@tomtom.com> (earlier version) Tested-by: Dmitry Dolgov <9erthalion6@gmail.com> (earlier version) Tested-by: Sait Talha Nisanci <Sait.Nisanci@microsoft.com> (earlier version) Discussion: https://postgr.es/m/CA%2BhUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq%3DAovOddfHpA%40mail.gmail.com
* pgstat: store statistics in shared memory.Andres Freund2022-04-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously the statistics collector received statistics updates via UDP and shared statistics data by writing them out to temporary files regularly. These files can reach tens of megabytes and are written out up to twice a second. This has repeatedly prevented us from adding additional useful statistics. Now statistics are stored in shared memory. Statistics for variable-numbered objects are stored in a dshash hashtable (backed by dynamic shared memory). Fixed-numbered stats are stored in plain shared memory. The header for pgstat.c contains an overview of the architecture. The stats collector is not needed anymore, remove it. By utilizing the transactional statistics drop infrastructure introduced in a prior commit statistics entries cannot "leak" anymore. Previously leaked statistics were dropped by pgstat_vacuum_stat(), called from [auto-]vacuum. On systems with many small relations pgstat_vacuum_stat() could be quite expensive. Now that replicas drop statistics entries for dropped objects, it is not necessary anymore to reset stats when starting from a cleanly shut down replica. Subsequent commits will perform some further code cleanup, adapt docs and add tests. Bumps PGSTAT_FILE_FORMAT_ID. Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Author: Andres Freund <andres@anarazel.de> Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-By: Andres Freund <andres@anarazel.de> Reviewed-By: Thomas Munro <thomas.munro@gmail.com> Reviewed-By: Justin Pryzby <pryzby@telsasoft.com> Reviewed-By: "David G. Johnston" <david.g.johnston@gmail.com> Reviewed-By: Tomas Vondra <tomas.vondra@2ndquadrant.com> (in a much earlier version) Reviewed-By: Arthur Zakirov <a.zakirov@postgrespro.ru> (in a much earlier version) Reviewed-By: Antonin Houska <ah@cybertec.at> (in a much earlier version) Discussion: https://postgr.es/m/20220303021600.hs34ghqcw6zcokdh@alap3.anarazel.de Discussion: https://postgr.es/m/20220308205351.2xcn6k4x5yivcxyd@alap3.anarazel.de Discussion: https://postgr.es/m/20210319235115.y3wz7hpnnrshdyv6@alap3.anarazel.de
* Add new block-by-block strategy for CREATE DATABASE.Robert Haas2022-03-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Because this strategy logs changes on a block-by-block basis, it avoids the need to checkpoint before and after the operation. However, because it logs each changed block individually, it might generate a lot of extra write-ahead logging if the template database is large. Therefore, the older strategy remains available via a new STRATEGY parameter to CREATE DATABASE, and a corresponding --strategy option to createdb. Somewhat controversially, this patch assembles the list of relations to be copied to the new database by reading the pg_class relation of the template database. Cross-database access like this isn't normally possible, but it can be made to work here because there can't be any connections to the database being copied, nor can it contain any in-doubt transactions. Even so, we have to use lower-level interfaces than normal, since the table scan and relcache interfaces will not work for a database to which we're not connected. The advantage of this approach is that we do not need to rely on the filesystem to determine what ought to be copied, but instead on PostgreSQL's own knowledge of the database structure. This avoids, for example, copying stray files that happen to be located in the source database directory. Dilip Kumar, with a fairly large number of cosmetic changes by me. Reviewed and tested by Ashutosh Sharma, Andres Freund, John Naylor, Greg Nancarrow, Neha Sharma. Additional feedback from Bruce Momjian, Heikki Linnakangas, Julien Rouhaud, Adam Brusselback, Kyotaro Horiguchi, Tomas Vondra, Andrew Dunstan, Álvaro Herrera, and others. Discussion: http://postgr.es/m/CA+TgmoYtcdxBjLh31DLxUXHxFVMPGzrU5_T=CYCvRyFHywSBUQ@mail.gmail.com
* Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.Robert Haas2022-03-24
| | | | | | | | | | | | | | | | If TRUNCATE causes some buffers to be invalidated and thus the checkpoint does not flush them, TRUNCATE must also ensure that the corresponding files are truncated on disk. Otherwise, a replay from the checkpoint might find that the buffers exist but have the wrong contents, which may cause replay to fail. Report by Teja Mupparti. Patch by Kyotaro Horiguchi, per a design suggestion from Heikki Linnakangas, with some changes to the comments by me. Review of this and a prior patch that approached the issue differently by Heikki Linnakangas, Andres Freund, Álvaro Herrera, Masahiko Sawada, and Tom Lane. Discussion: http://postgr.es/m/BYAPR06MB6373BF50B469CA393C614257ABF00@BYAPR06MB6373.namprd06.prod.outlook.com
* Remove xloginsert.h from xlog.hAlvaro Herrera2022-01-30
| | | | | | | | | xlog.h is directly and indirectly #included in a lot of places. With this change, xloginsert.h is no longer unnecessarily included in the large number of them that don't need it. Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Discussion: https://postgr.es/m/CALj2ACVe-W+WM5P44N7eG9C2_FmaeM8Dq5aCnD3fHt0Ba=WR6w@mail.gmail.com
* Consistently use the function name CreateCheckPoint in code and comments.Amit Kapila2022-01-17
| | | | | Author: Bharath Rupireddy Discussion: https://postgr.es/m/CALj2ACVZmKsvDjtd45+9oTcnjUJtC4LF2BYK8TpWT1f=NjJX3w@mail.gmail.com
* Update copyright for 2022Bruce Momjian2022-01-07
| | | | Backpatch-through: 10
* Change ProcSendSignal() to take pgprocno.Thomas Munro2021-12-16
| | | | | | | | | | | Instead of referring to target backends by pid, use pgprocno. This means that we don't have to scan the ProcArray and we can drop some special case code for dealing with the startup process. Discussion: https://postgr.es/m/CA%2BhUKGLYRyDaneEwz5Uya_OgFLMx5BgJfkQSD%3Dq9HmwsfRRb-w%40mail.gmail.com Reviewed-by: Soumyadeep Chakraborty <soumyadeep2007@gmail.com> Reviewed-by: Ashwin Agrawal <ashwinstar@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de>
* Some RELKIND macro refactoringPeter Eisentraut2021-12-03
| | | | | | | | | | | | Add more macros to group some RELKIND_* macros: - RELKIND_HAS_PARTITIONS() - RELKIND_HAS_TABLESPACE() - RELKIND_HAS_TABLE_AM() Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://www.postgresql.org/message-id/flat/a574c8f1-9c84-93ad-a9e5-65233d6fc00f%40enterprisedb.com
* Fix duplicate words in commentsDaniel Gustafsson2021-10-04
| | | | | | | Remove accidentally duplicated words in code comments. Author: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> Discussion: https://postgr.es/m/87bl45t0co.fsf@wibble.ilmari.org
* Check for relation length overrun soon enough.Tom Lane2021-09-09
| | | | | | | | | | | | | | | | | We don't allow relations to exceed 2^32-1 blocks, because block numbers are 32 bits and the last possible block number is reserved to mean InvalidBlockNumber. There is a check for this in mdextend, but that's really way too late, because the smgr API requires us to create a buffer for the block-to-be-added, and we do not want to have any buffer with blocknum InvalidBlockNumber. (Such a case can trigger assertions in bufmgr.c, plus I think it might confuse ReadBuffer's logic for data-past-EOF later on.) So put the check into ReadBuffer. Per report from Christoph Berg. It's been like this forever, so back-patch to all supported branches. Discussion: https://postgr.es/m/YTn1iTkUYBZfcODk@msg.credativ.de
* Fix handling of partitioned index in RelationGetNumberOfBlocksInFork()Peter Eisentraut2021-08-26
| | | | | | | | | | | | | Since a partitioned index doesn't have storage, getting the number of blocks from it will not give sensible results. Existing callers already check that they don't call it that way, so there doesn't appear to be a live problem. But for correctness, handle RELKIND_PARTITIONED_INDEX together with the other non-storage relkinds. Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://www.postgresql.org/message-id/1d3a5fbe-f48b-8bea-80da-9a5c4244aef9@enterprisedb.com