path: root/src/backend/storage/buffer/localbuf.c
Commit message (Author, Date)
...
* Allow Pin/UnpinBuffer to operate in a lockfree manner. (Andres Freund, 2016-04-10)
  Pinning/Unpinning a buffer is a very frequent operation; especially in read-mostly cache resident workloads. Benchmarking shows that in various scenarios the spinlock protecting a buffer header's state becomes a significant bottleneck. The problem can be reproduced with pgbench -S on larger machines, but can be considerably worse for queries which touch the same buffers over and over at a high frequency (e.g. nested loops over a small inner table).
  To allow atomic operations to be used, cram BufferDesc's flags, usage_count, buf_hdr_lock, refcount into a single 32bit atomic variable; that allows manipulating them together using 32bit compare-and-swap operations. This requires reducing MAX_BACKENDS to 2^18-1 (which could be lifted by using a 64bit field, but it's not a realistic configuration atm).
  As not all operations can easily be implemented in a lockfree manner, implement the previous buf_hdr_lock via a flag bit in the atomic variable. That way we can continue to lock the header in places where it's needed, but can get away without acquiring it in the more frequent hot-paths. There's some additional operations which can be done without the lock, but aren't in this patch; but the most important places are covered.
  As bufmgr.c now essentially re-implements spinlocks, abstract the delay logic from s_lock.c into something more generic. It now has already two users, and more are coming up; there's a followup patch for lwlock.c at least.
  This patch is based on a proof-of-concept written by me, which Alexander Korotkov made into a fully working patch; the committed version is again revised by me. Benchmarking and testing has, amongst others, been provided by Dilip Kumar, Alexander Korotkov, Robert Haas.
  On a large x86 system improvements for readonly pgbench, with a high client count, of a factor of 8 have been observed.
  Author: Alexander Korotkov and Andres Freund
  Discussion: 2400449.GjM57CE0Yg@dinodell
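The packing described in the commit above can be illustrated with a small standalone sketch. It is not the code from bufmgr.c: the `Demo*` and `demo_*` names, the exact bit positions, and the usage-count cap of 5 are assumptions chosen for illustration, and C11 `<stdatomic.h>` stands in for PostgreSQL's own atomics layer. The sketch keeps an 18-bit refcount, a 4-bit usage count, and flag bits (one of which plays the role of the old header lock) in a single 32-bit word, and pins a buffer with a compare-and-swap loop.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout: bits 0-17 refcount, bits 18-21 usage count, bits 22-31 flags. */
#define DEMO_REFCOUNT_BITS   18
#define DEMO_REFCOUNT_ONE    1u
#define DEMO_REFCOUNT_MASK   ((1u << DEMO_REFCOUNT_BITS) - 1)
#define DEMO_USAGECOUNT_ONE  (1u << DEMO_REFCOUNT_BITS)
#define DEMO_USAGECOUNT_MASK (0xFu << DEMO_REFCOUNT_BITS)
#define DEMO_FLAG_LOCKED     (1u << 22)   /* stands in for the former header spinlock */
#define DEMO_MAX_USAGE_COUNT 5u           /* assumed cap, for illustration only */

typedef struct DemoBufferDesc
{
    _Atomic uint32_t state;               /* flags, usage count and refcount together */
} DemoBufferDesc;

static inline uint32_t demo_refcount(uint32_t state)
{
    return state & DEMO_REFCOUNT_MASK;
}

static inline uint32_t demo_usagecount(uint32_t state)
{
    return (state & DEMO_USAGECOUNT_MASK) >> DEMO_REFCOUNT_BITS;
}

/* Pin: bump the refcount (and the usage count, up to the cap) with one CAS. */
static uint32_t demo_pin(DemoBufferDesc *buf)
{
    uint32_t old_state = atomic_load(&buf->state);

    for (;;)
    {
        uint32_t new_state;

        if (old_state & DEMO_FLAG_LOCKED)
        {
            /* Header is locked: re-read and retry (real code would back off). */
            old_state = atomic_load(&buf->state);
            continue;
        }

        new_state = old_state + DEMO_REFCOUNT_ONE;
        if (demo_usagecount(old_state) < DEMO_MAX_USAGE_COUNT)
            new_state += DEMO_USAGECOUNT_ONE;

        /* On failure, old_state is refreshed with the current value. */
        if (atomic_compare_exchange_weak(&buf->state, &old_state, new_state))
            return new_state;
    }
}

/* Unpin: a plain atomic decrement suffices; returns the state afterwards. */
static uint32_t demo_unpin(DemoBufferDesc *buf)
{
    return atomic_fetch_sub(&buf->state, DEMO_REFCOUNT_ONE) - DEMO_REFCOUNT_ONE;
}
```

With an 18-bit refcount field the largest representable pin count is 2^18-1, which is consistent with the MAX_BACKENDS limit the commit message mentions.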
* Update copyright for 2016 (Bruce Momjian, 2016-01-02)
  Backpatch certain files through 9.1
* Align buffer descriptors to cache line boundaries. (Andres Freund, 2015-01-29)
  Benchmarks have shown that aligning the buffer descriptor array to cache lines is important for scalability; especially on bigger, multi-socket, machines. Currently the array sometimes already happens to be aligned by happenstance, depending how large previous shared memory allocations were. That can lead to wildly varying performance results after minor configuration changes.
  In addition to aligning the start of descriptor array, also force the size of individual descriptors to be of a common cache line size (64 bytes). That happens to already be the case on 64bit platforms, but this way we can change the struct BufferDesc more easily.
  As the alignment primarily matters in highly concurrent workloads which probably all are 64bit these days, and the space wastage of element alignment would be a bit more noticeable on 32bit systems, we don't force the stride to be cacheline sized on 32bit platforms for now. If somebody does actual performance testing, we can reevaluate that decision by changing the definition of BUFFERDESC_PADDED_SIZE.
  Discussion: 20140202151319.GD32123@awork2.anarazel.de
  Per discussion with Bruce Momjian, Tom Lane, Robert Haas, and Peter Geoghegan.
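The two measures in the commit above (aligning the start of the array and padding each element to 64 bytes) can be sketched in standalone C. The `DemoDesc` fields, the `DEMO_CACHE_LINE_SIZE` constant, and the helper name are assumptions made for illustration, not the actual BufferDesc definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CACHE_LINE_SIZE 64   /* assumed common cache line size */

/* Hypothetical descriptor; the real struct has different members. */
typedef struct DemoDesc
{
    uint32_t tag[5];
    int      buf_id;
    uint32_t state;
    int      wait_backend;
    int      free_next;
} DemoDesc;

/* Padding via a union forces the array stride to one cache line. */
typedef union DemoDescPadded
{
    DemoDesc desc;
    char     pad[DEMO_CACHE_LINE_SIZE];
} DemoDescPadded;

_Static_assert(sizeof(DemoDescPadded) == DEMO_CACHE_LINE_SIZE,
               "descriptor must occupy exactly one cache line");

/* Round a raw pointer up to the next cache line boundary. */
static inline void *demo_cacheline_align(void *p)
{
    uintptr_t u = (uintptr_t) p;

    u = (u + DEMO_CACHE_LINE_SIZE - 1) & ~((uintptr_t) DEMO_CACHE_LINE_SIZE - 1);
    return (void *) u;
}
```

Because the stride equals the line size and the start is aligned, no two descriptors in the sketch share a cache line, so an update to one does not invalidate its neighbour's line.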
* Update copyright for 2015 (Bruce Momjian, 2015-01-06)
  Backpatch certain files through 9.0
* Improve hash_create's API for selecting simple-binary-key hash functions. (Tom Lane, 2014-12-18)
  Previously, if you wanted anything besides C-string hash keys, you had to specify a custom hashing function to hash_create(). Nearly all such callers were specifying tag_hash or oid_hash; which is tedious, and rather error-prone, since a caller could easily miss the opportunity to optimize by using hash_uint32 when appropriate.
  Replace this with a design whereby callers using simple binary-data keys just specify HASH_BLOBS and don't need to mess with specific support functions. hash_create() itself will take care of optimizing when the key size is four bytes. This nets out saving a few hundred bytes of code space, and offers a measurable performance improvement in tidbitmap.c (which was not exploiting the opportunity to use hash_uint32 for its 4-byte keys). There might be some wins elsewhere too, I didn't analyze closely.
  In future we could look into offering a similar optimized hashing function for 8-byte keys. Under this design that could be done in a centralized and machine-independent fashion, whereas getting it right for keys of platform-dependent sizes would've been notationally painful before.
  For the moment, the old way still works fine, so as not to break source code compatibility for loadable modules. Eventually we might want to remove tag_hash and friends from the exported API altogether, since there's no real need for them to be explicitly referenced from outside dynahash.c.
  Teodor Sigaev and Tom Lane
* Don't allow to disable backend assertions via the debug_assertions GUC. (Andres Freund, 2014-06-20)
  The existence of the assert_enabled variable (backing the debug_assertions GUC) reduced the amount of knowledge some static code checkers (like coverity and various compilers) could infer from the existence of the assertion. That could have been solved by optionally removing the assert_enabled variable from the Assert() et al macros at compile time when some special macro is defined, but the resulting complication doesn't seem to be worth the gain from having debug_assertions. Recompiling is fast enough.
  The debug_assertions GUC is still available, but readonly, as it's useful when diagnosing problems. The commandline/client startup option -A, which previously also allowed to enable/disable assertions, has been removed as it doesn't serve a purpose anymore.
  While at it, reduce code duplication in bufmgr.c and localbuf.c assertions checking for spurious buffer pins. That code had to be reindented anyway to cope with the assert_enabled removal.
* pgindent run for 9.4 (Bruce Momjian, 2014-05-06)
  This includes removing tabs after periods in C comments, which was applied to back branches, so this change should not affect backpatching.
* Rationalize common/relpath.[hc]. (Tom Lane, 2014-04-30)
  Commit a73018392636ce832b09b5c31f6ad1f18a4643ea created rather a mess by putting dependencies on backend-only include files into include/common. We really shouldn't do that. To clean it up:
    * Move TABLESPACE_VERSION_DIRECTORY back to its longtime home in catalog/catalog.h. We won't consider this symbol part of the FE/BE API.
    * Push enum ForkNumber from relfilenode.h into relpath.h. We'll consider relpath.h as the source of truth for fork numbers, since relpath.c was already partially serving that function, and anyway relfilenode.h was kind of a random place for that enum.
    * So, relfilenode.h now includes relpath.h rather than vice-versa. This direction of dependency is fine. (That allows most, but not quite all, of the existing explicit #includes of relpath.h to go away again.)
    * Push forkname_to_number from catalog.c to relpath.c, just to centralize fork number stuff a bit better.
    * Push GetDatabasePath from catalog.c to relpath.c; it was rather odd that the previous commit didn't keep this together with relpath().
    * To avoid needing relfilenode.h in common/, redefine the underlying function (now called GetRelationPath) as taking separate OID arguments, and make the APIs using RelFileNode or RelFileNodeBackend into macro wrappers. (The macros have a potential multiple-eval risk, but none of the existing call sites have an issue with that; one of them had such a risk already anyway.)
    * Fix failure to follow the directions when "init" fork type was added; specifically, the errhint in forkname_to_number wasn't updated, and neither was the SGML documentation for pg_relation_size().
    * Fix tablespace-path-too-long check in CreateTableSpace() to account for fork-name component of maximum-length pathnames. This requires putting FORKNAMECHARS into a header file, but it was rather useless (and actually unreferenced) where it was.
  The last couple of items are potentially back-patchable bug fixes, if anyone is sufficiently excited about them; but personally I'm not.
  Per a gripe from Christoph Berg about how include/common wasn't self-contained.
* Update copyright for 2014 (Bruce Momjian, 2014-01-07)
  Update all files in head, and files COPYRIGHT and legal.sgml in all back branches.
* pgindent run for release 9.3 (Bruce Momjian, 2013-05-29)
  This is the first run of the Perl-based pgindent script. Also update pgindent instructions.
* Allow I/O reliability checks using 16-bit checksums (Simon Riggs, 2013-03-22)
  Checksums are set immediately prior to flush out of shared buffers and checked when pages are read in again. Hint bit setting will require full page write when block is dirtied, which causes various infrastructure changes. Extensive comments, docs and README.
  WARNING message thrown if checksum fails on non-all zeroes page; ERROR thrown but can be disabled with ignore_checksum_failure = on.
  Feature enabled by an initdb option, since transition from option off to option on is long and complex and has not yet been implemented. Default is not to use checksums.
  Checksum used is WAL CRC-32 truncated to 16-bits.
  Simon Riggs, Jeff Davis, Greg Smith
  Wide input and assistance from many community members. Thank you.
* Improve error reporting in code that checks for buffer refcount leaks. (Tom Lane, 2013-03-15)
  Formerly we just Assert'ed that each refcount was zero, which was quick and easy but failed to provide a good overview of what was wrong. Change the code so that we'll call PrintBufferLeakWarning() for each buffer with a nonzero refcount, and then Assert at the end of the loop. This costs nothing in runtime and might ease diagnosis of some bugs.
  Greg Smith, reviewed by Satoshi Nagayasu, further tweaked by me
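The "report every leak, then assert once" pattern from the commit above might look roughly like the following standalone sketch. The function name, the plain `int` refcount array, and the message text are hypothetical stand-ins; the real code calls PrintBufferLeakWarning() on PostgreSQL's own buffer structures.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Walk all refcounts, print one warning per leaked pin, and return how many
 * leaks were seen. The caller asserts on the total only after the loop, so
 * every offender is reported before the process aborts.
 */
static int demo_check_leaks(const int *refcounts, int nbuffers)
{
    int leaked = 0;

    for (int i = 0; i < nbuffers; i++)
    {
        if (refcounts[i] != 0)
        {
            fprintf(stderr, "buffer refcount leak: buffer %d has refcount %d\n",
                    i, refcounts[i]);
            leaked++;
        }
    }
    return leaked;
}
```

A caller in an assert-enabled build would then write `assert(demo_check_leaks(refcounts, n) == 0);` so that the full list of leaked buffers is in the log before the assertion fires.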
* Move relpath() to libpgcommon (Alvaro Herrera, 2013-02-21)
  This enables non-backend code, such as pg_xlogdump, to use it easily. The previous location, in src/backend/catalog/catalog.c, made that essentially impossible because that file depends on many backend-only facilities; so this needs to live separately.
* Update copyrights for 2013 (Bruce Momjian, 2013-01-01)
  Fully update git head, and update back branches in ./COPYRIGHT and legal.sgml files.
* Split resowner.h (Alvaro Herrera, 2012-08-28)
  This lets files that are mere users of ResourceOwner not automatically include the headers for stuff that is managed by the resowner mechanism.
* Scan the buffer pool just once, not once per fork, during relation drop. (Tom Lane, 2012-06-07)
  This provides a speedup of about 4X when NBuffers is large enough. There is also a useful reduction in sinval traffic, since we only do CacheInvalidateSmgr() once not once per fork.
  Simon Riggs, reviewed and somewhat revised by Tom Lane
* Make EXPLAIN (BUFFERS) track blocks dirtied, as well as those written. (Robert Haas, 2012-02-22)
  Also expose the new counters through pg_stat_statements.
  Patch by me. Review by Fujii Masao and Greg Smith.
* Update copyright notices for year 2012. (Bruce Momjian, 2012-01-01)
* Remove unnecessary #include references, per pgrminclude script. (Bruce Momjian, 2011-09-01)
* pgindent run before PG 9.1 beta 1. (Bruce Momjian, 2011-04-10)
* Stamp copyrights for year 2011. (Bruce Momjian, 2011-01-01)
* Remove belt-and-suspenders guards against buffer pin leaks. (Robert Haas, 2010-11-25)
  Forcibly releasing all leftover buffer pins should be unnecessary now that we have a robust ResourceOwner mechanism, and it significantly increases the cost of process shutdown. Instead, in an assert-enabled build, assert that no pins are held; in a non-assert-enabled build, do nothing.
* Remove cvs keywords from all files. (Magnus Hagander, 2010-09-20)
* Allocate local buffers in a context of their own, rather than dumping them (Tom Lane, 2010-08-19)
  into TopMemoryContext. This makes no functional difference, but makes it easier to see what the space is being used for in MemoryContextStats dumps. Per a recent example in which I was surprised by the size of TopMemoryContext.
* Include the backend ID in the relpath of temporary relations. (Robert Haas, 2010-08-13)
  This allows us to reliably remove all leftover temporary relation files on cluster startup without reference to system catalogs or WAL; therefore, we no longer include temporary relations in XLOG_XACT_COMMIT and XLOG_XACT_ABORT WAL records.
  Since these changes require including a backend ID in each SharedInvalSmgrMsg, the size of the SharedInvalidationMessage.id field has been reduced from two bytes to one, and the maximum number of connections has been reduced from INT_MAX / 4 to 2^23-1. It would be possible to remove these restrictions by increasing the size of SharedInvalidationMessage by 4 bytes, but right now that doesn't seem like a good trade-off.
  Review by Jaime Casanova and Tom Lane.
* Update copyright for the year 2010. (Bruce Momjian, 2010-01-02)
* Add an EXPLAIN (BUFFERS) option to show buffer-usage statistics. (Robert Haas, 2009-12-15)
  This patch also removes buffer-usage statistics from the track_counts output, since this (or the global server statistics) is deemed to be a better interface to this information.
  Itagaki Takahiro, reviewed by Euler Taveira de Oliveira.
* 8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef list (Bruce Momjian, 2009-06-11)
  provided by Andrew.
* Implement prefetching via posix_fadvise() for bitmap index scans. A new (Tom Lane, 2009-01-12)
  GUC variable effective_io_concurrency controls how many concurrent block prefetch requests will be issued.
  (The best way to handle this for plain index scans is still under debate, so that part is not applied yet --- tgl)
  Greg Stark
* Update copyright for 2009. (Bruce Momjian, 2009-01-01)
* Fix #ifdeffed debugging code to work with relation forks. (Heikki Linnakangas, 2008-11-27)
* Fix sloppy omission of now-required #include's. (Tom Lane, 2008-11-11)
* Change error messages to print the physical path, like (Heikki Linnakangas, 2008-11-11)
  "base/11517/3767_fsm", instead of symbolic names like "1663/11517/3767/1", per Alvaro's suggestion. I didn't change the messages in the higher-level index, heap and FSM routines, though, where the fork is implicit.
* Introduce the concept of relation forks. An smgr relation can now consist (Heikki Linnakangas, 2008-08-11)
  of multiple forks, and each fork can be created and grown separately.
  The bulk of this patch is about changing the smgr API to include an extra ForkNumber argument in every smgr function. Also, smgrscheduleunlink and smgrdounlink no longer implicitly call smgrclose, because other forks might still exist after unlinking one. The callers of those functions have been modified to call smgrclose instead.
  This patch in itself doesn't have any user-visible effect, but provides the infrastructure needed for upcoming patches. The additional forks envisioned are a rewritten FSM implementation that doesn't rely on a fixed-size shared memory block, and a visibility map to allow skipping portions of a table in VACUUM that have no dead tuples.
* Refactor XLogOpenRelation() and XLogReadBuffer() in preparation for relation (Heikki Linnakangas, 2008-06-12)
  forks. XLogOpenRelation() and the associated light-weight relation cache in xlogutils.c is gone, and XLogReadBuffer() now takes a RelFileNode as argument, instead of Relation. For functions that still need a Relation struct during WAL replay, there's a new function called CreateFakeRelcacheEntry() that returns a fake entry like XLogOpenRelation() used to.
* Update copyrights in source tree to 2008. (Bruce Momjian, 2008-01-01)
* pgindent run for 8.3. (Bruce Momjian, 2007-11-15)
* Make large sequential scans and VACUUMs work in a limited-size "ring" of (Tom Lane, 2007-05-30)
  buffers, rather than blowing out the whole shared-buffer arena. Aside from avoiding cache spoliation, this fixes the problem that VACUUM formerly tended to cause a WAL flush for every page it modified, because we had it hacked to use only a single buffer. Those flushes will now occur only once per ring-ful. The exact ring size, and the threshold for seqscans to switch into the ring usage pattern, remain under debate; but the infrastructure seems done.
  The key bit of infrastructure is a new optional BufferAccessStrategy object that can be passed to ReadBuffer operations; this replaces the former StrategyHintVacuum API.
  This patch also changes the buffer usage-count methodology a bit: we now advance usage_count when first pinning a buffer, rather than when last unpinning it. To preserve the behavior that a buffer's lifetime starts to decrease when it's released, the clock sweep code is modified to not decrement usage_count of pinned buffers.
  Work not done in this commit: teach GiST and GIN indexes to use the vacuum BufferAccessStrategy for vacuum-driven fetches.
  Original patch by Simon, reworked by Heikki and again by Tom.
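The usage-count rule described in the commit above (advance on pin, and let the clock sweep skip pinned buffers instead of decrementing them) can be sketched as a small standalone model. All `Sweep*` and `sweep_*` names and the cap of 5 are assumptions for illustration; the sketch bumps the count on every pin, and the model is single-threaded, ignoring the locking the real buffer manager needs.

```c
#include <assert.h>

#define SWEEP_MAX_USAGE 5   /* assumed cap on usage_count */

typedef struct SweepBuf
{
    int refcount;      /* number of current pins */
    int usage_count;   /* advanced on pin, decayed by the sweep */
} SweepBuf;

/* Pinning advances usage_count immediately (up to the cap). */
static void sweep_pin(SweepBuf *b)
{
    b->refcount++;
    if (b->usage_count < SWEEP_MAX_USAGE)
        b->usage_count++;
}

/* Unpinning leaves usage_count alone; decay starts only once unpinned. */
static void sweep_unpin(SweepBuf *b)
{
    assert(b->refcount > 0);
    b->refcount--;
}

/*
 * Advance the clock hand until an unpinned buffer with usage_count 0 is
 * found. Pinned buffers are skipped without touching their usage_count.
 * Returns the victim's index, or -1 if every buffer is pinned.
 */
static int sweep_find_victim(SweepBuf *bufs, int nbufs, int *hand)
{
    int pinned_in_a_row = 0;

    while (pinned_in_a_row < nbufs)
    {
        int       i = *hand;
        SweepBuf *b = &bufs[i];

        *hand = (*hand + 1) % nbufs;

        if (b->refcount > 0)
        {
            pinned_in_a_row++;
            continue;
        }
        pinned_in_a_row = 0;

        if (b->usage_count == 0)
            return i;
        b->usage_count--;
    }
    return -1;
}
```

Because the sweep never decrements a pinned buffer, a buffer's usage_count only starts to decay after its last pin is released, which is the behavior the commit message says it preserves.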
* Update CVS HEAD for 2007 copyright. Back branches are typically not (Bruce Momjian, 2007-01-05)
  back-stamped for this.
* Modify local buffer management to request memory for local buffers in blocks (Tom Lane, 2006-12-27)
  of increasing size, instead of one at a time. This reduces the memory management overhead when num_temp_buffers is large: in the previous coding we would actually waste 50% of the space used for temp buffers, because aset.c would round the individual requests up to 16K. Problem noted while studying a performance issue reported by Steven Flatt. Back-patch as far as 8.1 --- older versions used few enough local buffers that the issue isn't significant for them.
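A minimal sketch of the "blocks of increasing size" idea from the commit above, in standalone C. The `DemoArena` struct, the initial block of 16 buffers, the doubling rule, and the use of `malloc` are assumptions for illustration; the real code allocates from a PostgreSQL memory context. Each new block holds as many buffers as have been handed out so far (at least 16), capped by what remains of the configured maximum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_BLCKSZ 8192   /* assumed page size */

typedef struct DemoArena
{
    char *cur_block;        /* block currently being carved up */
    int   next_in_block;    /* next free slot in cur_block */
    int   bufs_in_block;    /* capacity of cur_block, in buffers */
    int   total_allocated;  /* buffers handed out so far */
    int   max_bufs;         /* configured upper limit */
    int   nblocks;          /* number of malloc calls made */
} DemoArena;

/* Hand out storage for one buffer, growing by ever larger blocks. */
static char *demo_get_buffer(DemoArena *a)
{
    char *result;

    if (a->total_allocated >= a->max_bufs)
        return NULL;                    /* limit reached */

    if (a->next_in_block >= a->bufs_in_block)
    {
        /* Start with 16 buffers, then match what has been allocated so far. */
        int   n = (a->total_allocated > 16) ? a->total_allocated : 16;
        int   remaining = a->max_bufs - a->total_allocated;
        char *block;

        if (n > remaining)
            n = remaining;

        block = malloc((size_t) n * DEMO_BLCKSZ);
        if (block == NULL)
            return NULL;                /* out of memory; state unchanged */

        /* Blocks are never freed individually in this sketch. */
        a->cur_block = block;
        a->next_in_block = 0;
        a->bufs_in_block = n;
        a->nblocks++;
    }

    result = a->cur_block + (size_t) a->next_in_block * DEMO_BLCKSZ;
    a->next_in_block++;
    a->total_allocated++;
    return result;
}
```

With a limit of 100 buffers the sketch makes four requests (16, 16, 32, and 36 buffers) instead of 100 separate ones, so any per-request rounding overhead in the allocator is paid only a handful of times.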
* Clean up WAL/buffer interactions as per my recent proposal. Get rid of the (Tom Lane, 2006-03-31)
  misleadingly-named WriteBuffer routine, and instead require routines that change buffer pages to call MarkBufferDirty (which does exactly what it says). We also require that they do so before calling XLogInsert; this takes care of the synchronization requirement documented in SyncOneBuffer. Note that because bufmgr takes the buffer content lock (in shared mode) while writing out any buffer, it doesn't matter whether MarkBufferDirty is executed before the buffer content change is complete, so long as the content change is completed before releasing exclusive lock on the buffer. So it's OK to set the dirtybit before we fill in the LSN. This eliminates the former kluge of needing to set the dirtybit in LockBuffer.
  Aside from making the code more transparent, we can also add some new debugging assertions, in particular that the caller of MarkBufferDirty must hold the buffer content lock, not merely a pin.
* Update copyright for 2006. Update scripts. (Bruce Momjian, 2006-03-05)
* Re-run pgindent, fixing a problem where comment lines after a blank (Bruce Momjian, 2005-11-22)
  comment line were output as too long, and update typedefs for /lib directory. Also fix case where identifiers were used as variable names in the backend, but as typedefs in ecpg (favor the backend for indenting).
  Backpatch to 8.1.X.
* DropRelFileNodeBuffers failed to fix the state of the lookup hash table (Tom Lane, 2005-11-17)
  that was added to localbuf.c in 8.1; therefore, applying it to a temp table left corrupt lookup state in memory. The only case where this had a significant chance of causing problems was an ON COMMIT DELETE ROWS temp table; the other possible paths left bogus state that was unlikely to be used again. Per report from Csaba Nagy.
* Standard pgindent run for 8.1. (Bruce Momjian, 2005-10-15)
* Convert the arithmetic for shared memory size calculation from 'int' (Tom Lane, 2005-08-20)
  to 'Size' (that is, size_t), and install overflow detection checks in it. This allows us to remove the former arbitrary restrictions on NBuffers etc. It won't make any difference in a 32-bit machine, but in a 64-bit machine you could theoretically have terabytes of shared buffers. (How efficiently we could manage 'em remains to be seen.) Similarly, num_temp_buffers, work_mem, and maintenance_work_mem can be set above 2Gb on a 64-bit machine.
  Original patch from Koichi Suzuki, additional work by moi.
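Overflow-checked size arithmetic of the kind the commit above describes can be sketched in standalone C as follows. The `demo_add_size` and `demo_mul_size` helpers are hypothetical and report overflow through a boolean return value; how the actual PostgreSQL helpers signal the error is not shown in this log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Store a + b in *out; return false instead of wrapping around on overflow. */
static bool demo_add_size(size_t a, size_t b, size_t *out)
{
    if (b > SIZE_MAX - a)
        return false;
    *out = a + b;
    return true;
}

/* Store a * b in *out; return false instead of wrapping around on overflow. */
static bool demo_mul_size(size_t a, size_t b, size_t *out)
{
    if (a != 0 && b > SIZE_MAX / a)
        return false;
    *out = a * b;
    return true;
}
```

Checking before the operation matters because unsigned overflow in C silently wraps, which would otherwise yield a too-small allocation request rather than an error.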
* Avoid useless loop overhead in AtEOXact routines when the backend is (Tom Lane, 2005-08-08)
  compiled with USE_ASSERT_CHECKING but is running with assert_enabled false.
* Modify hash_search() API to prevent future occurrences of the error (Tom Lane, 2005-05-29)
  spotted by Qingqing Zhou. The HASH_ENTER action now automatically fails with elog(ERROR) on out-of-memory --- which incidentally lets us eliminate duplicate error checks in quite a bunch of places. If you really need the old return-NULL-on-out-of-memory behavior, you can ask for HASH_ENTER_NULL. But there is now an Assert in that path checking that you aren't hoping to get that behavior in a palloc-based hash table.
  Along the way, remove the old HASH_FIND_SAVE/HASH_REMOVE_SAVED actions, which were not being used anywhere anymore, and were surely too ugly and unsafe to want to see revived again.
* Add temp_buffers GUC variable to allow users to determine the size (Tom Lane, 2005-03-19)
  of the local buffer arena for temporary table access.
* Upgrade localbuf.c to use a hash table instead of linear search to (Tom Lane, 2005-03-19)
  find already-allocated local buffers. This is the last obstacle in the way of setting NLocBuffer to something reasonably large.