aboutsummaryrefslogtreecommitdiff
path: root/src/backend/storage/buffer
Commit message (Collapse)AuthorAge
* Fix ReadBuffer() to correctly handle the case where it's trying to extendTom Lane2006-01-06
| | | | | | | | | the relation but it finds a pre-existing valid buffer. The buffer does not correspond to any page known to the kernel, so we *must* do smgrextend to ensure that the space becomes allocated. The 7.x branches all do this correctly, but the corner case got lost somewhere during 8.0 bufmgr rewrites. (My fault no doubt :-( ... I think I assumed that such a buffer must be not-BM_VALID, which is not so.)
* Re-run pgindent, fixing a problem where comment lines after a blankBruce Momjian2005-11-22
| | | | | | | | | comment line where output as too long, and update typedefs for /lib directory. Also fix case where identifiers were used as variable names in the backend, but as typedefs in ecpg (favor the backend for indenting). Backpatch to 8.1.X.
* DropRelFileNodeBuffers failed to fix the state of the lookup hash tableTom Lane2005-11-17
| | | | | | | | that was added to localbuf.c in 8.1; therefore, applying it to a temp table left corrupt lookup state in memory. The only case where this had a significant chance of causing problems was an ON COMMIT DELETE ROWS temp table; the other possible paths left bogus state that was unlikely to be used again. Per report from Csaba Nagy.
* Tweak buffer manager so that 'internal' accesses to a buffer do notTom Lane2005-10-27
| | | | | | | | advance its usage_count. This includes writes of dirty buffers triggered by bgwriter, checkpoint, or FlushRelationBuffers, as well as various corner cases that really ought not count as accesses to the page. Should make for some marginal improvement in the quality of our decisions about when to recycle buffers. Per suggestion from ITAGAKI Takahiro.
* Standard pgindent run for 8.1.Bruce Momjian2005-10-15
|
* Do all accesses to shared buffer headers through volatile-qualifiedTom Lane2005-10-12
| | | | | | | pointers, to ensure that compilers won't rearrange accesses to occur while we're not holding the buffer header spinlock. It's probably not necessary to mark volatile in every single place in bufmgr.c, but better safe than sorry. Per trouble report from Kevin Grittner.
* Convert the arithmetic for shared memory size calculation from 'int'Tom Lane2005-08-20
| | | | | | | | | | | to 'Size' (that is, size_t), and install overflow detection checks in it. This allows us to remove the former arbitrary restrictions on NBuffers etc. It won't make any difference in a 32-bit machine, but in a 64-bit machine you could theoretically have terabytes of shared buffers. (How efficiently we could manage 'em remains to be seen.) Similarly, num_temp_buffers, work_mem, and maintenance_work_mem can be set above 2Gb on a 64-bit machine. Original patch from Koichi Suzuki, additional work by moi.
* Reverse out Assert addition.Bruce Momjian2005-08-12
|
* Improve documention on loading large data sets into plperl.Bruce Momjian2005-08-12
| | | | David Fetter
* Remove BufferBlockPointers array in favor of a base + (bufnum) * BLCKSZTom Lane2005-08-12
| | | | | | | | computation. On modern machines this is as fast if not faster, and we don't have to clog the CPU's L2 cache with a tens-of-KB pointer array. If we ever decide to adopt a more dynamic allocation method for shared buffers, we'll probably have to revert this patch, but in the meantime we might as well save a few bytes and nanoseconds. Per Qingqing Zhou.
* Avoid useless loop overhead in AtEOXact routines when the backend isTom Lane2005-08-08
| | | | compiled with USE_ASSERT_CHECKING but is running with assert_enabled false.
* Cause ShutdownPostgres to do a normal transaction abort during backendTom Lane2005-08-08
| | | | | | | | exit, instead of trying to take shortcuts. Introduce some additional shutdown callback routines to eliminate kluges like having ProcKill be responsible for shutting down the buffer manager. Ensure that the order of operations during shutdown is predictable and what you would expect given the module layering.
* Tweak BgBufferSync() so that a persistent write error on a dirty bufferTom Lane2005-08-02
| | | | | | | doesn't block the bgwriter from making progress writing out other buffers. This was a hard problem in the context of the ARC/2Q design, but it's trivial in the context of clock sweep ... just advance the sweep counter before we try to write not after.
* Modify hash_search() API to prevent future occurrences of the errorTom Lane2005-05-29
| | | | | | | | | | | | | spotted by Qingqing Zhou. The HASH_ENTER action now automatically fails with elog(ERROR) on out-of-memory --- which incidentally lets us eliminate duplicate error checks in quite a bunch of places. If you really need the old return-NULL-on-out-of-memory behavior, you can ask for HASH_ENTER_NULL. But there is now an Assert in that path checking that you aren't hoping to get that behavior in a palloc-based hash table. Along the way, remove the old HASH_FIND_SAVE/HASH_REMOVE_SAVED actions, which were not being used anywhere anymore, and were surely too ugly and unsafe to want to see revived again.
* Split the shared-memory array of PGPROC pointers out of the sinvalTom Lane2005-05-19
| | | | | | communication structure, and make it its own module with its own lock. This should reduce contention at least a little, and it definitely makes the code seem cleaner. Per my recent proposal.
* Remove unnecessary calls of FlushRelationBuffers: there is no needTom Lane2005-03-20
| | | | | | | | | | to write out data that we are about to tell the filesystem to drop. smgr_internal_unlink already had a DropRelFileNodeBuffers call to get rid of dead buffers without a write after it's no longer possible to roll back the deleting transaction. Adding a similar call in smgrtruncate simplifies callers and makes the overall division of labor clearer. This patch removes the former behavior that VACUUM would write all dirty buffers of a relation unconditionally.
* Add temp_buffers GUC variable to allow users to determine the sizeTom Lane2005-03-19
| | | | of the local buffer arena for temporary table access.
* Upgrade localbuf.c to use a hash table instead of linear search toTom Lane2005-03-19
| | | | | find already-allocated local buffers. This is the last obstacle in the way of setting NLocBuffer to something reasonably large.
* Need to reset local buffer pin counts, not only shared buffer pins,Tom Lane2005-03-18
| | | | before we attempt any file deletions in ShutdownPostgres. Per Tatsuo.
* Avoid infinite loop in InvalidateBuffer if we ourselves are holdingTom Lane2005-03-18
| | | | a pin on the victim buffer.
* Replace the BufMgrLock with separate locks on the lookup hashtable andTom Lane2005-03-04
| | | | | | | | the freelist, plus per-buffer spinlocks that protect access to individual shared buffer headers. This requires abandoning a global freelist (since the freelist is a global contention point), which shoots down ARC and 2Q as well as plain LRU management. Adopt a clock sweep algorithm instead. Preliminary results show substantial improvement in multi-backend situations.
* Ensure that all details of the ARC algorithm are hidden within freelist.c.Tom Lane2005-02-03
| | | | | This refactoring does not change any algorithms or data structures, just remove visibility of the ARC datastructures from other source files.
* Phase 1 of fix for 'SMgrRelation hashtable corrupted' problem. ThisTom Lane2005-01-10
| | | | | | is the minimum required fix. I want to look next at taking advantage of it by simplifying the message semantics in the shared inval message queue, but that part can be held over for 8.1 if it turns out too ugly.
* Repair bufmgr deadlock problem reported by Michael Wildpaner. Must takeTom Lane2005-01-03
| | | | | | | | | | | share lock on a buffer being written out before releasing BufMgrLock in the BufferAlloc code path; if we do it later we might block on someone who's re-pinned the buffer. I believe this is only an issue for BufferAlloc and not the other places that call FlushBuffer. BufferSync must continue to do it the old way since it may well be trying to write buffers that other backends have pinned; but it should not be holding any conflicting locks. FlushRelationBuffers is okay since it's got exclusive lock at the relation level.
* Tag appropriate files for rc3PostgreSQL Daemon2004-12-31
| | | | | | | | Also performed an initial run through of upgrading our Copyright date to extend to 2005 ... first run here was very simple ... change everything where: grep 1996-2004 && the word 'Copyright' ... scanned through the generated list with 'less' first, and after, to make sure that I only picked up the right entries ...
* Assert that BufferIsPinned() in IncrBufferRefCount(), rather than usingNeil Conway2004-11-24
| | | | | a home-brewed combination of assertions that boiled down to the same thing.
* Allow background writing to be shut down by setting limit values to zero.Tom Lane2004-10-17
| | | | | | | This does not disable the bgwriter process: it still has to wake up often enough to collect fsync requests from backends in a timely fashion. But it responds to the recent gripe about not being able to prevent the disk from being spun up constantly.
* Give the ResourceOwner mechanism full responsibility for releasing bufferTom Lane2004-10-16
| | | | | | | | pins at end of transaction, and reduce AtEOXact_Buffers to an Assert cross-check that this was done correctly. When not USE_ASSERT_CHECKING, AtEOXact_Buffers is a complete no-op. This gets rid of an O(NBuffers) bottleneck during transaction commit/abort, which recent testing has shown becomes significant above a few tens of thousands of shared buffers.
* Remove BufferLocks[] array in favor of a single pointer to the bufferTom Lane2004-10-16
| | | | | | (if any) currently waited for by LockBufferForCleanup(), which is all that we were using it for anymore. Saves some space and eliminates proportional-to-NBuffers slowdown in UnlockBuffers().
* Repair possible failure to update hint bits back to disk, perTom Lane2004-10-15
| | | | | | | | | | http://archives.postgresql.org/pgsql-hackers/2004-10/msg00464.php. This fix is intended to be permanent: it moves the responsibility for calling SetBufferCommitInfoNeedsSave() into the tqual.c routines, eliminating the requirement for callers to test whether t_infomask changed. Also, tighten validity checking on buffer IDs in bufmgr.c --- several routines were paranoid about out-of-range shared buffer numbers but not about out-of-range local ones, which seems a tad pointless.
* Restructure subtransaction handling to reduce resource consumption,Tom Lane2004-09-16
| | | | | | | | | | | | | | | | | as per recent discussions. Invent SubTransactionIds that are managed like CommandIds (ie, counter is reset at start of each top transaction), and use these instead of TransactionIds to keep track of subtransaction status in those modules that need it. This means that a subtransaction does not need an XID unless it actually inserts/modifies rows in the database. Accordingly, don't assign it an XID nor take a lock on the XID until it tries to do that. This saves a lot of overhead for subtransactions that are only used for error recovery (eg plpgsql exceptions). Also, arrange to release a subtransaction's XID lock as soon as the subtransaction exits, in both the commit and abort cases. This avoids holding many unique locks after a long series of subtransactions. The price is some additional overhead in XactLockTableWait, but that seems acceptable. Finally, restructure the state machine in xact.c to have a more orthogonal set of states for subtransactions.
* I can't see any good reason for DropRelFileNodeBuffers to be issuingTom Lane2004-09-06
| | | | FATAL when it detects a nonzero reference count. Reduce to ERROR.
* FlushRelationBuffers was also being a bit cavalier about whether theTom Lane2004-08-31
| | | | relation is already opened by smgr.
* Pgindent run for 8.0.Bruce Momjian2004-08-29
|
* Update copyright to 2004.Bruce Momjian2004-08-29
|
* Invent ResourceOwner mechanism as per my recent proposal, and use it toTom Lane2004-07-17
| | | | | | | | keep track of portal-related resources separately from transaction-related resources. This allows cursors to work in a somewhat sane fashion with nested transactions. For now, cursor behavior is non-subtransactional, that is a cursor's state does not roll back if you abort a subtransaction that fetched from the cursor. We might want to change that later.
* Nested transactions. There is still much left to do, especially on theTom Lane2004-07-01
| | | | | | | performance front, but with feature freeze upon us I think it's time to drive a stake in the ground and say that this will be in 7.5. Alvaro Herrera, with some help from Tom Lane.
* Tablespaces. Alternate database locations are dead, long live tablespaces.Tom Lane2004-06-18
| | | | | | | | | There are various things left to do: contrib dbsize and oid2name modules need work, and so does the documentation. Also someone should think about COMMENT ON TABLESPACE and maybe RENAME TABLESPACE. Also initlocation is dead, it just doesn't know it yet. Gavin Sherry and Tom Lane.
* StrategyDirtyBufferList wasn't being careful to honor max_buffers limit.Tom Lane2004-06-11
| | | | | Bug is only latent given that sole caller is passing NBuffers, but it could bite someone in the rear someday.
* Add some code to Assert that when we release pin on a buffer, we areTom Lane2004-06-11
| | | | | | | not holding the buffer's cntx_lock or io_in_progress_lock. A recent report from Litao Wu makes me wonder whether it is ever possible for us to drop a buffer and forget to release its cntx_lock. The Assert does not fire in the regression tests, but that proves little ...
* Adjust our timezone library to use pg_time_t (typedef'd as int64) inTom Lane2004-06-03
| | | | | | | | | | | | | | | | | | | place of time_t, as per prior discussion. The behavior does not change on machines without a 64-bit-int type, but on machines with one, which is most, we are rid of the bizarre boundary behavior at the edges of the 32-bit-time_t range (1901 and 2038). The system will now treat times over the full supported timestamp range as being in your local time zone. It may seem a little bizarre to consider that times in 4000 BC are PST or EST, but this is surely at least as reasonable as propagating Gregorian calendar rules back that far. I did not modify the format of the zic timezone database files, which means that for the moment the system will not know about daylight-savings periods outside the range 1901-2038. Given the way the files are set up, it's not a simple decision like 'widen to 64 bits'; we have to actually think about the range of years that need to be supported. We should probably inquire what the plans of the upstream zic people are before making any decisions of our own.
* Additional mop-up for sync-to-fsync changes: avoid issuing fsyncs forTom Lane2004-05-31
| | | | | | temp tables, and avoid WAL-logging truncations of temp tables. Do issue fsync on truncated files (not sure this is necessary but it seems like a good idea).
* Minor code rationalization: FlushRelationBuffers just returns void,Tom Lane2004-05-31
| | | | | | | | rather than an error code, and does elog(ERROR) not elog(WARNING) when it detects a problem. All callers were simply elog(ERROR)'ing on failure return anyway, and I find it hard to envision a caller that would not, so we may as well simplify the callers and produce the more useful error message directly.
* Per previous discussions, get rid of use of sync(2) in favor ofTom Lane2004-05-31
| | | | | | | | | | | | | | | | | explicitly fsync'ing every (non-temp) file we have written since the last checkpoint. In the vast majority of cases, the burden of the fsyncs should fall on the bgwriter process not on backends. (To this end, we assume that an fsync issued by the bgwriter will force out blocks written to the same file by other processes using other file descriptors. Anyone have a problem with that?) This makes the world safe for WIN32, which ain't even got sync(2), and really makes the world safe for Unixen as well, because sync(2) never had the semantics we need: it offers no way to wait for the requested I/O to finish. Along the way, fix a bug I recently introduced in xlog recovery: file truncation replay failed to clear bufmgr buffers for the dropped blocks, which could result in 'PANIC: heap_delete_redo: no block' later on in xlog replay.
* Separate out bgwriter code into a logically separate module, ratherTom Lane2004-05-29
| | | | | | | | | than being random pieces of other files. Give bgwriter responsibility for all checkpoint activity (other than a post-recovery checkpoint); so this child process absorbs the functionality of the former transient checkpoint and shutdown subprocesses. While at it, create an actual include file for postmaster.c, which for some reason never had its own file before.
* Code review for EXEC_BACKEND changes. Reduce the number of #ifdefs byTom Lane2004-05-28
| | | | | | | | | | | about a third, make it work on non-Windows platforms again. (But perhaps I broke the WIN32 code, since I have no way to test that.) Fold all the paths that fork postmaster child processes to go through the single routine SubPostmasterMain, which takes care of resurrecting the state that would normally be inherited from the postmaster (including GUC variables). Clean up some places where there's no particularly good reason for the EXEC and non-EXEC cases to work differently. Take care of one or two FIXMEs that remained in the code.
* Get rid of rd_nblocks field in relcache entries. Turns out this wasTom Lane2004-05-08
| | | | | | | | | costing us lots more to maintain than it was worth. On shared tables it was of exactly zero benefit because we couldn't trust it to be up to date. On temp tables it sometimes saved an lseek, but not often enough to be worth getting excited about. And the real problem was that we forced an lseek on every relcache flush in order to update the field. So all in all it seems best to lose the complexity.
* Tiny assorted fixes: correct a typo in a comment in vacuumlazy.c, removeNeil Conway2004-04-25
| | | | | some unused #include directives from bufmgr.c, and clarify comments in bufmgr.h and buf.h
* Make LocalRefCount and PrivateRefCount arrays of int32, rather than long.Neil Conway2004-04-22
| | | | This saves a small amount of per-backend memory for LP64 machines.
* Another round of code cleanup on bufmgr. Use BM_VALID flag to keep trackTom Lane2004-04-21
| | | | | | | | | of whether we have successfully read data into a buffer; this makes the error behavior a bit more transparent (IMHO anyway), and also makes it work correctly for local buffers which don't use Start/TerminateBufferIO. Collapse three separate functions for writing a shared buffer into one. This overlaps a bit with cleanups that Neil proposed awhile back, but seems not to have committed yet.