| Commit message (Collapse) | Author | Age |
| |
(but not locked, as that would risk deadlocks). Also, make it work in a small
ring of buffers to avoid having bulk inserts trash the whole buffer arena.
Robert Haas, after an idea of Simon Riggs'.
| |
avoid this problem in the future.)
| |
buffers that cannot possibly need to be cleaned, and estimates how many
buffers it should try to clean based on moving averages of recent allocation
requests and density of reusable buffers. The patch also adds a couple
more columns to pg_stat_bgwriter to help measure the effectiveness of the
bgwriter.
Greg Smith, building on his own work and ideas from several other people,
in particular a much older patch from Itagaki Takahiro.
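
A rough sketch of the estimation idea described above, assuming simple exponential smoothing. The struct, the function names, and the smoothing constant are hypothetical and are not taken from the patch itself.

```c
#include <assert.h>

/* Hypothetical smoothing window; the patch's real constant is not shown here. */
#define SMOOTHING_SAMPLES 16.0

typedef struct
{
    double smoothed_alloc;    /* moving average of buffer allocations per round */
    double smoothed_density;  /* moving average of buffers scanned per reusable buffer found */
} CleanerStatsSketch;

/* Fold the latest round's observations into both moving averages. */
void
update_cleaner_stats(CleanerStatsSketch *s, int recent_alloc,
                     int scanned, int reusable_found)
{
    assert(recent_alloc >= 0 && scanned >= 0 && reusable_found >= 0);

    s->smoothed_alloc += (recent_alloc - s->smoothed_alloc) / SMOOTHING_SAMPLES;

    /* Only update the density when this round actually found reusable buffers. */
    if (reusable_found > 0)
    {
        double density = (double) scanned / reusable_found;

        s->smoothed_density += (density - s->smoothed_density) / SMOOTHING_SAMPLES;
    }
}

/* Estimate how many buffers to scan to cover the expected allocations. */
int
buffers_to_scan(const CleanerStatsSketch *s)
{
    return (int) (s->smoothed_alloc * s->smoothed_density + 0.5);
}
```

The point of smoothing is that one unusually busy or idle round moves the estimate only a little, so the cleaning rate follows the trend rather than the noise.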
| |
when multiple backends are scanning the same relation concurrently, each page
is (ideally) read only once.
Jeff Davis, with review by Heikki and Tom.
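
One way to picture the effect described above: a new scan starts at the block another scan of the same relation last reported, then wraps around to cover the blocks it skipped. This is a minimal sketch under that assumption; the hint structure and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical shared hint: where the most recent scan of a relation is. */
typedef struct
{
    bool     valid;
    unsigned rel_id;
    unsigned block;
} ScanHintSketch;

/* A running scan periodically publishes its current block. */
void
report_scan_position(ScanHintSketch *hint, unsigned rel_id, unsigned block)
{
    hint->valid = true;
    hint->rel_id = rel_id;
    hint->block = block;
}

/* A new scan joins at the published block if it is for the same relation. */
unsigned
scan_start_block(const ScanHintSketch *hint, unsigned rel_id, unsigned nblocks)
{
    assert(nblocks > 0);
    if (hint->valid && hint->rel_id == rel_id && hint->block < nblocks)
        return hint->block;
    return 0;
}

/* Advance with wraparound; the scan ends when it returns to its start block. */
unsigned
scan_next_block(unsigned block, unsigned nblocks)
{
    return (block + 1) % nblocks;
}
```

Because both scans then walk the same blocks at about the same time, the second one tends to find its pages already in shared buffers.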
| |
buffers, rather than blowing out the whole shared-buffer arena. Aside from
avoiding cache spoliation, this fixes the problem that VACUUM formerly tended
to cause a WAL flush for every page it modified, because we had it hacked to
use only a single buffer. Those flushes will now occur only once per
ring-ful. The exact ring size, and the threshold for seqscans to switch into
the ring usage pattern, remain under debate; but the infrastructure seems
done. The key bit of infrastructure is a new optional BufferAccessStrategy
object that can be passed to ReadBuffer operations; this replaces the former
StrategyHintVacuum API.
This patch also changes the buffer usage-count methodology a bit: we now
advance usage_count when first pinning a buffer, rather than when last
unpinning it. To preserve the behavior that a buffer's lifetime starts to
decrease when it's released, the clock sweep code is modified to not decrement
usage_count of pinned buffers.
Work not done in this commit: teach GiST and GIN indexes to use the vacuum
BufferAccessStrategy for vacuum-driven fetches.
Original patch by Simon, reworked by Heikki and again by Tom.
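
The ring idea can be sketched as follows. This is an illustration only: the types, the ring size, and the stand-in pool allocator are hypothetical, and it ignores the case where a ring buffer is pinned or otherwise in use by another backend.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE      4      /* illustrative; the real size was still under debate */
#define INVALID_BUFFER (-1)

/* Stand-in for the shared pool: hands out fresh buffer ids in sequence. */
typedef struct
{
    int next_unused;
} PoolSketch;

/* Per-scan ring of buffer ids that the scan keeps reusing. */
typedef struct
{
    int buffers[RING_SIZE];
    int current;              /* slot handed out most recently */
} RingStrategySketch;

void
ring_init(RingStrategySketch *ring)
{
    for (int i = 0; i < RING_SIZE; i++)
        ring->buffers[i] = INVALID_BUFFER;
    ring->current = RING_SIZE - 1;    /* first advance lands on slot 0 */
}

/*
 * Return the buffer to use for the next page.  Empty slots are filled from
 * the pool; after that the scan cycles through its own RING_SIZE buffers.
 */
int
ring_get_buffer(RingStrategySketch *ring, PoolSketch *pool)
{
    assert(ring != NULL && pool != NULL);

    ring->current = (ring->current + 1) % RING_SIZE;
    if (ring->buffers[ring->current] == INVALID_BUFFER)
        ring->buffers[ring->current] = pool->next_unused++;
    return ring->buffers[ring->current];
}
```

However many pages the scan touches, it takes at most RING_SIZE buffers from the shared pool, which is what keeps a large scan from evicting everything else.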
| |
back-stamped for this.
| |
BufferAlloc tries to insert a new mapping entry before deleting the old one
for a buffer, we have a transient need for more than NBuffers entries ---
one more in 8.1, and as many as NUM_BUFFER_PARTITIONS more in CVS HEAD.
In theory this could lead to an "out of shared memory" failure if shmem
had already been completely claimed by the time the extra entries were
needed.
| |
pointers, to ensure that compilers won't rearrange accesses to occur
while we're not holding the buffer header spinlock. It's probably
not necessary to mark volatile in every single place in bufmgr.c,
but better safe than sorry. Per trouble report from Kevin Grittner.
| |
to 'Size' (that is, size_t), and install overflow detection checks in it.
This allows us to remove the former arbitrary restrictions on NBuffers
etc. It won't make any difference on a 32-bit machine, but on a 64-bit
machine you could theoretically have terabytes of shared buffers.
(How efficiently we could manage 'em remains to be seen.) Similarly,
num_temp_buffers, work_mem, and maintenance_work_mem can be set above
2GB on a 64-bit machine. Original patch from Koichi Suzuki, additional
work by moi.
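
Overflow-checked size arithmetic of the kind mentioned above can be sketched like this. The helper names and the bool-returning interface are assumptions made for illustration; they are not the functions the commit installed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Store a + b in *out; return false if the sum would wrap around. */
bool
checked_add_size(size_t a, size_t b, size_t *out)
{
    assert(out != NULL);
    if (a > SIZE_MAX - b)
        return false;
    *out = a + b;
    return true;
}

/* Store a * b in *out; return false if the product would wrap around. */
bool
checked_mul_size(size_t a, size_t b, size_t *out)
{
    assert(out != NULL);
    if (a != 0 && b > SIZE_MAX / a)
        return false;
    *out = a * b;
    return true;
}

/* Bytes needed for nbuffers data blocks plus one header per buffer. */
bool
buffer_pool_bytes(size_t nbuffers, size_t block_size, size_t header_size,
                  size_t *out)
{
    size_t blocks;
    size_t headers;

    if (!checked_mul_size(nbuffers, block_size, &blocks))
        return false;
    if (!checked_mul_size(nbuffers, header_size, &headers))
        return false;
    return checked_add_size(blocks, headers, out);
}
```

Checking every step lets a too-large setting fail cleanly at startup instead of silently wrapping to a small allocation.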
| |
the freelist, plus per-buffer spinlocks that protect access to individual
shared buffer headers. This requires abandoning a global freelist (since
the freelist is a global contention point), which shoots down ARC and 2Q
as well as plain LRU management. Adopt a clock sweep algorithm instead.
Preliminary results show substantial improvement in multi-backend situations.
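
For readers unfamiliar with clock sweep, here is a minimal single-threaded sketch. The struct layout, names, and pool size are hypothetical, and the per-buffer spinlocks this commit depends on are omitted.

```c
#include <assert.h>

#define NBUFFERS 4            /* illustrative pool size */

typedef struct
{
    int refcount;             /* number of pins; pinned buffers are skipped */
    int usage_count;          /* recency counter, decremented by the sweep */
} BufferDescSketch;

typedef struct
{
    BufferDescSketch bufs[NBUFFERS];
    int next_victim;          /* the clock hand */
} BufferPoolSketch;

/*
 * Advance the hand until it reaches an unpinned buffer whose usage_count is
 * zero, decrementing the counts of unpinned buffers it passes.  Returns the
 * victim's index, or -1 if every buffer is pinned.
 */
int
clock_sweep(BufferPoolSketch *pool)
{
    int pinned_in_a_row = 0;

    assert(pool != 0);

    while (pinned_in_a_row < NBUFFERS)
    {
        int idx = pool->next_victim;
        BufferDescSketch *buf = &pool->bufs[idx];

        pool->next_victim = (idx + 1) % NBUFFERS;

        if (buf->refcount > 0)
        {
            pinned_in_a_row++;
            continue;
        }
        pinned_in_a_row = 0;

        if (buf->usage_count == 0)
            return idx;
        buf->usage_count--;
    }
    return -1;
}
```

The only shared state is the hand position, so there is no list to relink on every buffer access, which is where the contention on a global freelist came from.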
| |
This refactoring does not change any algorithms or data structures; it just
removes visibility of the ARC data structures from other source files.
| |
Also performed an initial run through of upgrading our Copyright date to
extend to 2005 ... first run here was very simple ... change everything
where: grep 1996-2004 && the word 'Copyright' ... scanned through the
generated list with 'less' first, and after, to make sure that I only
picked up the right entries ...
| |
as per recent discussions. Invent SubTransactionIds that are managed like
CommandIds (ie, counter is reset at start of each top transaction), and
use these instead of TransactionIds to keep track of subtransaction status
in those modules that need it. This means that a subtransaction does not
need an XID unless it actually inserts/modifies rows in the database.
Accordingly, don't assign it an XID nor take a lock on the XID until it
tries to do that. This saves a lot of overhead for subtransactions that
are only used for error recovery (eg plpgsql exceptions). Also, arrange
to release a subtransaction's XID lock as soon as the subtransaction
exits, in both the commit and abort cases. This avoids holding many
unique locks after a long series of subtransactions. The price is some
additional overhead in XactLockTableWait, but that seems acceptable.
Finally, restructure the state machine in xact.c to have a more orthogonal
set of states for subtransactions.
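
The lazy XID assignment described above can be sketched as below. All names are hypothetical, and locking, XID wraparound, and the transaction state machine are left out.

```c
#include <assert.h>

#define INVALID_XID 0u        /* sketch convention: 0 means "no XID assigned" */

typedef struct
{
    unsigned next_sub_id;     /* reset at the start of each top transaction */
    unsigned next_xid;        /* global counter, never reset */
} XactCountersSketch;

typedef struct
{
    unsigned sub_id;          /* cheap identifier, always assigned */
    unsigned xid;             /* INVALID_XID until the first write */
} SubXactSketch;

void
start_top_transaction(XactCountersSketch *c)
{
    c->next_sub_id = 1;
}

/* Starting a subtransaction consumes only a SubTransactionId. */
SubXactSketch
begin_subxact(XactCountersSketch *c)
{
    SubXactSketch s = { c->next_sub_id++, INVALID_XID };

    return s;
}

/* Called before the first insert or update: assign an XID on demand. */
unsigned
xid_for_write(XactCountersSketch *c, SubXactSketch *s)
{
    assert(c->next_xid != INVALID_XID);
    if (s->xid == INVALID_XID)
        s->xid = c->next_xid++;
    return s->xid;
}
```

A subtransaction that only catches an error never calls xid_for_write(), so it never consumes an XID and has no XID lock to take or release.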
| |
Bug is only latent given that sole caller is passing NBuffers, but it
could bite someone in the rear someday.
| |
place of time_t, as per prior discussion. The behavior does not change
on machines without a 64-bit-int type, but on machines with one, which
is most, we are rid of the bizarre boundary behavior at the edges of
the 32-bit-time_t range (1901 and 2038). The system will now treat
times over the full supported timestamp range as being in your local
time zone. It may seem a little bizarre to consider that times in
4000 BC are PST or EST, but this is surely at least as reasonable as
propagating Gregorian calendar rules back that far.
I did not modify the format of the zic timezone database files, which
means that for the moment the system will not know about daylight-savings
periods outside the range 1901-2038. Given the way the files are set up,
it's not a simple decision like 'widen to 64 bits'; we have to actually
think about the range of years that need to be supported. We should
probably inquire what the plans of the upstream zic people are before
making any decisions of our own.
| |
of whether we have successfully read data into a buffer; this makes the
error behavior a bit more transparent (IMHO anyway), and also makes it
work correctly for local buffers which don't use Start/TerminateBufferIO.
Collapse three separate functions for writing a shared buffer into one.
This overlaps a bit with cleanups that Neil proposed a while back but
seems not to have committed yet.
| |
of VACUUM cases so that VACUUM requests don't affect the ARC state at all,
avoid corner case where BufferSync would uselessly rewrite a buffer that
no longer contains the page that was to be flushed. Make some minor
other cleanups in and around the bufmgr as well, such as moving PinBuffer
and UnpinBuffer into bufmgr.c where they really belong.
| |
for already empty buffers because their buffer tag was not cleared out
when the buffers were invalidated earlier.
Also removed the misnamed BM_FREE bufhdr flag and replaced the checks,
which effectively ask if the buffer is unpinned, with checks against the
refcount field.
Jan
| |
Jan
| |
ARC buffer replacement strategy.
Jan
| |
This first part of the background writer does no syncing at all.
Its only purpose is to keep the LRU heads clean so that regular
backends seldom or never have to call write().
Jan
| |
debug_shared_buffers = <seconds>
as per previous discussion.
Jan
| |
I added a couple more Assertions while tracking down the exact
cause of the former bug.
All 93 regression tests pass now.
Jan
| |
Jan
| |
algorithm adopted for PostgreSQL.
Jan
| |
initdb/regression tests pass.
| |
spacing. Also adds space for one-line comments.
| |
tests pass.
| |
existing lock manager and spinlocks: it understands exclusive vs shared
lock but has few other fancy features. Replace most uses of spinlocks
with lightweight locks. All remaining uses of spinlocks have very short
lock hold times (a few dozen instructions), so tweak spinlock backoff
code to work efficiently given this assumption. All per my proposal on
pghackers 26-Sep-01.
| |
to wait until it's safe to remove tuples and compact free space in a
shared buffer page. Miscellaneous small code cleanups in bufmgr, too.
| |
included by everything that includes bufmgr.h --- it's supposed to be
internals, after all, not part of the API! This fixes the conflict
against FreeBSD headers reported by Rosenman, by making it unnecessary
for s_lock.h to be included by plperl.c.
| |
as a shared dirtybit for each shared buffer. The shared dirtybit still
controls writing the buffer, but the local bit controls whether we need
to fsync the buffer's file. This arrangement fixes a bug that allowed
some required fsyncs to be missed, and should improve performance as well.
For more info see my post of same date on pghackers.
| |
* Portions Copyright (c) 1996-2000, PostgreSQL, Inc
to all files copyright Regents of Berkeley. Man, that's a lot of files.
| |
* Buffer refcount cleanup (per my "progress report" to pghackers, 9/22).
* Add links to backend PROC structs to sinval's array of per-backend info,
and use these links for routines that need to check the state of all
backends (rather than the slow, complicated search of the ShmemIndex
hashtable that was used before). Add databaseOID to PROC structs.
* Use this to implement an interlock that prevents DESTROY DATABASE of
a database containing running backends. (It's a little tricky to prevent
a concurrently-starting backend from getting in there, since the new
backend is not able to lock anything at the time it tries to look up
its database in pg_database. My solution is to recheck that the DB is
OK at the end of InitPostgres. It may not be a 100% solution, but it's
a lot better than no interlock at all...)
* In ALTER TABLE RENAME, flush buffers for the relation before doing the
rename of the physical files, to ensure we don't get failures later from
mdblindwrt().
* Update TRUNCATE patch so that it actually compiles against current
sources :-(.
You should do "make clean all" after pulling these changes.