| Commit message (Collapse) | Author | Age |
|
|
|
| |
Euler Taveira de Oliveira
|
|
|
|
|
|
| |
discussion.
Patch from Simon Riggs.
|
|
|
|
|
|
|
|
|
|
| |
PGPROC array into snapshots, and use this information to avoid visits
to pg_subtrans in HeapTupleSatisfiesSnapshot. This appears to solve
the pg_subtrans-related context swap storm problem that's been reported
by several people for 8.1. While at it, modify GetSnapshotData to not
take an exclusive lock on ProcArrayLock, as closer analysis shows that
shared lock is always sufficient.
Itagaki Takahiro and Tom Lane
|
|
|
|
| |
Needed because lock.c is now going to use the same type of list.
|
|
|
|
|
|
|
|
|
| |
of the transaction ID counter. Nothing is done with the epoch except to
store it in checkpoint records, but this provides a foundation with which
add-on code can pretend that XIDs never wrap around. This is a severely
trimmed and rewritten version of the xxid patch submitted by Marko Kreen.
Per discussion, the epoch counter seems the only part of xxid that really
needs to be in the core server.
|
|
|
|
|
|
|
|
|
|
|
| |
than N seconds apart. This allows a simple, if not very high performance,
means of guaranteeing that a PITR archive is no more than N seconds behind
real time. Also make pg_current_xlog_location return the WAL Write pointer,
add pg_current_xlog_insert_location to return the Insert pointer, and fix
pg_xlogfile_name_offset to return its results as a two-element record instead
of a smashed-together string, as per recent discussion.
Simon Riggs
|
|
|
|
|
|
|
|
|
|
| |
operation every so often. This improves the usefulness of PITR log
shipping for hot standby: formerly, if the standby server crashed, it
was necessary to restart it from the last base backup and replay all
the WAL since then. Now it will only need to reread about the same
amount of WAL as the master server would. The behavior might also
come in handy during a long PITR replay sequence. Simon Riggs,
with some editorialization by Tom Lane.
|
|
|
|
|
|
|
| |
to happen automatically during pg_stop_backup(). Add some functions for
interrogating the current xlog insertion point and for easily extracting
WAL filenames from the hex WAL locations displayed by pg_stop_backup
and friends. Simon Riggs with some editorialization by Tom Lane.
|
|
|
|
|
|
|
|
|
| |
vacuums. This allows a OLTP-like system with big tables to continue
regular vacuuming on small-but-frequently-updated tables while the
big tables are being vacuumed.
Original patch from Hannu Krossing, rewritten by Tom Lane and updated
by me.
|
|
|
|
| |
by Robert Lor
|
|
|
|
|
|
|
|
|
|
|
| |
recovery. In the first place, it doesn't work because slru's
latest_page_number isn't set up yet (this is why we've been hearing reports
of strange "apparent wraparound" log messages during crash recovery, but
only from people who'd managed to advance their next-mxact counters some
considerable distance from 0). In the second place, it seems a bit unwise
to be throwing away data during crash recovery anwyway. This latter
consideration convinces me to just disable truncation during recovery,
rather than computing latest_page_number and pushing ahead.
|
| |
|
|
|
|
|
|
|
| |
Strip unused include files out unused include files, and add needed
includes to C files.
The next step is to remove unused include files in C files.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To this end, add a couple of columns to pg_class, relminxid and relvacuumxid,
based on which we calculate the pg_database columns after each vacuum.
We now force all databases to be vacuumed, even template ones. A backend
noticing too old a database (meaning pg_database.datminxid is in danger of
falling behind Xid wraparound) will signal the postmaster, which in turn will
start an autovacuum iteration to process the offending database. In principle
this is only there to cope with frozen (non-connectable) databases without
forcing users to set them to connectable, but it could force regular user
database to go through a database-wide vacuum at any time. Maybe we should
warn users about this somehow. Of course the real solution will be to use
autovacuum all the time ;-)
There are some additional improvements we could have in this area: for example
the vacuum code could be smarter about not updating pg_database for each table
when called by autovacuum, and do it only once the whole autovacuum iteration
is done.
I updated the system catalogs documentation, but I didn't modify the
maintenance section. Also having some regression tests for this would be nice
but it's not really a very straightforward thing to do.
Catalog version bumped due to system catalog changes.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
discussion (including making def_arg allow reserved words), add missed
opt_definition for UNIQUE case. Put the reloptions support code in a less
random place (I chose to make a new file access/common/reloptions.c).
Eliminate header inclusion creep. Make the index options functions safely
user-callable (seems like client apps might like to be able to test validity
of options before trying to make an index). Reduce overhead for normal case
with no options by allowing rd_options to be NULL. Fix some unmaintainably
klugy code, including getting rid of Natts_pg_class_fixed at long last.
Some stylistic cleanup too, and pay attention to keeping comments in sync
with code.
Documentation still needs work, though I did fix the omissions in
catalogs.sgml and indexam.sgml.
|
|
|
|
| |
ITAGAKI Takahiro
|
|
|
|
|
|
| |
this someday, but right now it seems that posix_fadvise is immature to
the point of being broken on many platforms ... and we don't have any
benchmark evidence proving it's worth spending time on.
|
|
|
|
|
| |
backup history file. Bug introduced by the 8.1 change to make pg_stop_backup
delete older history files. Per report from Masao Fujii.
|
|
|
|
|
|
|
|
|
|
|
| |
changing semantics too much. statement_timestamp is now set immediately
upon receipt of a client command message, and the various places that used
to do their own gettimeofday() calls to mark command startup are referenced
to that instead. I have also made stats_command_string use that same
value for pg_stat_activity.query_start for both the command itself and
its eventual replacement by <IDLE> or <idle in transaction>. There was
some debate about that, but no argument that seemed convincing enough to
justify an extra gettimeofday() call.
|
|
|
|
|
|
|
|
| |
for it. Hopefully will fix core dump evidenced by some buildfarm members
since fadvise patch went in. The actual definition of the function is not
ABI-compatible with compiler's default assumption in the absence of any
declaration, so it's clearly unsafe to try to call it without seeing a
declaration.
|
| |
|
|
|
|
|
|
| |
close.
ITAGAKI Takahiro
|
|
|
|
| |
text[], int4[], Tsearch2 support for GIN.
|
|
|
|
|
|
|
|
|
| |
transaction_timestamp() (just like now()).
Also update statement_timeout() to mention it is statement arrival time
that is measured.
Catalog version updated.
|
|
|
|
|
|
|
|
|
| |
whenever we start to read within that file. The first page carries
extra identification information that really ought to be checked, but
as the code stood, this was only checked when we switched sequentially
into a new WAL file, or if by chance the starting checkpoint record was
within the first page. This patch ensures that we will detect bogus
'long header' information before we start replaying the WAL sequence.
|
|
|
|
|
|
|
|
|
| |
to occur between pg_start_backup() and pg_stop_backup(), even if the GUC
setting full_page_writes is OFF. Per discussion, doing this in combination
with the already-existing checkpoint during pg_start_backup() should ensure
safety against partial page updates being included in the backup. We do
not have to force full page writes to occur during normal PITR operation,
as I had first feared.
|
|
|
|
|
|
|
|
|
| |
update no-longer-existing pages to fall through as no-ops, but make a note
of each page number referenced by such records. If we don't see a later
XLOG entry dropping the table or truncating away the page, complain at
the end of XLOG replay. Since this fixes the known failure mode for
full_page_writes = off, revert my previous band-aid patch that disabled
that GUC variable.
|
|
|
|
|
|
| |
XLOG_BLCKSZ. This ought to help in preventing configuration mismatch
problems if anyone tries to ship PITR files between servers compiled
with different XLOG_BLCKSZ settings. Simon Riggs
|
|
|
|
|
|
| |
instead a dedicated symbol. This probably makes no functional difference
for likely values of BLCKSZ, but it makes the intent clearer.
Simon Riggs, minor editorialization by Tom Lane.
|
|
|
|
|
|
|
|
|
|
|
| |
used within WAL files. Historically this was the same as the data file
BLCKSZ, but there's no necessary connection, and it's possible that
performance gains might ensue from reducing XLOG_BLCKSZ. In any case
distinguishing two symbols should improve code clarity. This commit
does not actually change the page size, only provide the infrastructure
to make it possible to do so. initdb forced because of addition of a
field to pg_control.
Mark Wong, with some help from Simon Riggs and Tom Lane.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
misleadingly-named WriteBuffer routine, and instead require routines that
change buffer pages to call MarkBufferDirty (which does exactly what it says).
We also require that they do so before calling XLogInsert; this takes care of
the synchronization requirement documented in SyncOneBuffer. Note that
because bufmgr takes the buffer content lock (in shared mode) while writing
out any buffer, it doesn't matter whether MarkBufferDirty is executed before
the buffer content change is complete, so long as the content change is
completed before releasing exclusive lock on the buffer. So it's OK to set
the dirtybit before we fill in the LSN.
This eliminates the former kluge of needing to set the dirtybit in LockBuffer.
Aside from making the code more transparent, we can also add some new
debugging assertions, in particular that the caller of MarkBufferDirty must
hold the buffer content lock, not merely a pin.
|
|
|
|
|
|
|
|
| |
This commit doesn't make much functional change, but it does eliminate some
duplicated code --- for instance, PageIsNew tests are now done inside
XLogReadBuffer rather than by each caller.
The GIST xlog code still needs a lot of love, but I'll worry about that
separately.
|
|
|
|
|
|
|
|
|
|
| |
failures even when the hardware and OS did nothing wrong. Per recent analysis
of a problem report from Alex Bahdushka.
For the moment I've just diked out the test of the parameter, rather than
removing the GUC infrastructure and documentation, in case we conclude that
there's something salvageable there. There seems no chance of it being
resurrected in the 8.1 branch though.
|
|
|
|
|
|
|
|
|
| |
when an error occurs during xlog replay. Also, replace the former risky
'write into a fixed-size buffer with no overflow detection' API for XLOG
record description routines; use an expansible StringInfo instead. (The
latter accounts for most of the patch bulk.)
Qingqing Zhou
|
| |
|
|
|
|
|
|
|
|
| |
more compliant with the error message style guide. In particular,
errdetail should begin with a capital letter and end with a period,
whereas errmsg should not. I also fixed a few related issues in
passing, such as fixing the repeated misspelling of "lexeme" in
contrib/tsearch2 (per Tom's suggestion).
|
|
|
|
|
|
|
|
|
|
| |
to try to create a log segment file concurrently, but the code erroneously
specified O_EXCL to open(), resulting in a needless failure. Before 7.4,
it was even a PANIC condition :-(. Correct code is actually simpler than
what we had, because we can just say O_CREAT to start with and not need a
second open() call. I believe this accounts for several recent reports of
hard-to-reproduce "could not create file ...: File exists" errors in both
pg_clog and pg_subtrans.
|
|
|
|
|
|
| |
rather than "return expr;" -- the latter style is used in most of the
tree. I kept the parentheses when they were necessary or useful because
the return expression was complex.
|
|
|
|
|
|
|
|
|
|
| |
in favor of having just one set of macros that don't do HOLD/RESUME_INTERRUPTS
(hence, these correspond to the old SpinLockAcquire_NoHoldoff case).
Given our coding rules for spinlock use, there is no reason to allow
CHECK_FOR_INTERRUPTS to be done while holding a spinlock, and also there
is no situation where ImmediateInterruptOK will be true while holding a
spinlock. Therefore doing HOLD/RESUME_INTERRUPTS while taking/releasing a
spinlock is just a waste of cycles. Qingqing Zhou and Tom Lane.
|
|
|
|
|
|
| |
setup. This protects against undesired changes in locale behavior
if someone carelessly does setlocale(LC_ALL, "") (and we know who
you are, perl guys).
|
|
|
|
|
|
|
| |
reduce contention for the former single LockMgrLock. Per my recent
proposal. I set it up for 16 partitions, but on a pgbench test this
gives only a marginal further improvement over 4 partitions --- we need
to test more scenarios to choose the number of partitions.
|
|
|
|
|
|
|
|
|
| |
SLRU area. The number of slots is still a compile-time constant (someday
we might want to change that), but at least it's a different constant for
each SLRU area. Increase number of subtrans buffers to 32 based on
experimentation with a heavily subtrans-bashing test case, and increase
number of multixact member buffers to 16, since it's obviously silly for
it not to be at least twice the number of multixact offset buffers.
|
|
|
|
|
|
|
|
|
|
| |
lock, not exclusive, if the desired page is already in memory. This can
be demonstrated to be a significant win on the pg_subtrans cache when there
is a large window of open transactions. It should be useful for pg_clog
as well. I didn't try to make GetMultiXactIdMembers() use the code, as
that would have taken some restructuring, and what with the local cache
for multixact contents it probably wouldn't really make a difference.
Per my recent proposal.
|
|
|
|
|
|
|
|
|
| |
comment line where output as too long, and update typedefs for /lib
directory. Also fix case where identifiers were used as variable names
in the backend, but as typedefs in ecpg (favor the backend for
indenting).
Backpatch to 8.1.X.
|
| |
|
|
|
|
|
| |
for the SLRU race condition that I posted a few days ago, but we decided
not to use in 8.1 and older branches.
|
|
|
|
|
|
|
|
|
|
|
| |
very narrow window in which SimpleLruReadPage or SimpleLruWritePage could
think that I/O was needed when it wasn't (and indeed the buffer had already
been assigned to another page). This would result in an Assert failure if
Asserts were enabled, and probably in silent data corruption if not.
Reported independently by Jim Nasby and Robert Creager.
I intend a more extensive fix when 8.2 development starts, but this is a
reasonably low-impact patch for the existing branches.
|
| |
|