| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
|
|
|
| |
exactly at the point where we need to insert a new item, the calculation used
the wrong size for the "high key" of the new left page. This could lead to
choosing an unworkable split, resulting in "PANIC: failed to add item to the
left sibling" (or "right sibling") failure. Although this bug has been there
a long time, it's very difficult to trigger a failure before 8.2, since there
was generally a lot of free space on both sides of a chosen split. In 8.2,
where the user-selected fill factor determines how much free space the code
tries to leave, an unworkable split is much more likely. Report by Joe
Conway, diagnosis and fix by Heikki Linnakangas.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
preference for filling pages out-of-order tends to confuse the sanity checks
in md.c, as per report from Balazs Nagy in bug #2737. The fix is to ensure
that the smgr-level code always has the same idea of the logical EOF as the
hash index code does, by using ReadBuffer(P_NEW) where we are adding a single
page to the end of the index, and using smgrextend() to reserve a large batch
of pages when creating a new splitpoint. The patch is a bit ugly because it
avoids making any changes in md.c, which seems the most prudent approach for a
backpatchable beta-period fix. After 8.3 development opens, I'll take a look
at a cleaner but more invasive patch, in particular getting rid of the now
unnecessary hack to allow reading beyond EOF in mdread().
Backpatch as far as 7.4. The bug likely exists in 7.3 as well, but because
of the magnitude of the 7.3-to-7.4 changes in hash, the later-version patch
doesn't even begin to apply. Given the other known bugs in the 7.3-era hash
code, it does not seem worth trying to develop a separate patch for 7.3.
|
|
|
|
|
|
|
|
|
| |
cases where we already hold the desired lock "indirectly", either via
membership in a MultiXact or because the lock was originally taken by a
different subtransaction of the current transaction. These cases must be
accounted for to avoid needless deadlocks and/or inappropriate replacement of
an exclusive lock with a shared lock. Per report from Clarence Gardner and
subsequent investigation.
|
|
|
|
|
|
|
|
|
| |
_bt_pagedel to recover from the failure: just search the whole parent level
if searching to the right fails. This does nothing for the underlying problem
that index keys became out-of-order in the grandparent level. However, we
believe that there is no other consequence worse than slightly inefficient
searching, so this narrow patch seems like the safest solution for the back
branches.
|
|
|
|
|
|
|
|
|
|
|
| |
recovery. In the first place, it doesn't work because slru's
latest_page_number isn't set up yet (this is why we've been hearing reports
of strange "apparent wraparound" log messages during crash recovery, but
only from people who'd managed to advance their next-mxact counters some
considerable distance from 0). In the second place, it seems a bit unwise
to be throwing away data during crash recovery anwyway. This latter
consideration convinces me to just disable truncation during recovery,
rather than computing latest_page_number and pushing ahead.
|
|
|
|
|
| |
backup history file. Bug introduced by the 8.1 change to make pg_stop_backup
delete older history files. Per report from Masao Fujii.
|
|
|
|
|
|
|
|
|
|
| |
failures even when the hardware and OS did nothing wrong. Per recent analysis
of a problem report from Alex Bahdushka.
For the moment I've just diked out the test of the parameter, rather than
removing the GUC infrastructure and documentation, in case we conclude that
there's something salvageable there. There seems no chance of it being
resurrected in the 8.1 branch though.
|
|
|
|
|
|
|
|
|
|
| |
passed extend = true whenever we are reading a page we intend to reinitialize
completely, even if we think the page "should exist". This is because it
might indeed not exist, if the relation got truncated sometime after the
current xlog record was made and before the crash we're trying to recover
from. These two thinkos appear to explain both of the old bug reports
discussed here:
http://archives.postgresql.org/pgsql-hackers/2005-05/msg01369.php
|
|
|
|
|
| |
with not responding to query cancel during the last stage of btree index
creation.
|
|
|
|
|
|
|
|
| |
we are not holding a buffer content lock; where it was, InterruptHoldoffCount
is positive and so we'd not respond to cancel signals as intended. Also
add missing vacuum_delay_point() call in btvacuumcleanup. This should fix
complaint from Evgeny Gridasov about failure to respond to SIGINT/SIGTERM
in a timely fashion (bug #2257).
|
| |
|
|
|
|
|
|
|
|
|
|
| |
to try to create a log segment file concurrently, but the code erroneously
specified O_EXCL to open(), resulting in a needless failure. Before 7.4,
it was even a PANIC condition :-(. Correct code is actually simpler than
what we had, because we can just say O_CREAT to start with and not need a
second open() call. I believe this accounts for several recent reports of
hard-to-reproduce "could not create file ...: File exists" errors in both
pg_clog and pg_subtrans.
|
|
|
|
|
|
|
| |
discarded by cache flush while still in use. This is a minimal patch that
just copies the tupdesc anywhere it could be needed across a flush. Applied
to back branches only; Neil Conway is working on a better long-term solution
for HEAD.
|
|
|
|
|
|
| |
use it. While it normally has been opened earlier during btree index
build, testing shows that it's possible for the link to be closed again
if an sinval reset occurs while the index is being built.
|
|
|
|
|
| |
The consequences of overwriting a non-empty page are bad enough that
we should not omit this test in production builds.
|
|
|
|
| |
Back-patch of previous fix in HEAD for plperl-vs-locale issue.
|
|
|
|
|
|
|
|
|
|
|
|
| |
equal: if strcoll claims two strings are equal, check it with strcmp, and
sort according to strcmp if not identical. This fixes inconsistent
behavior under glibc's hu_HU locale, and probably under some other locales
as well. Also, take advantage of the now-well-defined behavior to speed up
texteq, textne, bpchareq, bpcharne: they may as well just do a bitwise
comparison and not bother with strcoll at all.
NOTE: affected databases may need to REINDEX indexes on text columns to be
sure they are self-consistent.
|
|
|
|
|
|
|
|
|
| |
comment line where output as too long, and update typedefs for /lib
directory. Also fix case where identifiers were used as variable names
in the backend, but as typedefs in ecpg (favor the backend for
indenting).
Backpatch to 8.1.X.
|
|
|
|
|
|
|
|
|
| |
tuple in-place, but instead passes back an all-new tuple structure if
any changes are needed. This is a much cleaner and more robust solution
for the bug discovered by Alexey Beschiokov; accordingly, revert the
quick hack I installed yesterday.
With this change, HeapTupleData.t_datamcxt is no longer needed; will
remove it in a separate commit in HEAD only.
|
|
|
|
|
| |
those names. (Debug and None were pretty bad names anyway.) I hope I catched
all uses of the names in comments too.
|
|
|
|
|
|
|
|
|
|
|
| |
very narrow window in which SimpleLruReadPage or SimpleLruWritePage could
think that I/O was needed when it wasn't (and indeed the buffer had already
been assigned to another page). This would result in an Assert failure if
Asserts were enabled, and probably in silent data corruption if not.
Reported independently by Jim Nasby and Robert Creager.
I intend a more extensive fix when 8.2 development starts, but this is a
reasonably low-impact patch for the existing branches.
|
| |
|
|
|
|
|
| |
reserving SLRU space for a new MultiXact. The original coding would have
treated out-of-disk-space as a PANIC condition, which is unnecessary.
|
|
|
|
|
|
|
|
|
|
| |
multixact's starting offset before the offset has been stored into the
SLRU file. A simple fix would be to hold the MultiXactGenLock until the
offset has been stored, but that looks like a big concurrency hit. Instead
rely on knowledge that unset offsets will be zero, and loop when we see
a zero. This requires a little extra hacking to ensure that zero is never
a valid value for the offset. Problem reported by Matteo Beccati, fix
ideas from Martijn van Oosterhout, Alvaro Herrera, and Tom Lane.
|
| |
|
|
|
|
|
|
| |
generated from subquery outputs: use the type info stored in the Var
itself. To avoid making ExecEvalVar and slot_getattr more complex
and slower, I split out the whole-row case into a separate ExecEval routine.
|
|
|
|
|
|
| |
type ID information even when it's a record type. This is needed to
handle whole-row Vars referencing subquery outputs. Per example from
Richard Huxton.
|
|
|
|
|
| |
by a recent HP C compiler. Mostly, get rid of useless local variables
that are assigned to but never used.
|
| |
|
|
|
|
|
|
|
| |
etc. match the docs, which talk about "transaction identifier" not
"gid" or "global transaction identifier".
Steve Woodcock
|
|
|
|
|
|
|
|
| |
This makes the error messages for PREPARE TRANSACTION, COMMIT PREPARED
etc. match the docs, which talk about "transaction identifier" not
"gid" or "global transaction identifier".
Steve Woodcock
|
|
|
|
|
|
|
| |
etc. match the docs, which talk about "transaction identifier" not
"gid" or "global transaction identifier".
Steve Woodcock
|
|
|
|
|
|
| |
the wrong buffer dirty when trying to kill a dead index entry that's on
a page after the one it started on. No risk of data corruption, just
inefficiency, but still a bug.
|
|
|
|
|
|
|
| |
generated by bitmap index scans. Along the way, simplify and speed up
the code for counting sequential and index scans; it was both confusing
and inefficient to be taking care of that in the per-tuple loops, IMHO.
initdb forced because of internal changes in pg_stat view definitions.
|
|
|
|
|
|
| |
was created on a machine with alignment rules and floating-point format
similar to the current machine. Per recent discussion, this seems like
a good idea with the increasing prevalence of 32/64 bit environments.
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
on a page, as suggested by ITAGAKI Takahiro. Also, change a few places
that were using some other estimates of max-items-per-page to consistently
use MaxOffsetNumber. This is conservatively large --- we could have used
the new MaxHeapTuplesPerPage macro, or a similar one for index tuples ---
but those places are simply declaring a fixed-size buffer and assuming it
will work, rather than actively testing for overrun. It seems safer to
size these buffers in a way that can't overflow even if the page is
corrupt.
|
|
|
|
|
|
|
|
|
| |
saves nearly 700kB in the default shared memory segment size, which seems
worthwhile, and it is a feature that many users won't use anyway. Per
Heikki's argument, there is no point in a compromise value --- those who
are using 2PC at all will probably want it at least equal to max_connections.
But we can't set it to zero by default without breaking the prepared_xacts
regression test.
|
|
|
|
|
|
|
|
|
| |
after the fact. Fix bug with incorrect test for whether we are at end
of logfile segment. Arrange for writes triggered by XLogInsert's
is-cache-more-than-half-full test to synchronize with the cache boundaries,
so that in long transactions we tend to write alternating halves of the
cache rather than randomly chosen portions of it; this saves one more
write syscall per cache load.
|
|
|
|
|
|
|
|
|
| |
not accepting queries).
errmsg("database is not accepting queries to avoid
wraparound data loss in database \"%s\"",
errhint("Stop the postmaster and use a standalone
backend to VACUUM database \"%s\".",
|
|
|
|
|
| |
indexes all be int, rather than variously int, uint16 and uint32;
add some casts where necessary to support large buffer arrays.
|
|
|
|
| |
integer lists.
|
|
|
|
|
|
|
|
|
|
|
| |
to 'Size' (that is, size_t), and install overflow detection checks in it.
This allows us to remove the former arbitrary restrictions on NBuffers
etc. It won't make any difference in a 32-bit machine, but in a 64-bit
machine you could theoretically have terabytes of shared buffers.
(How efficiently we could manage 'em remains to be seen.) Similarly,
num_temp_buffers, work_mem, and maintenance_work_mem can be set above
2Gb on a 64-bit machine. Original patch from Koichi Suzuki, additional
work by moi.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
insufficient paranoia in code that follows t_ctid links. (We must do both
because even with VACUUM doing it properly, the intermediate state with
a dangling t_ctid link is visible concurrently during lazy VACUUM, and
could be seen afterwards if either type of VACUUM crashes partway through.)
Also try to improve documentation about what's going on. Patch is a bit
bulky because passing the XMAX information around required changing the
APIs of some low-level heapam.c routines, but it's not conceptually very
complicated. Per trouble report from Teodor and subsequent analysis.
This needs to be back-patched, but I'll do that after 8.1 beta is out.
|