| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
to go beoynd 10MB, as demonstrated by Gavin Sharry's example of dropping a
schema with ~25000 objects. The really bogus thing about the limit was that
it was enforced when a state file file was read in, not when it was written,
so you would end up with a prepared transaction that you can't commit or
abort, and the only recourse was to shut down the server and remove the file
by hand.
Raise the limit to MaxAllocSize, and enforce it also when a state file is
written. We could've removed the limit altogether, but reading in a file
larger than MaxAllocSize would fail anyway because we read it into a
palloc'd buffer.
Backpatch down to 8.1, where 2PC and this issue was introduced.
|
|
|
|
|
|
|
| |
crashes on certain platforms. In particular, the MSVC runtime is known
to do this.
Fixes bug #4162, reported and diagnosed by Javier Pimas
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
parameter. This fixes bug 4137 reported by Wojciech Strzalka, where a WAL
file is deleted too early when starting the recovery of a warm standby server.
Also add a sanity check in pg_standby so that it will refuse to delete anything
earlier than the file being restored, and improve the debug message in case
nothing is deleted.
Simon Riggs. Backpatch to 8.3, which is where %r was introduced.
|
|
|
|
|
|
|
| |
binary search instead of linear search when checking child-transaction XIDs.
Per example from Robert Treat, the speed of TransactionIdIsCurrentTransactionId
is significantly more important in 8.3 than it was in prior releases, so
this seems worth taking back-patching risk for.
|
|
|
|
|
|
|
| |
<craig@postnewspapers.com.au>.
It was my mistake, I missed limitation of number of held locks, now GIN doesn't
use continiuous locks, but still hold buffers pinned to prevent interference
with vacuum's deletion algorithm.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
corrupted. (Neither is very important if SIGTERM is used to shut down the
whole database cluster together, but there's a problem if someone tries to
SIGTERM individual backends.) To do this, introduce new infrastructure
macros PG_ENSURE_ERROR_CLEANUP/PG_END_ENSURE_ERROR_CLEANUP that take care
of transiently pushing an on_shmem_exit cleanup hook. Also use this method
for createdb cleanup --- that wasn't a shared-memory-corruption problem,
but SIGTERM abort of createdb could leave orphaned files lying around.
Backpatch as far as 8.2. The shmem corruption cases don't exist in 8.1,
and the createdb usage doesn't seem important enough to risk backpatching
further.
|
|
|
|
|
|
|
|
|
|
|
| |
messages if the calling transaction aborts later on. Collapsing out line
pointer redirects is a done deal as soon as we complete the page update,
so syscache *must* be notified even if the VACUUM FULL as a whole doesn't
complete. To fix, add some functionality to inval.c to allow the pending
inval messages to be sent immediately while heap_page_prune is still
running. The implementation is a bit chintzy: it will only work in the
context of VACUUM FULL. But that's all we need now, and it can always be
extended later if needed. Per my trouble report of a week ago.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
it accumulates the set of changes to be made and then applies them. It had
to accumulate the set of changes anyway to prepare a WAL record for the
pruning action, so this isn't an enormous change; the only new complexity is
to not doubly mark tuples that are visited twice in the scan. The main
advantage is that we can substantially reduce the scope of the critical
section in which the changes are applied, thus avoiding PANIC in foreseeable
cases like running out of memory in inval.c. A nice secondary advantage is
that it is now far clearer that WAL replay will actually do the same thing
that the original pruning did.
This commit doesn't do anything about the open problem that
CacheInvalidateHeapTuple doesn't have the right semantics for a CTID change
caused by collapsing out a redirect pointer. But whatever we do about that,
it'll be a good idea to not do it inside a critical section.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
TopMemoryContext, rather than scattered through executor per-query contexts.
This poses no danger of memory leak since the ResourceOwner mechanism
guarantees release of no-longer-needed items. It is needed because the
per-query context might already be released by the time we try to clean up
the hash scan list. Report by ykhuang, diagnosis by Heikki.
Back-patch to 8.0, where the ResourceOwner-based cleanup was introduced.
The given test case does not fail before 8.2, probably because we rearranged
transaction abort processing somehow; but this coding is undoubtedly risky
so I'll patch 8.0 and 8.1 anyway.
|
|
|
|
|
|
|
| |
temporary table; we can't support that because there's no way to clean up the
source backend's internal state if the eventual COMMIT PREPARED is done by
another backend. This was checked correctly in 8.1 but I broke it in 8.2 :-(.
Patch by Heikki Linnakangas, original trouble report by John Smith.
|
|
|
|
|
|
|
|
|
|
| |
is also licensed to put a local variable declared that way at an unaligned
address. Which will not work if the variable is then manipulated with
SET_VARSIZE or other macros that assume alignment. So the previous patch
is not an unalloyed good, but on balance I think it's still a win, since
we have very few places that do that sort of thing. Fix the one place in
tuptoaster.c that does it. Per buildfarm results from gypsy_moth
(I'm a bit surprised that only one machine showed a failure).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
represented as "char ...[4]" not "int32". Since the length word is never
supposed to be accessed via this struct member anyway, this won't break
any existing code that is following the rules. The advantage is that C
compilers will no longer assume that a pointer to struct varlena is
word-aligned, which prevents incorrect optimizations in TOAST-pointer
access and perhaps other places. gcc doesn't seem to do this (at least
not at -O2), but the problem is demonstrable on some other compilers.
I changed struct inet as well, but didn't bother to touch a lot of other
struct definitions in which it wouldn't make any difference because there
were other fields forcing int alignment anyway. Hopefully none of those
struct definitions are used for accessing unaligned Datums.
|
|
|
|
|
| |
synchronized-scanning behavior, and make pg_dump disable sync scans so that
it will reliably preserve row ordering. Per recent discussions.
|
|
|
|
| |
wrong because of mismatched byte ordering.
|
|
|
|
|
|
|
|
|
|
|
| |
in whichever context happens to be current during a call of an xml.c function,
use a dedicated context that will not go away until we explicitly delete it
(which we do at transaction end or subtransaction abort). This makes recovery
after an error much simpler --- we don't have to individually delete the data
structures created by libxml. Also, we need to initialize and cleanup libxml
only once per transaction (if there's no error) instead of once per function
call, so it should be a bit faster. We'll need to keep an eye out for
intra-transaction memory leaks, though. Alvaro and Tom.
|
|
|
|
|
|
|
|
| |
its second pass over the table. It has to start at block zero, else the
"merge join" logic for detecting which TIDs are already in the index
doesn't work. Hence, extend heapam.c's API so that callers can enable or
disable syncscan. (I put in an option to disable buffer access strategy,
too, just in case somebody needs it.) Per report from Hannes Dorbath.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
and CLUSTER) execute as the table owner rather than the calling user, using
the same privilege-switching mechanism already used for SECURITY DEFINER
functions. The purpose of this change is to ensure that user-defined
functions used in index definitions cannot acquire the privileges of a
superuser account that is performing routine maintenance. While a function
used in an index is supposed to be IMMUTABLE and thus not able to do anything
very interesting, there are several easy ways around that restriction; and
even if we could plug them all, there would remain a risk of reading sensitive
information and broadcasting it through a covert channel such as CPU usage.
To prevent bypassing this security measure, execution of SET SESSION
AUTHORIZATION and SET ROLE is now forbidden within a SECURITY DEFINER context.
Thanks to Itagaki Takahiro for reporting this vulnerability.
Security: CVE-2007-6600
|
| |
|
|
|
|
|
|
|
|
|
| |
since these seem to happen after all in corrupted indexes. Make sure we
supply the index name in all cases, and provide relevant block numbers where
available. Also consistently identify the index name as such.
Back-patch to 8.2, in hopes that this might help Mason Hale figure out his
problem.
|
|
|
|
|
|
|
|
|
|
| |
constraint status of copied indexes (bug #3774), as well as various other
small bugs such as failure to pstrdup when needed. Allow INCLUDING INDEXES
indexes to be merged with identical declared indexes (perhaps not real useful,
but the code is there and having it not apply to LIKE indexes seems pretty
unorthogonal). Avoid useless work in generateClonedIndexStmt(). Undo some
poorly chosen API changes, and put a couple of routines in modules that seem
to be better places for them.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
but no database changes have been made since the last CommandCounterIncrement.
This should result in a significant improvement in the number of "commands"
that can typically be performed within a transaction before hitting the 2^32
CommandId size limit. In particular this buys back (and more) the possible
adverse consequences of my previous patch to fix plan caching behavior.
The implementation requires tracking whether the current CommandCounter
value has been "used" to mark any tuples. CommandCounter values stored into
snapshots are presumed not to be used for this purpose. This requires some
small executor changes, since the executor used to conflate the curcid of
the snapshot it was using with the command ID to mark output tuples with.
Separating these concepts allows some small simplifications in executor APIs.
Something for the TODO list: look into having CommandCounterIncrement not do
AcceptInvalidationMessages. It seems fairly bogus to be doing it there,
but exactly where to do it instead isn't clear, and I'm disinclined to mess
with asynchronous behavior during late beta.
|
|
|
|
|
|
|
|
|
| |
GetMemoryChunkSpace, not just the palloc request size. This brings the
allocatedMemory counter close enough to reality (as measured by
MemoryContextStats printouts) that I think we can get rid of the arbitrary
factor-of-2 adjustment that was put into the code initially. Given the
sensitivity of GIN build to work memory size, not using as much of work
memory as we're allowed to seems a pretty bad idea.
|
|
|
|
|
|
|
| |
it failed for splits of non-leaf pages because in such pages the first
data key on a page is suppressed, and so we can't just copy the first
key from the right page to reconstitute the left page's high key.
Problem found by Koichi Suzuki, patch by Heikki.
|
| |
|
|
|
|
|
| |
same line; previous fix was only partial. Re-run pgindent on files
that need it.
|
|
|
|
| |
avoid this problem in the future.)
|
|
|
|
| |
appear in the configuration file.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
checkpoint. This guards against an unlikely data-loss scenario in which
we re-use the relfilenode, then crash, then replay the deletion and
recreation of the file. Even then we'd be OK if all insertions into the
new relation had been WAL-logged ... but that's not guaranteed given all
the no-WAL-logging optimizations that have recently been added.
Patch by Heikki Linnakangas, per a discussion last month.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
even in code paths where we don't pay any subsequent attention to the typmod
value. This seems needed in view of the fact that 8.3's generalized typmod
support will accept a lot of bogus syntax, such as "timestamp(foo)" or
"record(int, 42)" --- if we allow such things to pass without comment,
users will get confused. Per a recent example from Greg Stark.
To implement this in a way that's not very vulnerable to future
bugs-of-omission, refactor the API of parse_type.c's TypeName lookup routines
so that typmod validation is folded into the base lookup operation. Callers
can still choose not to receive the encoded typmod, but we'll check the
decoration anyway if it's present.
|
|
|
|
| |
NOTICE.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ginRedoInsert(), because other ginRedo* functions rewrite whole page or
make changes which could be applied several times without consistent's loss
- Remove check of identifying of corresponding split record:
it's possible that replaying of WAL starts after actual page split, but before
removing of that split from incomplete splits list. In this case, that check
cause FATAL error.
Per stress test which reproduces bug reported by Craig McElroy
<craig.mcelroy@contegix.com>
|
|
|
|
|
|
|
|
| |
usage of any information from system catalog, because it could be called during
replay of WAL.
Per bug report from Craig McElroy <craig.mcelroy@contegix.com>. Patch doesn't
change on-disk storage.
|
|
|
|
|
|
|
|
|
| |
having several of them. Add two more flags: whether the process is
executing an ANALYZE, and whether a vacuum is for Xid wraparound (which
is obviously only set by autovacuum).
Sneakily move the worker's recently-acquired PostAuthDelay to a more useful
place.
|
|
|
|
|
| |
really change anything. Per report from Itagaki Takahiro. Fix by
Pavan Deolasee.
|
|
|
|
|
|
|
| |
when relkind = RELKIND_RELATION. This syncs these tests with the Asserts
in tuptoaster.c, and ensures that we won't ever try to, for example,
compress a sequence's tuple. Problem found by Greg Stark while stress-testing
with much-smaller-than-normal page sizes.
|
|
|
|
|
|
|
|
|
| |
has been consumed, recheck against the latest value of RedoRecPtr before
really sending the signal. This avoids useless checkpoint activity if
XLogWrite is executed when we have a very stale local copy of RedoRecPtr.
The potential for useless checkpoint is very much worse in 8.3 because of
the walwriter process (which never does XLogInsert), so while this behavior
was intentional, it needs to be changed. Per report from Itagaki Takahiro.
|
|
|
|
|
| |
instead of fix it, since once we've set toast_action[i] to 'p' it no longer
matters what toast_sizes[i] is. Greg Stark
|
|
|
|
|
|
|
|
|
|
|
| |
compiler --- at least on ARM, it does. I suspect that the varvarlena patch
has been creating larger-than-intended toast pointers all along on ARM,
but it wasn't exposed until the latest tweak added some Asserts that
calculated the expected size in a different way. We could probably have
fixed this by adding __attribute__((packed)) as is done for ItemPointerData,
but struct varattrib_pointer isn't really all that useful anyway, so it
seems cleanest to just get rid of it and have only struct varattrib_1b_e.
Per results from buildfarm member quagga.
|
|
|
|
|
|
|
| |
explicitly. This means a TOAST pointer takes 18 bytes instead of 17 --- still
smaller than in 8.2 --- which seems a good tradeoff to ensure we won't have
painted ourselves into a corner if we want to support multiple types of TOAST
pointer later on. Per discussion with Greg Stark.
|
|
|
|
|
|
| |
while the restore_command does its thing, then 'recovering XXX' while
processing the segment file. These operations are heavyweight enough
that an extra PS display set shouldn't bother anyone.
|
|
|
|
|
| |
process' PS display. After a suggestion by Simon (not exactly his
patch though).
|
|
|
|
|
|
|
| |
recovery stop time was used. This avoids a corner-case risk of trying to
overwrite an existing archived copy of the last WAL segment, and seems
simpler and cleaner all around than the original definition. Per example
from Jon Colverson and subsequent analysis by Simon.
|
|
|
|
|
|
| |
decompression of an already-compressed external value when we have to copy
it; save a few cycles when a value is too short for compression; and
annotate various lines that are currently unreachable.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- create a separate archive_mode GUC, on which archive_command is dependent
- %r option in recovery.conf sends last restartpoint to recovery command
- %r used in pg_standby, updated README
- minor other code cleanup in pg_standby
- doc on Warm Standby now mentions pg_standby and %r
- log_restartpoints recovery option emits LOG message at each restartpoint
- end of recovery now displays last transaction end time, as requested
by Warren Little; also shown at each restartpoint
- restart archiver if needed to carry away WAL files at shutdown
Simon Riggs
|
|
|
|
|
|
| |
to not cause needless copying of text datums that have 1-byte headers.
Greg Stark, in response to performance gripe from Guillaume Smet and
ITAGAKI Takahiro.
|
|
|
|
|
|
| |
unpruned XMAX in its header. At the cost of 4 bytes per page, this keeps us
from performing heap_page_prune when there's no chance of pruning anything.
Seems to be necessary per Heikki's preliminary performance testing.
|