| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
|
|
| |
failed (due to lock conflicts or out-of-space). We might have already
extended the index's filesystem EOF before failing, causing the EOF to be
beyond what the metapage says is the last used page. Hence the invariant
maintained by the code needs to be "EOF is at or beyond last used page",
not "EOF is exactly the last used page". Problem was created by my patch
of 2006-11-19 that attempted to repair bug #2737. Since that was
back-patched to 7.4, this needs to be as well. Per report and test case
from Vlastimil Krejcir.
|
|
|
|
|
| |
and with insufficient paranoia in code that follows t_ctid links.
This patch covers the 7.4 branch.
|
|
|
|
| |
CLUSTER failure after ALTER TABLE SET WITHOUT OIDS.
|
|
|
|
|
|
|
|
|
|
| |
protocol, per report from Igor Shevchenko. NOTIFY thought it could
do its thing if transaction blockState is TBLOCK_DEFAULT, but in
reality it had better check the low-level transaction state is
TRANS_DEFAULT as well. Formerly it was not possible to wait for the
client in a state where the first is true and the second is not ...
but now we can have such a state. Minor cleanup in StartTransaction()
as well.
|
|
|
|
|
|
|
| |
discussion on pgsql-hackers: in READ COMMITTED mode we just have to force
a QuerySnapshot update in the trigger, but in SERIALIZABLE mode we have
to run the scan under a current snapshot and then complain if any rows
would be updated/deleted that are not visible in the transaction snapshot.
|
|
|
|
|
|
|
|
|
|
| |
invalid (has the wrong magic number) until the build is entirely
complete. This turns out to cost no additional writes in the normal
case, since we were rewriting the metapage at the end of the process
anyway. In normal scenarios there's no real gain in security, because
a failed index build would roll back the transaction leaving an unused
index file, but for rebuilding shared system indexes this seems to add
some useful protection.
|
|
|
|
|
|
| |
post-abort cleanup hooks. I'm surprised that we have not needed this
already, but I need it now to fix a plpgsql problem, and the usefulness
for other dynamically loaded modules seems obvious.
|
|
|
|
|
|
| |
to allow es_snapshot to be set to SnapshotNow rather than a query snapshot.
This solves a bug reported by Wade Klaver, wherein triggers fired as a
result of RI cascade updates could misbehave.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
really general fix might be difficult, I believe the only case where
AtCommit_Notify could see an uncommitted tuple is where the other guy
has just unlistened and not yet committed. The best solution seems to
be to just skip updating that tuple, on the assumption that the other
guy does not want to hear about the notification anyway. This is not
perfect --- if the other guy rolls back his unlisten instead of committing,
then he really should have gotten this notify. But to do that, we'd have
to wait to see if he commits or not, or make UNLISTEN hold exclusive lock
on pg_listener until commit. Either of these answers is deadlock-prone,
not to mention horrible for interactive performance. Do it this way
for now. (What happened to that project to do LISTEN/NOTIFY in memory
with no table, anyway?)
|
|
|
|
|
|
|
|
| |
pghackers. This fixes the problem recently reported by Markus KrÌutner
(hash bucket split corrupts the state of scans being done concurrently),
and I believe it also fixes all the known problems with deadlocks in
hash index operations. Hash indexes are still not really ready for prime
time (since they aren't WAL-logged), but this is a step forward.
|
|
|
|
|
|
|
|
|
| |
layout; therefore, this change forces REINDEX of hash indexes (though
not a full initdb). Widen hashm_ntuples to double so that hash space
management doesn't get confused by more than 4G entries; enlarge the
allowed number of free-space-bitmap pages; replace the useless bshift
field with a useful bmshift field; eliminate 4 bytes of wasted space
in the per-page special area.
|
|
|
|
|
|
|
| |
scheme. A pleasant side effect is that it is *much* faster when deleting
a large fraction of the indexed tuples, because of elimination of
redundant hash_step activity induced by hash_adjscans. Various other
continuing code cleanup.
|
|
|
|
|
|
|
|
|
| |
yet). Fix a couple of bugs that would only appear if multiple bitmap pages
are used, including a buffer reference leak and incorrect computation of bit
indexes. Get rid of 'overflow address' concept, which accomplished nothing
except obfuscating the code and creating a risk of failure due to limited
range of offset field. Rename some misleadingly-named fields and routines,
and improve documentation.
|
|
|
|
|
|
|
| |
target columns in INSERT and UPDATE targetlists. Don't rely on resname
to be accurate in ruleutils, either. This fixes bug reported by
Donald Fraser, in which renaming a column referenced in a rule did not
work very well.
|
| |
|
|
|
|
| |
so it won't miss 'em again.
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
| |
specific hash functions used by hash indexes, rather than the old
not-datatype-aware ComputeHashFunc routine. This makes it safe to do
hash joining on several datatypes that previously couldn't use hashing.
The sets of datatypes that are hash indexable and hash joinable are now
exactly the same, whereas before each had some that weren't in the other.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
least-recently-used strategy from clog.c into slru.c. It doesn't
change any visible behaviour and passes all regression tests plus a
TruncateCLOG test done manually.
Apart from refactoring I made a little change to SlruRecentlyUsed,
formerly ClogRecentlyUsed: It now skips incrementing lru_counts, if
slotno is already the LRU slot, thus saving a few CPU cycles. To make
this work, lru_counts are initialised to 1 in SimpleLruInit.
SimpleLru will be used by pg_subtrans (part of the nested transactions
project), so the main purpose of this patch is to avoid future code
duplication.
Manfred Koizar
|
|
|
|
|
|
| |
only remnant of this failed experiment is that the server will take
SET AUTOCOMMIT TO ON. Still TODO: provide some client-side autocommit
logic in libpq.
|
|
|
|
|
| |
but that was enough tedium for one day. Along the way, move the few
support routines for types xid and cid into a more logical place.
|
|
|
|
|
|
|
| |
handle multiple 'formats' for data I/O. Restructure CommandDest and
DestReceiver stuff one more time (it's finally starting to look a bit
clean though). Code now matches latest 3.0 protocol document as far
as message formats go --- but there is no support for binary I/O yet.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
DestReceiver pointers instead of just CommandDest values. The DestReceiver
is made at the point where the destination is selected, rather than
deep inside the executor. This cleans up the original kluge implementation
of tstoreReceiver.c, and makes it easy to support retrieving results
from utility statements inside portals. Thus, you can now do fun things
like Bind and Execute a FETCH or EXPLAIN command, and it'll all work
as expected (e.g., you can Describe the portal, or use Execute's count
parameter to suspend the output partway through). Implementation involves
stuffing the utility command's output into a Tuplestore, which would be
kind of annoying for huge output sets, but should be quite acceptable
for typical uses of utility commands.
|
|
|
|
|
|
|
|
|
| |
the column by table OID and column number, if it's a simple column
reference. Along the way, get rid of reskey/reskeyop fields in Resdoms.
Turns out that representation was not convenient for either the planner
or the executor; we can make the planner deliver exactly what the
executor wants with no more effort.
initdb forced due to change in stored rule representation.
|
|
|
|
| |
Only lightly tested as yet, since libpq doesn't know anything about 'em.
|
|
|
|
|
|
| |
for tableID/columnID in RowDescription. (The latter isn't really
implemented yet though --- the backend always sends zeroes, and libpq
just throws away the data.)
|
|
|
|
|
|
| |
end of a btree index. This isn't super-effective, since we won't move
nondeletable pages, but it's better than nothing. Also, improve stats
displayed during VACUUM VERBOSE.
|
|
|
|
|
|
| |
deleting multiple index entries on a single index page. This makes for
a very substantial reduction in the amount of WAL traffic during a
large delete operation.
|
| |
|
|
|
|
| |
to fix, but it seems to basically work...
|
|
|
|
|
|
|
|
| |
now knows what to do upon hitting a dead page (in theory anyway, it's
untested...). Add a post-VACUUM-cleanup entry point for index AMs, to
provide a place for dead-page scavenging to happen.
Also, fix oversight that broke btpo_prev links in temporary indexes.
initdb forced due to additions in pg_am.
|
|
|
|
|
|
|
|
|
|
|
| |
support btree compaction, as per proposal of a few days ago. btree index
pages no longer store parent links, instead they have a level indicator
(counting up from zero for leaf pages). The FixBTree recovery logic is
removed, and replaced by code that detects missing parent-level insertions
during WAL replay. Also, generate appropriate WAL entries when updating
btree metapage and when building a btree index from scratch. I believe
btree indexes are now completely WAL-legal for the first time.
initdb forced due to index and WAL changes.
|
|
|
|
|
|
|
|
|
| |
longer works -- IncrHeapAccessStat() didn't actually *do* anything
anymore, so no reason to keep it around AFAICS. I also fixed a
grammatical error in a comment.
Neil Conway
|
|
|
|
|
|
|
|
| |
that's selecting into a RECORD variable returns zero rows, make it
assign an all-nulls row to the RECORD; this is consistent with what
happens when the SELECT INTO target is not a RECORD. In support of
this, tweak the SPI code so that a valid tuple descriptor is returned
even when a SPI select returns no rows.
|
| |
|
|
|
|
|
|
| |
the index AM when we know we are fetching a unique row. However, this
logic did not consider the possibility that it would be asked to fetch
backwards. Also fix mark/restore to work correctly in this scenario.
|
|
|
|
| |
but do it correctly now.
|
|
|
|
| |
few WAL files.
|
| |
|
|
|
|
|
|
| |
that they'd get to commit immediately on finishing. There's now a
centralized routine PreventTransactionChain() that implements the
necessary tests.
|
|
|
|
|
|
| |
recent WAL activity has occurred. Without this, it's possible that a
later crash might leave tuples on disk with un-updated commit status
bits.
|
|
|
|
|
|
|
|
|
| |
VACUUM FULL tuple moves. Store full-width t_infomask in WAL, rather
than storing low 8 bits and expecting to be able to reconstruct upper
bits. While at it, remove redundant t_oid field from WAL headers
(the OID, if present, is now recorded in the data portion of the tuple).
WAL version number bumped --- this does not force an initdb, you can
instead run pg_resetxlog after a clean shutdown of the old postmaster.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
(overlaying low byte of page size) and add HEAP_HASOID bit to t_infomask,
per earlier discussion. Simplify scheme for overlaying fields in tuple
header (no need for cmax to live in more than one place). Don't try to
clear infomask status bits in tqual.c --- not safe to do it there. Don't
try to force output table of a SELECT INTO to have OIDs, either. Get rid
of unnecessarily complex three-state scheme for TupleDesc.tdhasoids, which
has already caused one recent failure. Improve documentation.
|
|
|
|
|
|
|
|
|
| |
to false provides more SQL-spec-compliant behavior than we had before.
I am not sure that setting it false is actually a good idea yet; there
is a lot of client-side code that will probably be broken by turning
autocommit off. But it's a start.
Loosely based on a patch by David Van Wie.
|
|
|
|
|
| |
* Remove wal_files postgresql.conf option because WAL files are
now recycled
|
|
|
|
|
|
|
|
|
|
| |
width types and varlena types, since with the introduction of CSTRING as
a more-or-less-real type, these concepts aren't identical. I've tried to
use varlena consistently to denote datatypes with typlen = -1, ie, they
have a length word and are potentially TOASTable; while the term variable
width covers both varlena and cstring (and, perhaps, someday other types
with other rules for computing the actual width). No code changes in this
commit except for renaming a couple macros.
|
|
|
|
|
|
|
| |
value '-2' is used to indicate a variable-width type whose width is
computed as strlen(datum)+1. Everything that looks at typlen is updated
except for array support, which Joe Conway is working on; at the moment
it wouldn't work to try to create an array of cstring.
|