| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
| |
This ensures that a standby such as pg_receivexlog will not be selected
as sync standby - which would cause the master to block waiting for
a location that could never happen.
Fujii Masao
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Per bug #6593, REASSIGN OWNED fails when the affected role has created
an extension. Even though the user related to the extension is not
nominally the owner, its OID appears on pg_shdepend and thus causes
problems when the user is to be dropped.
This commit adds code to change the "ownership" of the extension itself,
not of the contained objects. This is fine because it's currently only
called from REASSIGN OWNED, which would also modify the ownership of the
contained objects. However, this is not sufficient for a working ALTER
OWNER implementation extension.
Back-patch to 9.1, where extensions were introduced.
Bug #6593 reported by Emiliano Leporati.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When (re) loading the typcache comparison cache for an enum type's values,
use an up-to-date MVCC snapshot, not the transaction's existing snapshot.
This avoids problems if we encounter an enum OID that was created since our
transaction started. Per report from Andres Freund and diagnosis by Robert
Haas.
To ensure this is safe even if enum comparison manages to get invoked
before we've set a transaction snapshot, tweak GetLatestSnapshot to
redirect to GetTransactionSnapshot instead of throwing error when
FirstSnapshotSet is false. The existing uses of GetLatestSnapshot (in
ri_triggers.c) don't care since they couldn't be invoked except in a
transaction that's already done some work --- but it seems just conceivable
that this might not be true of enums, especially if we ever choose to use
enums in system catalogs.
Note that the comparable coding in enum_endpoint and enum_range_internal
remains GetTransactionSnapshot; this is perhaps debatable, but if we
changed it those functions would have to be marked volatile, which doesn't
seem attractive.
Back-patch to 9.1 where ALTER TYPE ADD VALUE was added.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a CHECK constraint or index definition contained a whole-row Var (that
is, "table.*"), an attempt to copy that definition via CREATE TABLE LIKE or
table inheritance produced incorrect results: the copied Var still claimed
to have the rowtype of the source table, rather than the created table.
For the LIKE case, it seems reasonable to just throw error for this
situation, since the point of LIKE is that the new table is not permanently
coupled to the old, so there's no reason to assume its rowtype will stay
compatible. In the inheritance case, we should ideally allow such
constraints, but doing so will require nontrivial refactoring of CREATE
TABLE processing (because we'd need to know the OID of the new table's
rowtype before we adjust inherited CHECK constraints). In view of the lack
of previous complaints, that doesn't seem worth the risk in a back-patched
bug fix, so just make it throw error for the inheritance case as well.
Along the way, replace change_varattnos_of_a_node() with a more robust
function map_variable_attnos(), which is capable of being extended to
handle insertion of ConvertRowtypeExpr whenever we get around to fixing
the inheritance case nicely, and in the meantime it returns a failure
indication to the caller so that a helpful message with some context can be
thrown. Also, this code will do the right thing with subselects (if we
ever allow them in CHECK or indexes), and it range-checks varattnos before
using them to index into the map array.
Per report from Sergey Konoplev. Back-patch to all supported branches.
|
|
|
|
|
|
|
| |
This bug was introduced by commit 20d98ab6e4110087d1816cd105a40fcc8ce0a307,
so backpatch this to 9.0-9.2 like that one.
This fixes bug #6710, reported by Tarvi Pillessaar
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The LISTEN/NOTIFY subsystem got confused if SimpleLruZeroPage failed,
which would typically happen as a result of a write() failure while
attempting to dump a dirty pg_notify page out of memory. Subsequently,
all attempts to send more NOTIFY messages would fail with messages like
"Could not read from file "pg_notify/nnnn" at offset nnnnn: Success".
Only restarting the server would clear this condition. Per reports from
Kevin Grittner and Christoph Berg.
Back-patch to 9.0, where the problem was introduced during the
LISTEN/NOTIFY rewrite.
|
|
|
|
|
|
|
|
|
|
|
| |
Repeated execution of an uncorrelated ARRAY_SUBLINK sub-select (which
I think can only happen if the sub-select is embedded in a larger,
correlated subquery) would leak memory for the duration of the query,
due to not reclaiming the array generated in the previous execution.
Per bug #6698 from Armando Miraglia. Diagnosis and fix idea by Heikki,
patch itself by me.
This has been like this all along, so back-patch to all supported versions.
|
|
|
|
|
|
|
|
|
|
|
|
| |
This prevents a pg_basebackup backup session that just does a base
backup (no xlog involved at all) from becoming the synchronous slave
and thus blocking all access while it runs.
Also fixes the problem when a higher priority slave shows up it would
become the sync standby before it has reached the STREAMING state, by
making sure we can only switch to a walsender that's actually STREAMING.
Fujii Masao
|
|
|
|
|
|
|
|
|
|
| |
When HS startup is deferred because of overflowed subtransactions, ensure
that we re-initialize KnownAssignedXids for when both existing and incoming
snapshots have non-zero qualifying xids.
Fixes bug #6661 reported by Valentine Gogichashvili.
Analysis and fix by Andres Freund
|
|
|
|
|
|
|
|
|
| |
WALSender now woken up after each background flush by WALwriter, avoiding
multi-second replication delay for an all-async commit workload.
Replication delay reduced from 7s with default settings to 200ms, allowing
significantly reduced data loss at failover.
Andres Freund and Simon Riggs
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When we allowed read-only transactions to skip assigning XIDs
we introduced the possibility that a fully deleted btree page
could be reused. This broke the index link sequence which could
then lead to indexscans silently returning fewer rows than would
have been correct. The actual incidence of silent errors from
this is thought to be very low because of the exact workload
required and locking pre-conditions. Fix is to remove pages only
if index page opaque->btpo.xact precedes RecentGlobalXmin.
Noah Misch, reviewed by Simon Riggs
|
| |
|
|
|
|
|
|
|
|
| |
Per discussion, it does not seem like a good idea to change the behavior of
age(xid) in a minor release, even though the old definition causes the
function to fail on hot standby slaves. Therefore, revert commit
5829387381d2e4edf84652bb5a712f6185860670 and follow-on commits in the back
branches only.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It's not very sensible to set such attributes on a handler function;
but if one were to do so, fmgr.c went into infinite recursion because
it would call fmgr_security_definer instead of the handler function proper.
There is no way for fmgr_security_definer to know that it ought to call the
handler and not the original function referenced by the FmgrInfo's fn_oid,
so it tries to do the latter, causing the whole process to start over
again.
Ordinarily such misconfiguration of a procedural language's handler could
be written off as superuser error. However, because we allow non-superuser
database owners to create procedural languages and the handler for such a
language becomes owned by the database owner, it is possible for a database
owner to crash the backend, which ideally shouldn't be possible without
superuser privileges. In 9.2 and up we will adjust things so that the
handler functions are always owned by superusers, but in existing branches
this is a minor security fix.
Problem noted by Noah Misch (after several of us had failed to detect
it :-(). This is CVE-2012-2655.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We used to only allow offsets less than +/-13 hours, then it was +/14,
then it was +/-15. That's still not good enough though, as per today's bug
report from Patric Bechtel. This time I actually looked through the Olson
timezone database to find the largest offsets used anywhere. The winners
are Asia/Manila, at -15:56:00 until 1844, and America/Metlakatla, at
+15:13:42 until 1867. So we'd better allow offsets less than +/-16 hours.
Given the history, we are way overdue to have some greppable #define
symbols controlling this, so make some ... and also remove an obsolete
comment that didn't get fixed the last time.
Back-patch to all supported branches.
|
|
|
|
|
|
|
|
|
|
|
|
| |
AbortOutOfAnyTransaction failed to do anything if the state it saw on
entry corresponded to failing partway through StartTransaction. I fixed
AbortCurrentTransaction to cope with that case way back in commit
60b2444cc3ba037630c9b940c3c9ef01b954b87b, but evidently overlooked that
AbortOutOfAnyTransaction should do likewise.
Back-patch to all supported branches. It's not clear that this omission
has any more-than-cosmetic consequences, but it's also not clear that it
doesn't, so back-patching seems the least risky choice.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The only interesting-for-performance case wherein we force heapscan here
is when we're rebuilding the relcache init file, and the only such case
that is likely to be examining a catalog big enough to be syncscanned is
RelationBuildTupleDesc. But the early-exit optimization in that code gets
broken if we start the scan at a random place within the catalog, so that
allowing syncscan is actually a big deoptimization if pg_attribute is large
(at least for the normal case where the rows for core system catalogs have
never been changed since initdb). Hence, prevent syncscan here. Per my
testing pursuant to complaints from Jeff Frost and Greg Sabino Mullane,
though neither of them seem to have actually hit this specific problem.
Back-patch to 8.3, where syncscan was introduced.
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, casts to name could generate invalidly-encoded results.
Also, make these functions match namein() more exactly, by consistently
using palloc0() instead of ad-hoc zeroing code.
Back-patch to all supported branches.
Karl Schnaitter and Tom Lane
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a seqscan encounters many consecutive pages containing only dead tuples,
it can remain in the loop in heapgettup for a long time, and there was no
CHECK_FOR_INTERRUPTS anywhere in that loop. This meant there were
real-world situations where a query would be effectively uncancelable for
long stretches. Add a check placed to occur once per page, which should be
enough to provide reasonable response time without adding any measurable
overhead.
Report and patch by Merlin Moncure (though I tweaked it a bit).
Back-patch to all supported branches.
|
|
|
|
|
|
|
|
| |
When the column name is an unqualified name, rather than table.column,
the error message complains about too many dotted names, which is
wrong. Report by Peter Eisentraut based on examination of the
sepgsql regression test output, but the problem also affects COMMENT.
New wording as suggested by Tom Lane.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There is no reason to do this as early as possible in postmaster startup,
and good reason not to do it until we have completely created the
postmaster's lock file, namely that it might contribute to pg_ctl thinking
that postmaster startup has timed out. (This would require a rather
unusual amount of time to be spent scanning temp file directories, but we
have at least one field report of it happening reproducibly.)
Back-patch to 9.1. Before that, pg_ctl didn't wait for additional info to
be added to the lock file, so it wasn't a problem.
Note that this is not a complete fix to the slow-start issue in 9.1,
because we still had identify_system_timezone being run during postmaster
start in 9.1. But that's at least a reasonably well-defined delay, with
an easy workaround if needed, whereas the temp-files scan is not so
predictable and cannot be avoided.
|
|
|
|
|
| |
We were using memcpy() to copy to a possibly overlapping memory region,
which is a no-no. Use memmove() instead.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the tablespace directory is missing entirely, we allow DROP TABLESPACE
to go through, on the grounds that it should be possible to clean up the
catalog entry in such a situation. However, we forgot that the pg_tblspc
symlink might still be there. We should try to remove the symlink too
(but not fail if it's no longer there), since not doing so can lead to
weird behavior subsequently, as per report from Michael Nolan.
There was some discussion of adding dependency links to prevent DROP
TABLESPACE when the catalogs still contain references to the tablespace.
That might be worth doing too, but it's an orthogonal question, and in
any case wouldn't be back-patchable.
Back-patch to 9.0, which is as far back as the logic looks like this.
We could possibly do something similar in 8.x, but given the lack of
reports I'm not sure it's worth the trouble, and anyway the case could
not arise in the form the logic is meant to cover (namely, a post-DROP
transaction rollback having resurrected the pg_tablespace entry after
some or all of the filesystem infrastructure is gone).
|
| |
|
|
|
|
| |
Fixes bug #6635, reported by Akira Kurosawa.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When inserting the downlinks for a split gist page, we used hold the locks
on the child pages until the insertion into the parent - and recursively its
parent if it had to be split too - were all completed. Change that so that
the locks on child pages are released after the insertion in the immediate
parent is done, before recursing further up the tree.
This reduces the number of lwlocks that are held simultaneously. Holding
many locks is bad for concurrency, and in extreme cases you can even hit
the limit of 100 simultaneously held lwlocks in a backend. If you're really
unlucky, you can hit the limit while in a critical section, which brings
down the whole system.
This fixes bug #6629 reported by Tom Forbes. Backpatch to 9.1. The page
splitting code was rewritten in 9.1, and the old code did not have this
problem.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The original coding failed to reset ImmediateInterruptOK before returning,
which would potentially allow a subsequent query-cancel interrupt to be
accepted at an unsafe point. This is a really nasty bug since it's so hard
to predict the consequences, but they could be unpleasant.
Also, ensure that signal handlers are serviced before this function
returns, even if the semaphore is already set. This should make the
behavior more like Unix.
Back-patch to all supported versions.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 62c7bd31c8878dd45c9b9b2429ab7a12103f3590 had assorted problems, most
visibly that it broke PREPARE TRANSACTION in the presence of session-level
advisory locks (which should be ignored by PREPARE), as per a recent
complaint from Stephen Rees. More abstractly, the patch made the
LockMethodData.transactional flag not merely useless but outright
dangerous, because in point of fact that flag no longer tells you anything
at all about whether a lock is held transactionally. This fix therefore
removes that flag altogether. We now rely entirely on the convention
already in use in lock.c that transactional lock holds must be owned by
some ResourceOwner, while session holds are never so owned. Setting the
locallock struct's owner link to NULL thus denotes a session hold, and
there is no redundant marker for that.
PREPARE TRANSACTION now works again when there are session-level advisory
locks, and it is also able to transfer transactional advisory locks to the
prepared transaction, but for implementation reasons it throws an error if
we hold both types of lock on a single lockable object. Perhaps it will be
worth improving that someday.
Assorted other minor cleanup and documentation editing, as well.
Back-patch to 9.1, except that in the 9.1 branch I did not remove the
LockMethodData.transactional flag for fear of causing an ABI break for
any external code that might be examining those structs.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Normally whole-row Vars are printed as "tabname.*". However, that does not
work at top level of a targetlist, because per SQL standard the parser will
think that the "*" should result in column-by-column expansion; which is
not at all what a whole-row Var implies. We used to just print the table
name in such cases, which works most of the time; but it fails if the table
name matches a column name available anywhere in the FROM clause. This
could lead for instance to a view being interpreted differently after dump
and reload. Adding parentheses doesn't fix it, but there is a reasonably
simple kluge we can use instead: attach a no-op cast, so that the "*" isn't
syntactically at top level anymore. This makes the printing of such
whole-row Vars a lot more consistent with other Vars, and may indeed fix
more cases than just the reported one; I'm suspicious that cases involving
schema qualification probably didn't work properly before, either.
Per bug report and fix proposal from Abbas Butt, though this patch is quite
different in detail from his.
Back-patch to all supported versions.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If it fails to open a new log file, the syslogger assumes there's something
wrong with its parameters (such as log_directory), and stops attempting
automatic time-based or size-based log file rotations. Sending it SIGHUP
is supposed to start that up again. However, the original coding for that
was really bogus, involving clobbering a couple of GUC variables and hoping
that SIGHUP processing would restore them. Get rid of that technique in
favor of maintaining a separate flag showing we've turned rotation off.
Per report from Mark Kirkwood.
Also, the syslogger will automatically attempt to create the log_directory
directory if it doesn't exist, but that was only happening at startup.
For consistency and ease of use, it should do the same whenever the value
of log_directory is changed by SIGHUP.
Back-patch to all supported branches.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
setrefs.c failed to do "rtoffset" adjustment of Vars in RETURNING lists,
which meant they were left with the wrong varnos when the RETURNING list
was in a subquery. That was never possible before writable CTEs, of
course, but now it's broken. The executor fails to notice any problem
because ExecEvalVar just references the ecxt_scantuple for any normal
varno; but EXPLAIN breaks when the varno is wrong, as illustrated in a
recent complaint from Bartosz Dmytrak.
Since the eventual rtoffset of the subquery is not known at the time
we are preparing its plan node, the previous scheme of executing
set_returning_clause_references() at that time cannot handle this
adjustment. Fortunately, it turns out that we don't really need to do it
that way, because all the needed information is available during normal
setrefs.c execution; we just have to dig it out of the ModifyTable node.
So, do that, and get rid of the kluge of early setrefs processing of
RETURNING lists. (This is a little bit of a cheat in the case of inherited
UPDATE/DELETE, because we are not passing a "root" struct that corresponds
exactly to what the subplan was built with. But that doesn't matter, and
anyway this is less ugly than early setrefs processing was.)
Back-patch to 9.1, where the problem became possible to hit.
|
|
|
|
| |
Noah Misch
|
|
|
|
|
|
|
|
| |
When using synchronous replication, we waited for the commit record to be
replicated, but if we our transaction didn't write any other WAL records,
that's not required because we don't even flush the WAL locally to disk in
that case. This lead to long waits when committing a transaction that only
modified a temporary table. Bug spotted by Thom Brown.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
cost_index tries to estimate the per-tuple costs of evaluating filter
conditions (a/k/a qpquals) by subtracting the estimated cost of the
indexqual conditions from that of the baserestrictinfo conditions. This is
correct so long as the indexquals list is a subset of the baserestrictinfo
list. However, in the presence of derived indexable conditions it's
completely wrong, leading to bogus or even negative scan cost estimates,
as seen for example in bug #6579 from Istvan Endredy. In practice the
problem isn't severe except in the specific case of a LIKE optimization on
a functional index containing a very expensive function.
A proper fix for this might change cost estimates by more than people would
like for stable branches, so in the back branches let's just clamp the cost
difference to be not less than zero. That will at least prevent completely
insane behavior, while not changing the results normally.
|
|
|
|
|
|
| |
This aligns 9.1's behavior with that of older branches. HEAD is now even
laxer, ignoring missing schemas all the time, but that seems like too big
a change for a released branch. Per complaint from Robert Haas.
|
|
|
|
|
|
|
|
|
| |
estimate_num_groups() gets unhappy with
create table empty();
select * from empty except select * from empty e2;
I can't see any actual use-case for such a query (and the table is illegal
per SQL spec), but it seems like a good idea that it not cause an assert
failure.
|
|
|
|
|
|
| |
This was a thinko in previous commit. Now that stack base pointer is now set
in PostmasterMain and SubPostmasterMain, it doesn't need to be set in
PostgresMain anymore.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We used to only initialize the stack base pointer when starting up a regular
backend, not in other processes. In particular, autovacuum workers can run
arbitrary user code, and without stack-depth checking, infinite recursion
in e.g an index expression will bring down the whole cluster.
The comment about PL/Java using set_stack_base() is not yet true. As the
code stands, PL/java still modifies the stack_base_ptr variable directly.
However, it's been discussed in the PL/Java mailing list that it should be
changed to use the function, because PL/Java is currently oblivious to the
register stack used on Itanium. There's another issues with PL/Java, namely
that the stack base pointer it sets is not really the base of the stack, it
could be something close to the bottom of the stack. That's a separate issue
that might need some further changes to this code, but that's a different
story.
Backpatch to all supported releases.
|
|
|
|
|
|
|
|
|
|
|
| |
XLOG_GIN_UPDATE_META_PAGE and XLOG_GIN_DELETE_LISTPAGE records were printed
with a list link field labeled as "blkno", which was confusing, especially
when the link was empty (InvalidBlockNumber). Print the metapage block
number instead, since that's what's actually being updated. We could
include the link values too as a separate field, but not clear it's worth
the trouble.
Back-patch to 8.4 where the dubious code was added.
|
|
|
|
|
|
|
|
|
| |
The original coding of the syslogger had an arbitrary limit of 20 large
messages concurrently in progress, after which it would just punt and dump
message fragments to the output file separately. Our ambitions are a bit
higher than that now, so allow the data structure to expand as necessary.
Reported and patched by Andrew Dunstan; some editing by Tom
|
|
|
|
|
|
|
|
|
| |
Initialise ckptXidEpoch from starting checkpoint and maintain the correct
value as we roll forwards. This allows GetNextXidAndEpoch() to return the
correct epoch when executed during recovery. Backpatch to 9.0 when the
problem is first observable by a user.
Bug report from Daniel Farina
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The COPY documentation says "COPY FROM matches the input against the null
string before removing backslashes". It is therefore reasonable to presume
that null markers like E'\\0' will work ... and they did, until someone put
the tests in the wrong order during microoptimization-driven rewrites.
Since then, we've been failing if the null marker is something that would
de-escape to an invalidly-encoded string. Since null markers generally
need to be something that can't appear in the data, this represents a
nontrivial loss of functionality; surprising nobody noticed it earlier.
Per report from Jeff Davis. Backpatch to 8.4 where this got broken.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For some reason, in the original coding of the PlaceHolderVar mechanism
I had supposed that PlaceHolderVars couldn't propagate into subqueries.
That is of course entirely possible. When it happens, we need to treat
an outer-level PlaceHolderVar much like an outer Var or Aggref, that is
SS_replace_correlation_vars() needs to replace the PlaceHolderVar with
a Param, and then when building the finished SubPlan we have to provide
the PlaceHolderVar expression as an actual parameter for the SubPlan.
The handling of the contained expression is a bit delicate but it can be
treated exactly like an Aggref's expression.
In addition to the missing logic in subselect.c, prepjointree.c was failing
to search subqueries for PlaceHolderVars that need their relids adjusted
during subquery pullup. It looks like everyplace else that touches
PlaceHolderVars got it right, though.
Per report from Mark Murawski. In 9.1 and HEAD, queries affected by this
oversight would fail with "ERROR: Upper-level PlaceHolderVar found where
not expected". But in 9.0 and 8.4, you'd silently get possibly-wrong
answers, since the value transmitted into the subquery wouldn't go to null
when it should.
|
|
|
|
| |
Per compiler warnings on buildfarm member black_firefly.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was never intended to be allowed, and is blocked for an ordinary
CREATE TABLE, but CREATE TABLE AS slipped through the cracks. This
commit won't do anything to fix existing cases where this has loophole
has been exploited, but it still seems prudent to lock it down going
forward.
Back-branch commit only, as this problem has been refactored away
on the master branch.
Andres Freund
|
|
|
|
|
|
|
|
|
|
|
| |
The problem was that ResetLatch was not being called in the walsender loop
if the connection was terminated, so WaitLatch never sleeps until the
terminated connection is detected. In the master-branch, this was already
fixed as a side-effect of some refactoring of the loop. This commit
backports that refactoring to 9.1. 9.0 does not have this bug, because we
didn't use latches back then.
Fujii Masao
|