| Commit message (Collapse) | Author | Age |
... | |
|
|
|
| |
Fixes bug #6635, reported by Akira Kurosawa.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When inserting the downlinks for a split gist page, we used hold the locks
on the child pages until the insertion into the parent - and recursively its
parent if it had to be split too - were all completed. Change that so that
the locks on child pages are released after the insertion in the immediate
parent is done, before recursing further up the tree.
This reduces the number of lwlocks that are held simultaneously. Holding
many locks is bad for concurrency, and in extreme cases you can even hit
the limit of 100 simultaneously held lwlocks in a backend. If you're really
unlucky, you can hit the limit while in a critical section, which brings
down the whole system.
This fixes bug #6629 reported by Tom Forbes. Backpatch to 9.1. The page
splitting code was rewritten in 9.1, and the old code did not have this
problem.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The original coding failed to reset ImmediateInterruptOK before returning,
which would potentially allow a subsequent query-cancel interrupt to be
accepted at an unsafe point. This is a really nasty bug since it's so hard
to predict the consequences, but they could be unpleasant.
Also, ensure that signal handlers are serviced before this function
returns, even if the semaphore is already set. This should make the
behavior more like Unix.
Back-patch to all supported versions.
|
|
|
|
|
|
| |
Show a proper error message instead of a SQL error.
Josh Kupershmidt
|
|
|
|
|
|
| |
out of memory when looping through large numbers of rows.
Flag the converted tuples to be freed. Complaint and patch
by Joe.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 62c7bd31c8878dd45c9b9b2429ab7a12103f3590 had assorted problems, most
visibly that it broke PREPARE TRANSACTION in the presence of session-level
advisory locks (which should be ignored by PREPARE), as per a recent
complaint from Stephen Rees. More abstractly, the patch made the
LockMethodData.transactional flag not merely useless but outright
dangerous, because in point of fact that flag no longer tells you anything
at all about whether a lock is held transactionally. This fix therefore
removes that flag altogether. We now rely entirely on the convention
already in use in lock.c that transactional lock holds must be owned by
some ResourceOwner, while session holds are never so owned. Setting the
locallock struct's owner link to NULL thus denotes a session hold, and
there is no redundant marker for that.
PREPARE TRANSACTION now works again when there are session-level advisory
locks, and it is also able to transfer transactional advisory locks to the
prepared transaction, but for implementation reasons it throws an error if
we hold both types of lock on a single lockable object. Perhaps it will be
worth improving that someday.
Assorted other minor cleanup and documentation editing, as well.
Back-patch to 9.1, except that in the 9.1 branch I did not remove the
LockMethodData.transactional flag for fear of causing an ABI break for
any external code that might be examining those structs.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Normally whole-row Vars are printed as "tabname.*". However, that does not
work at top level of a targetlist, because per SQL standard the parser will
think that the "*" should result in column-by-column expansion; which is
not at all what a whole-row Var implies. We used to just print the table
name in such cases, which works most of the time; but it fails if the table
name matches a column name available anywhere in the FROM clause. This
could lead for instance to a view being interpreted differently after dump
and reload. Adding parentheses doesn't fix it, but there is a reasonably
simple kluge we can use instead: attach a no-op cast, so that the "*" isn't
syntactically at top level anymore. This makes the printing of such
whole-row Vars a lot more consistent with other Vars, and may indeed fix
more cases than just the reported one; I'm suspicious that cases involving
schema qualification probably didn't work properly before, either.
Per bug report and fix proposal from Abbas Butt, though this patch is quite
different in detail from his.
Back-patch to all supported versions.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If it fails to open a new log file, the syslogger assumes there's something
wrong with its parameters (such as log_directory), and stops attempting
automatic time-based or size-based log file rotations. Sending it SIGHUP
is supposed to start that up again. However, the original coding for that
was really bogus, involving clobbering a couple of GUC variables and hoping
that SIGHUP processing would restore them. Get rid of that technique in
favor of maintaining a separate flag showing we've turned rotation off.
Per report from Mark Kirkwood.
Also, the syslogger will automatically attempt to create the log_directory
directory if it doesn't exist, but that was only happening at startup.
For consistency and ease of use, it should do the same whenever the value
of log_directory is changed by SIGHUP.
Back-patch to all supported branches.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before 9.1, PL/Python functions returning composite types could return
a string and it would be parsed using record_in. The 9.1 changes made
PL/Python only expect dictionaries, tuples, or objects supporting
getattr as output of composite functions, resulting in a regression
and a confusing error message, as the strings were interpreted as
sequences and the code for transforming lists to database tuples was
used. Fix this by treating strings separately as before, before
checking for the other types.
The reason why it's important to support string to database tuple
conversion is that trigger functions on tables with composite columns
get the composite row passed in as a string (from record_out).
Without supporting converting this back using record_in, this makes it
impossible to implement pass-through behavior for these columns, as
PL/Python no longer accepts strings for composite values.
A better solution would be to fix the code that transforms composite
inputs into Python objects to produce dictionaries that would then be
correctly interpreted by the Python->PostgreSQL counterpart code. But
that would be too invasive to backpatch to 9.1, and it is too late in
the 9.2 cycle to attempt it. It should be revisited in the future,
though.
Reported as bug #6559 by Kirill Simonov.
Jan Urbański
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
setrefs.c failed to do "rtoffset" adjustment of Vars in RETURNING lists,
which meant they were left with the wrong varnos when the RETURNING list
was in a subquery. That was never possible before writable CTEs, of
course, but now it's broken. The executor fails to notice any problem
because ExecEvalVar just references the ecxt_scantuple for any normal
varno; but EXPLAIN breaks when the varno is wrong, as illustrated in a
recent complaint from Bartosz Dmytrak.
Since the eventual rtoffset of the subquery is not known at the time
we are preparing its plan node, the previous scheme of executing
set_returning_clause_references() at that time cannot handle this
adjustment. Fortunately, it turns out that we don't really need to do it
that way, because all the needed information is available during normal
setrefs.c execution; we just have to dig it out of the ModifyTable node.
So, do that, and get rid of the kluge of early setrefs processing of
RETURNING lists. (This is a little bit of a cheat in the case of inherited
UPDATE/DELETE, because we are not passing a "root" struct that corresponds
exactly to what the subplan was built with. But that doesn't matter, and
anyway this is less ugly than early setrefs processing was.)
Back-patch to 9.1, where the problem became possible to hit.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Due to rather sloppy thinking (on my part, I'm afraid) about the
appropriate behavior for boundary conditions, pg_next_dst_boundary() gave
undefined, platform-dependent results when the input time is exactly the
last recorded DST transition time for the specified time zone, as a result
of fetching values one past the end of its data arrays.
Change its specification to be that it always finds the next DST boundary
*after* the input time, and adjust code to match that. The sole existing
caller, DetermineTimeZoneOffset, doesn't actually care about this
distinction, since it always uses a probe time earlier than the instant
that it does care about. So it seemed best to me to change the API to make
the result=1 and result=0 cases more consistent, specifically to ensure
that the "before" outputs always describe the state at the given time,
rather than hacking the code to obey the previous API comment exactly.
Per bug #6605 from Sergey Burladyan. Back-patch to all supported versions.
|
| |
|
| |
|
|
|
|
| |
Noah Misch
|
|
|
|
|
|
|
|
|
| |
A number of utility programs were rather careless about paremeters
that can be set via both an option argument and a positional
argument. This leads to results which can violate the Principal
Of Least Astonishment. These changes refuse to use positional
arguments to override settings that have been made via positional
arguments. The changes are backpatched to all live branches.
|
|
|
|
|
|
|
|
| |
When using synchronous replication, we waited for the commit record to be
replicated, but if we our transaction didn't write any other WAL records,
that's not required because we don't even flush the WAL locally to disk in
that case. This lead to long waits when committing a transaction that only
modified a temporary table. Bug spotted by Thom Brown.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
cost_index tries to estimate the per-tuple costs of evaluating filter
conditions (a/k/a qpquals) by subtracting the estimated cost of the
indexqual conditions from that of the baserestrictinfo conditions. This is
correct so long as the indexquals list is a subset of the baserestrictinfo
list. However, in the presence of derived indexable conditions it's
completely wrong, leading to bogus or even negative scan cost estimates,
as seen for example in bug #6579 from Istvan Endredy. In practice the
problem isn't severe except in the specific case of a LIKE optimization on
a functional index containing a very expensive function.
A proper fix for this might change cost estimates by more than people would
like for stable branches, so in the back branches let's just clamp the cost
difference to be not less than zero. That will at least prevent completely
insane behavior, while not changing the results normally.
|
|
|
|
|
|
| |
This aligns 9.1's behavior with that of older branches. HEAD is now even
laxer, ignoring missing schemas all the time, but that seems like too big
a change for a released branch. Per complaint from Robert Haas.
|
|
|
|
|
|
|
|
|
| |
estimate_num_groups() gets unhappy with
create table empty();
select * from empty except select * from empty e2;
I can't see any actual use-case for such a query (and the table is illegal
per SQL spec), but it seems like a good idea that it not cause an assert
failure.
|
|
|
|
|
|
| |
This was a thinko in previous commit. Now that stack base pointer is now set
in PostmasterMain and SubPostmasterMain, it doesn't need to be set in
PostgresMain anymore.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We used to only initialize the stack base pointer when starting up a regular
backend, not in other processes. In particular, autovacuum workers can run
arbitrary user code, and without stack-depth checking, infinite recursion
in e.g an index expression will bring down the whole cluster.
The comment about PL/Java using set_stack_base() is not yet true. As the
code stands, PL/java still modifies the stack_base_ptr variable directly.
However, it's been discussed in the PL/Java mailing list that it should be
changed to use the function, because PL/Java is currently oblivious to the
register stack used on Itanium. There's another issues with PL/Java, namely
that the stack base pointer it sets is not really the base of the stack, it
could be something close to the bottom of the stack. That's a separate issue
that might need some further changes to this code, but that's a different
story.
Backpatch to all supported releases.
|
|
|
|
|
|
|
|
|
|
|
| |
XLOG_GIN_UPDATE_META_PAGE and XLOG_GIN_DELETE_LISTPAGE records were printed
with a list link field labeled as "blkno", which was confusing, especially
when the link was empty (InvalidBlockNumber). Print the metapage block
number instead, since that's what's actually being updated. We could
include the link values too as a separate field, but not clear it's worth
the trouble.
Back-patch to 8.4 where the dubious code was added.
|
|
|
|
|
|
|
|
|
| |
The original coding of the syslogger had an arbitrary limit of 20 large
messages concurrently in progress, after which it would just punt and dump
message fragments to the output file separately. Our ambitions are a bit
higher than that now, so allow the data structure to expand as necessary.
Reported and patched by Andrew Dunstan; some editing by Tom
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Combining the loop workspace with the record of already-processed objects
might have been a cute trick, but it behaves horridly if there are many
dependency loops to repair: the time spent in the first step of findLoop()
grows as O(N^2). Instead use a separate flag array indexed by dump ID,
which we can check in constant time. The length of the workspace array
is now never more than the actual length of a dependency chain, which
should be reasonably short in all cases of practical interest. The code
is noticeably easier to understand this way, too.
Per gripe from Mike Roest. Since this is a longstanding performance bug,
backpatch to all supported versions.
|
|
|
|
|
|
|
|
|
|
|
|
| |
The loop that matched owned sequences to their owning tables required time
proportional to number of owned sequences times number of tables; although
this work was only expended in selective-dump situations, which is probably
why the issue wasn't recognized long since. Refactor slightly so that we
can perform this work after the index array for findTableByOid has been
set up, reducing the time to O(M log N).
Per gripe from Mike Roest. Since this is a longstanding performance bug,
backpatch to all supported versions.
|
|
|
|
|
|
|
|
|
| |
Initialise ckptXidEpoch from starting checkpoint and maintain the correct
value as we roll forwards. This allows GetNextXidAndEpoch() to return the
correct epoch when executed during recovery. Backpatch to 9.0 when the
problem is first observable by a user.
Bug report from Daniel Farina
|
|
|
|
| |
Thomas Ogrisegg and Fujii Masao
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The COPY documentation says "COPY FROM matches the input against the null
string before removing backslashes". It is therefore reasonable to presume
that null markers like E'\\0' will work ... and they did, until someone put
the tests in the wrong order during microoptimization-driven rewrites.
Since then, we've been failing if the null marker is something that would
de-escape to an invalidly-encoded string. Since null markers generally
need to be something that can't appear in the data, this represents a
nontrivial loss of functionality; surprising nobody noticed it earlier.
Per report from Jeff Davis. Backpatch to 8.4 where this got broken.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For some reason, in the original coding of the PlaceHolderVar mechanism
I had supposed that PlaceHolderVars couldn't propagate into subqueries.
That is of course entirely possible. When it happens, we need to treat
an outer-level PlaceHolderVar much like an outer Var or Aggref, that is
SS_replace_correlation_vars() needs to replace the PlaceHolderVar with
a Param, and then when building the finished SubPlan we have to provide
the PlaceHolderVar expression as an actual parameter for the SubPlan.
The handling of the contained expression is a bit delicate but it can be
treated exactly like an Aggref's expression.
In addition to the missing logic in subselect.c, prepjointree.c was failing
to search subqueries for PlaceHolderVars that need their relids adjusted
during subquery pullup. It looks like everyplace else that touches
PlaceHolderVars got it right, though.
Per report from Mark Murawski. In 9.1 and HEAD, queries affected by this
oversight would fail with "ERROR: Upper-level PlaceHolderVar found where
not expected". But in 9.0 and 8.4, you'd silently get possibly-wrong
answers, since the value transmitted into the subquery wouldn't go to null
when it should.
|
|
|
|
| |
Per compiler warnings on buildfarm member black_firefly.
|
|
|
|
|
|
|
|
|
|
|
| |
An incorrect and entirely unnecessary "safety check" in exec_stmt_getdiag()
caused the code to treat an assignment to a variable with dno zero as a
no-op. Unfortunately, that's a perfectly valid dno. This has been broken
since GET DIAGNOSTICS was invented. It's not terribly surprising that the
bug went unnoticed for so long, since in most cases you probably wouldn't
use the function's first-created variable (normally its first parameter)
as a GET DIAGNOSTICS target. Nonetheless, it's broken. Per bug #6551
from Adam Buraczewski.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was never intended to be allowed, and is blocked for an ordinary
CREATE TABLE, but CREATE TABLE AS slipped through the cracks. This
commit won't do anything to fix existing cases where this has loophole
has been exploited, but it still seems prudent to lock it down going
forward.
Back-branch commit only, as this problem has been refactored away
on the master branch.
Andres Freund
|
|
|
|
|
|
|
|
|
|
|
| |
The problem was that ResetLatch was not being called in the walsender loop
if the connection was terminated, so WaitLatch never sleeps until the
terminated connection is detected. In the master-branch, this was already
fixed as a side-effect of some refactoring of the loop. This commit
backports that refactoring to 9.1. 9.0 does not have this bug, because we
didn't use latches back then.
Fujii Masao
|
|
|
|
|
|
|
|
|
|
| |
Failing to do so causes trigger invocation to fail when they are nested
within a function invocation that changes the current package.
Backpatch to 9.1; previous releases used a different method to obtain
_TD. Per bug report from Mark Murawski (bug #6511)
Author: Alex Hunsaker
|
|
|
|
|
|
|
|
|
| |
When converting source files, pg_regress' inputdir and outputdir options were
ignored when computing the locations of the destination files. In consequence,
these options were effectively unusable when the regression inputs need to
be adjusted by pg_regress. This patch makes pg_regress put the converted files
in the same place that these options specify non-converted input or results
files are to be found. Backpatched to all live branches.
|
|
|
|
| |
found by Coverity
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In commit 57664ed25e5dea117158a2e663c29e60b3546e1c I tried to fix a bug
reported by Teodor Sigaev by making non-simple-Var output columns distinct
(by wrapping their expressions with dummy PlaceHolderVar nodes). This did
not work too well. Commit b28ffd0fcc583c1811e5295279e7d4366c3cae6c fixed
some ensuing problems with matching to child indexes, but per a recent
report from Claus Stadler, constraint exclusion of UNION ALL subqueries was
still broken, because constant-simplification didn't handle the injected
PlaceHolderVars well either. On reflection, the original patch was quite
misguided: there is no reason to expect that EquivalenceClass child members
will be distinct. So instead of trying to make them so, we should ensure
that we can cope with the situation when they're not.
Accordingly, this patch reverts the code changes in the above-mentioned
commits (though the regression test cases they added stay). Instead, I've
added assorted defenses to make sure that duplicate EC child members don't
cause any problems. Teodor's original problem ("MergeAppend child's
targetlist doesn't match MergeAppend") is addressed more directly by
revising prepare_sort_from_pathkeys to let the parent MergeAppend's sort
list guide creation of each child's sort list.
In passing, get rid of add_sort_column; as far as I can tell, testing for
duplicate sort keys at this stage is dead code. Certainly it doesn't
trigger often enough to be worth expending cycles on in ordinary queries.
And keeping the test would've greatly complicated the new logic in
prepare_sort_from_pathkeys, because comparing pathkey list entries against
a previous output array requires that we not skip any entries in the list.
Back-patch to 9.1, like the previous patches. The only known issue in
this area that wasn't caused by the ill-advised previous patches was the
MergeAppend planning failure, which of course is not relevant before 9.1.
It's possible that we need some of the new defenses against duplicate child
EC entries in older branches, but until there's some clear evidence of that
I'm going to refrain from back-patching further.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Dave Malcolm of Red Hat is working on a static code analysis tool for
Python-related C code. It reported a number of problems in plpython,
most of which were failures to check for NULL results from object-creation
functions, so would only be an issue in very-low-memory situations.
Patch in HEAD and 9.1. We could go further back but it's not clear that
these issues are important enough to justify the work.
Jan Urbański
|
|
|
|
|
|
|
| |
In a rare case, one byte past the end of memory belonging to the
sqlca_t structure would be written to.
found by Coverity
|
|
|
|
| |
found by Coverity
|
|
|
|
|
|
|
|
| |
Due to an apparent thinko, when printing a table in expanded mode
(\x), space would be allocated for 1 slot plus 1 byte per line,
instead of 1 slot per line plus 1 slot for the NULL terminator. When
the line count is small, reading or writing the terminator would
therefore access memory beyond what was allocated.
|
|
|
|
|
|
| |
If a client encoding is specified as a connection parameter (or
environment variable), internal storage allocated for it would never
be freed.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Phil Sorber reported that a rewriting ALTER TABLE within an extension
update script failed, because it creates and then drops a placeholder
table; the drop was being disallowed because the table was marked as an
extension member. We could hack that specific case but it seems likely
that there might be related cases now or in the future, so the most
practical solution seems to be to create an exception to the general rule
that extension member objects can only be dropped by dropping the owning
extension. To wit: if the DROP is issued within the extension's own
creation or update scripts, we'll allow it, implicitly performing an
"ALTER EXTENSION DROP object" first. This will simplify cases such as
extension downgrade scripts anyway.
No docs change since we don't seem to have documented the idea that you
would need ALTER EXTENSION DROP for such an action to begin with.
Also, arrange for explicitly temporary tables to not get linked as
extension members in the first place, and the same for the magic
pg_temp_nnn schemas that are created to hold them. This prevents assorted
unpleasant results if an extension script creates a temp table: the forced
drop at session end would either fail or remove the entire extension, and
neither of those outcomes is desirable. Note that this doesn't fix the
ALTER TABLE scenario, since the placeholder table is not temp (unless the
table being rewritten is).
Back-patch to 9.1.
|
|
|
|
|
|
|
| |
Commit 3e9a2672d25aed15ae6b4a09decbd8927d069868 fixed this for master,
but REL9_1_STABLE also needs fixing; this is a back-branch commit only.
Tomas Vondra
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes an oversight in commit 11cad29c91524aac1d0b61e0ea0357398ab79bf8,
which introduced MergeAppend plans. Before that happened, we never
particularly cared about the sort ordering of scans of inheritance child
relations, since appending their outputs together would destroy any
ordering anyway. But now it's important to be able to match child relation
sort orderings to those of the surrounding query. The original coding of
add_child_rel_equivalences skipped ec_has_const EquivalenceClasses, on the
originally-correct grounds that adding child expressions to them was
useless. The effect of this is that when a parent variable is equated to
a constant, we can't recognize that index columns on the equivalent child
variables are not sort-significant; that is, we can't recognize that a
child index on, say, (x, y) is able to generate output in "ORDER BY y"
order when there is a clause "WHERE x = constant". Adding child
expressions to the (x, constant) EquivalenceClass fixes this, without any
downside that I can see other than a few more planner cycles expended on
such queries.
Per recent gripe from Robert McGehee. Back-patch to 9.1 where MergeAppend
was introduced.
|
|
|
|
|
|
|
|
|
|
|
|
| |
A prepared transaction can get new conflicts in and out after preparing, so
we cannot rely on the in- and out-flags stored in the statefile at prepare-
time. As a quick fix, make the conservative assumption that after a restart,
all prepared transactions are considered to have both in- and out-conflicts.
That can lead to unnecessary rollbacks after a crash, but that shouldn't be
a big problem in practice; you don't want prepared transactions to hang
around for a long time anyway.
Dan Ports
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In commit 4016bdef8aded77b4903c457050622a5a1815c16 I fixed a bunch of
ginxlog.c bugs having to do with not handling XLogReadBuffer failures
correctly. However, in ginRedoUpdateMetapage and ginRedoDeleteListPages,
I unaccountably thought that failure to read the metapage would be
impossible and just put in an elog(PANIC) call. This is of course wrong:
failure is exactly what will happen if the index got dropped (or rebuilt)
between creation of the WAL record and the crash we're trying to recover
from. I believe this explains Nicholas Wilson's recent report of these
errors getting reached.
Also, fix memory leak in forgetIncompleteSplit. This wasn't of much
concern when the code was written, but in a long-running standby server
page split records could be expected to accumulate indefinitely.
Back-patch to 8.4 --- before that, GIN didn't have a metapage.
|