| Commit message (Collapse) | Author | Age |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
| |
As noted some time ago, the original coding had a typo ("|" for "^")
that made the result less unique than intended. Even the intended
behavior is obsolete since it was based on wanting to produce a
usable value even if we didn't have int64 arithmetic --- a limitation
we stopped supporting years ago. Instead, let's redefine the system
identifier as tv_sec in the upper 32 bits (same as before), tv_usec
in the next 20 bits, and the low 12 bits of getpid() in the remaining
bits. This is still hardly guaranteed-universally-unique, but it's
noticeably better than before. Per my proposal at
<29019.1374535940@sss.pgh.pa.us>
|
|
|
|
|
|
|
|
|
|
| |
We should use exprTypmod() to extract the typmod of the expression,
instead of just blindly storing -1. This seems to have been an aboriginal
oversight in commit fc8d970cbcdd6f025475822a4cf01dfda0873226 which
introduced general-expression indexes. The consequences are only cosmetic
at present, since the index machinery doesn't really look at typmod for
index columns; but still it seems best to describe the column type as
precisely as we can. Per off-list complaint from Thomas Fanghaenel.
|
|
|
|
|
|
|
| |
Original coding failed to enlarge the array as required if
the requested tranche_id was equal to LWLockTranchesAllocated.
In passing, fix poor style of not casting the result of (re)palloc.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a tuple is locked, and this lock is later upgraded either to an
update or to a stronger lock, and in the meantime some other process
tries to lock, update or delete the same tuple, it (the tuple) could end
up being updated twice, or having conflicting locks held.
The reason for this is that the second updater checks for a change in
Xmax value, or in the HEAP_XMAX_IS_MULTI infomask bit, after noticing
the first lock; and if there's a change, it restarts and re-evaluates
its ability to update the tuple. But it neglected to check for changes
in lock strength or in lock-vs-update status when those two properties
stayed the same. This would lead it to take the wrong decision and
continue with its own update, when in reality it shouldn't do so but
instead restart from the top.
This could lead to either an assertion failure much later (when a
multixact containing multiple updates is detected), or duplicate copies
of tuples.
To fix, make sure to compare the other relevant infomask bits alongside
the Xmax value and HEAP_XMAX_IS_MULTI bit, and restart from the top if
necessary.
Also, in the belt-and-suspenders spirit, add a check to
MultiXactCreateFromMembers that a multixact being created does not have
two or more members that are claimed to be updates. This should protect
against other bugs that might cause similar bogus situations.
Backpatch to 9.3, where the possibility of multixacts containing updates
was introduced. (In prior versions it was possible to have the tuple
lock upgraded from shared to exclusive, and an update would not restart
from the top; yet we're protected against a bug there because there's
always a sleep to wait for the locking transaction to complete before
continuing to do anything. Really, the fact that tuple locks always
conflicted with concurrent updates is what protected against bugs here.)
Per report from Andrew Dunstan and Josh Berkus in thread at
http://www.postgresql.org/message-id/534C8B33.9050807@pgexperts.com
Bug analysis by Andres Freund.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Once we've completed a PREPARE, our session is not running a transaction,
so its entry in pg_stat_activity should show xact_start as null, rather
than leaving the value as the start time of the now-prepared transaction.
I think possibly this oversight was triggered by faulty extrapolation
from the adjacent comment that says PrepareTransaction should not call
AtEOXact_PgStat, so tweak the wording of that comment.
Noted by Andres Freund while considering bug #10123 from Maxim Boguk,
although this error doesn't seem to explain that report.
Back-patch to all active branches.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before 9.4, such an aggregate couldn't be declared, because its final
function would have to have polymorphic result type but no polymorphic
argument, which CREATE FUNCTION would quite properly reject. The
ordered-set-aggregate patch found a workaround: allow the final function
to be declared as accepting additional dummy arguments that have types
matching the aggregate's regular input arguments. However, we failed
to notice that this problem applies just as much to regular aggregates,
despite the fact that we had a built-in regular aggregate array_agg()
that was known to be undeclarable in SQL because its final function
had an illegal signature. So what we should have done, and what this
patch does, is to decouple the extra-dummy-arguments behavior from
ordered-set aggregates and make it generally available for all aggregate
declarations. We have to put this into 9.4 rather than waiting till
later because it slightly alters the rules for declaring ordered-set
aggregates.
The patch turned out a bit bigger than I'd hoped because it proved
necessary to record the extra-arguments option in a new pg_aggregate
column. I'd thought we could just look at the final function's pronargs
at runtime, but that didn't work well for variadic final functions.
It's probably just as well though, because it simplifies life for pg_dump
to record the option explicitly.
While at it, fix array_agg() to have a valid final-function signature,
and add an opr_sanity test to notice future deviations from polymorphic
consistency. I also marked the percentile_cont() aggregates as not
needing extra arguments, since they don't.
|
|
|
|
| |
We no longer have a TLI field in the page header.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
When marking a branch as half-dead, a pointer to the top of the branch is
stored in the leaf block's hi-key. During normal operation, the high key
was left in place, and the block number was just stored in the ctid field
of the high key tuple, but in WAL replay, the high key was recreated as a
truncated tuple with zero columns. For the sake of easier debugging, also
truncate the tuple in normal operation, so that the page is identical
after WAL replay. Also, rename the 'downlink' field in the WAL record to
'topparent', as that seems like a more descriptive name. And make sure
it's set to invalid when unlinking the leaf page.
|
|
|
|
|
|
|
| |
Some ancient comments claimed that fn_nargs could be -1 to indicate a
variable number of input arguments; but this was never implemented, and
is at variance with what we ultimately did with "variadic" functions.
Update the comments.
|
|
|
|
|
| |
It's blatantly obvious that commit 4d0d607a454ee832574afd52a3c515099cc85eb3
wasn't tested. The leak's real enough, though.
|
|
|
|
| |
Revert due to contrib/test_decoding regression failure
|
|
|
|
| |
Patch by Ants Aasma
|
|
|
|
|
|
| |
Forgot to update LSN of left sibling's page, when creating a new root.
I fixed this for regular insertions and page splits earlier, but missed
new root creation.
|
|
|
|
|
|
|
| |
The README incorrectly claimed that GIN posting tree pages contain an array
of uncompressed items in addition to compressed posting lists. Earlier
versions of the GIN posting list compression patch worked that way, but not
the one that was committed.
|
|
|
|
|
| |
When modifying a page, must hold an exclusive lock. A shared lock is
obviously not good enough.
|
|
|
|
|
| |
It makes no difference to the system, but minimizing the differences
between a master and standby makes debugging simpler.
|
|
|
|
| |
A couple of typos from my refactoring of the page deletion patch.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Don't use simple_heap_insert to insert the tuple to a sequence relation.
simple_heap_insert creates a heap insertion WAL record, and replaying that
will create a regular heap page without the special area containing the
sequence magic constant, which is wrong for a sequence. That was not a bug
because we always created a sequence WAL record after that, and replaying
that overwrote the bogus heap page, and the transient state could never be
seen by another backend because it was only done when creating a new
sequence relation. But it's simpler and cleaner to avoid that in the first
place.
|
|
|
|
| |
Etsuro Fujita
|
|
|
|
| |
Amit Langote
|
|
|
|
|
|
|
| |
Permissions might prevent the existence of the trigger file from being
checked.
Per report from Andres Freund
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If we set the all-visible flag after writing WAL record, and XLogInsert
takes a full-page image of the page, the image would not include the flag.
We will then proceed to set the VM bit, which would then be set without the
corresponding all-visible flag on the heap page.
Found by comparing page images on master and standby, after writing/replaying
each WAL record. (There is still a discrepancy: the all-visible flag won't
be set after replaying the HEAP_CLEAN record, even though it is set in the
master. However, it will be set when replaying the HEAP2_VISIBLE record and
setting the VM bit, so the all-visible flag and VM bit are always consistent
on the standby, even though they are momentarily out-of-sync with master)
Backpatch to 9.3 where this code was introduced.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that EXPLAIN also outputs a "planning time" measurement, the use of
"total" here seems rather confusing: it sounds like it might include the
planning time which of course it doesn't. Majority opinion was that
"execution time" is a better label, so we'll call it that.
This should be noted as a backwards incompatibility for tools that examine
EXPLAIN ANALYZE output.
In passing, I failed to resist the temptation to do a little editing on the
materialized-view example affected by this change.
|
|
|
|
|
|
|
| |
We were neglecting to schema-qualify them.
Backpatch to 9.3, where object identities were introduced as a concept
by commit f8348ea32ec8.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
According to the Single Unix Spec and assorted man pages, you're supposed
to use the constants named AF_xxx when setting ai_family for a getaddrinfo
call. In a few places we were using PF_xxx instead. Use of PF_xxx
appears to be an ancient BSD convention that was not adopted by later
standardization. On BSD and most later Unixen, it doesn't matter much
because those constants have equivalent values anyway; but nonetheless
this code is not per spec.
In the same vein, replace PF_INET by AF_INET in one socket() call, which
wasn't even consistent with the other socket() call in the same function
let alone the remainder of our code.
Per investigation of a Cygwin trouble report from Marco Atzeri. It's
probably a long shot that this will fix his issue, but it's wrong in
any case.
|
|
|
|
|
|
|
|
|
| |
These are natural complements to the functions added by commit
0886fc6a5c75b294544263ea979b9cf6195407d9, but they weren't included
in the original patch for some reason. Add them.
Patch by me, per a complaint by Tom Lane. Review by Tatsuo
Ishii.
|
|
|
|
|
|
|
|
| |
Apparently, Windows can sometimes return an error code even when the
operation actually worked just fine. Rearrange the order of checks
according to what appear to be the best practices in this area.
Amit Kapila
|
|
|
|
|
|
|
|
| |
Previously, in some places, socket creation errors were checked for
negative values, which is not true for Windows because sockets are
unsigned. This masked socket creation errors on Windows.
Backpatch through 9.0. 8.4 doesn't have the infrastructure to fix this.
|
|
|
|
|
|
| |
I mixed up BLCKSZ and XLOG_BLCKSZ when I changed the way the buffer is
allocated a couple of weeks ago. With the default settings, they are both
8k, but they can be changed at compile-time.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This allows squeezing out the unused space in full-page writes. And more
importantly, it can be a useful debugging aid.
In hindsight we should've done this back when GIN was added - we wouldn't
need the 'maxoff' field in the page opaque struct if we had used pd_lower
and pd_upper like on normal pages. But as long as there can be pages in the
index that have been binary-upgraded from pre-9.4 versions, we can't rely
on that, and have to continue using 'maxoff'.
Most of the code churn comes from renaming some macros, now that they're
used on internal pages, too.
This change is completely backwards-compatible, no effect on pg_upgrade.
|
|
|
|
|
|
|
|
|
|
|
| |
Make sure we throw an error instead of silently doing the wrong thing when
fed a strategy number we don't recognize. Also, in the places that did
already throw an error, spell the error message in a way more consistent
with our message style guidelines.
Per report from Paul Jones. Although this is a bug, it won't occur unless
a superuser tries to do something he shouldn't, so it doesn't seem worth
back-patching.
|
|
|
|
|
| |
In some places, the function assumes the left page is valid, and in others,
it checks if it is valid. Remove all the checks.
|
|
|
|
|
|
| |
The entry B-tree pages all follow the standard page layout. The 9.3 code has
this right. I inadvertently changed this at some point during the big
refactorings in git master.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Repositioning the tuplestore seek pointer in window_gettupleslot() turns
out to be a very significant expense when the window frame is sizable and
the frame end can move. To fix, introduce a tuplestore function for
skipping an arbitrary number of tuples in one call, parallel to the one we
introduced for tuplesort objects in commit 8d65da1f. This reduces the cost
of window_gettupleslot() to O(1) if the tuplestore has not spilled to disk.
As in the previous commit, I didn't try to do any real optimization of
tuplestore_skiptuples for the case where the tuplestore has spilled to
disk. There is probably no practical way to get the cost to less than O(N)
anyway, but perhaps someone can think of something later.
Also fix PersistHoldablePortal() to make use of this API now that we have
it.
Based on a suggestion by Dean Rasheed, though this turns out not to look
much like his patch.
|
|
|
|
|
|
|
|
|
|
| |
Given that ALTER TABLESPACE has moved on from just existing for
general purpose rename/owner changes, it deserves its own top-level
production in the grammar. This also cleans up the RenameStmt to
only ever be used for actual RENAMEs again- it really wasn't
appropriate to hide non-RENAME productions under there.
Noted by Alvaro.
|
|
|
|
| |
David Rowley and Florian Pflug, reviewed by Dean Rasheed
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Views which are marked as security_barrier must have their quals
applied before any user-defined quals are called, to prevent
user-defined functions from being able to see rows which the
security barrier view is intended to prevent them from seeing.
Remove the restriction on security barrier views being automatically
updatable by adding a new securityQuals list to the RTE structure
which keeps track of the quals from security barrier views at each
level, independently of the user-supplied quals. When RTEs are
later discovered which have securityQuals populated, they are turned
into subquery RTEs which are marked as security_barrier to prevent
any user-supplied quals being pushed down (modulo LEAKPROOF quals).
Dean Rasheed, reviewed by Craig Ringer, Simon Riggs, KaiGai Kohei
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
First installment of the promised moving-aggregate support in built-in
aggregates: count(), sum(), avg(), stddev() and variance() for
assorted datatypes, though not for float4/float8.
In passing, remove a 2001-vintage kluge in interval_accum(): interval
array elements have been properly aligned since around 2003, but
nobody remembered to take out this workaround. Also, fix a thinko
in the opr_sanity tests for moving-aggregate catalog entries.
David Rowley and Florian Pflug, reviewed by Dean Rasheed
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Until now, when executing an aggregate function as a window function
within a window with moving frame start (that is, any frame start mode
except UNBOUNDED PRECEDING), we had to recalculate the aggregate from
scratch each time the frame head moved. This patch allows an aggregate
definition to include an alternate "moving aggregate" implementation
that includes an inverse transition function for removing rows from
the aggregate's running state. As long as this can be done successfully,
runtime is proportional to the total number of input rows, rather than
to the number of input rows times the average frame length.
This commit includes the core infrastructure, documentation, and regression
tests using user-defined aggregates. Follow-on commits will update some
of the built-in aggregates to use this feature.
David Rowley and Florian Pflug, reviewed by Dean Rasheed; additional
hacking by me
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There were a couple of bugs here. First, if the fuzzy limit was exceeded,
the loop in entryGetItem might drop out too soon if a whole block needs to
be skipped because it's < advancePast ("continue" in a while-loop checks the
loop condition too). Secondly, the loop checked when stepping to a new page
that there is at least one offset on the page < advancePast, but we cannot
rely on that on subsequent calls of entryGetItem, because advancePast might
change in between. That caused the skipping loop to read bogus items in the
TbmIterateResult's offset array.
First item and fix by Alexander Korotkov, second bug pointed out by FabrÃzio
de Royes Mello, by a small variation of Alexander's test query.
|
|
|
|
|
|
| |
And explain why.
Per report from Pavel Stehule
|
|
|
|
| |
Tomonari Katsumata
|
| |
|
| |
|
|
|
|
|
|
| |
This is more cleanup from commit 11a65eed1637a05b03e174700799b024e104bfb4.
Amit Kapila
|
|
|
|
|
| |
I'm not sure if this is what's causing the Windows buildfarm members
to get unhappy, but I don't think it can be helping anything...
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This operator class can accelerate subnet/supernet tests as well as
btree-equivalent ordered comparisons. It also handles a new network
operator inet && inet (overlaps, a/k/a "is supernet or subnet of"),
which is expected to be useful in exclusion constraints.
Ideally this opclass would be the default for GiST with inet/cidr data,
but we can't mark it that way until we figure out how to do a more or
less graceful transition from the current situation, in which the
really-completely-bogus inet/cidr opclasses in contrib/btree_gist are
marked as default. Having the opclass in core and not default is better
than not having it at all, though.
While at it, add new documentation sections to allow us to officially
document GiST/GIN/SP-GiST opclasses, something there was never a clear
place to do before. I filled these in with some simple tables listing
the existing opclasses and the operators they support, but there's
certainly scope to put more information there.
Emre Hasegeli, reviewed by Andreas Karlsson, further hacking by me
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of storing the ID of the dynamic shared memory control
segment in a file within the data directory, store it in the main
control segment. This avoids a number of nasty corner cases,
most seriously that doing an online backup and then using it on
the same machine (e.g. to fire up a standby) would result in the
standby clobbering all of the master's dynamic shared memory
segments.
Per complaints from Heikki Linnakangas, Fujii Masao, and Tom
Lane.
|
|
|
|
|
|
|
|
|
| |
These functions won't throw an error if the object doesn't exist,
or if (for functions and operators) there's more than one matching
object.
Yugo Nagata and Nozomi Anzai, reviewed by Amit Khandekar, Marti
Raudsepp, Amit Kapila, and me.
|