| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
|
|
|
| |
On a big NUMA machine with 1000 connections in saturation load
there was a performance regression due to spinlock contention, for
acquiring values which were never used. Just fill with dummy
values if we're not going to use them.
This patch has not been benchmarked yet on a big NUMA machine, but
it seems like a good idea on general principle, and it seemed to
prevent an apparent 2.2% regression on a single-socket i7 box
running 200 connections at saturation load.
|
|
|
|
|
|
|
|
|
|
| |
Rename this function to GenericXLogRegisterBuffer() to make it clearer
what it does, and leave room for other sorts of "register" actions in
future. Also, replace its "bool isNew" argument with an integer flags
argument, so as to allow adding more flags in future without an API
break.
Alexander Korotkov, adjusted slightly by me
|
|
|
|
|
|
|
|
|
|
|
| |
The previous coding could allow the contents of the "hole" between pd_lower
and pd_upper to diverge during replay from what it had been when the update
was originally applied. This would pose a problem if checksums were in
use, and in any case would complicate forensic comparisons between master
and slave servers. So force the "hole" to contain zeroes, both at initial
application of a generically-logged action, and at replay.
Alexander Korotkov, adjusted slightly by me
|
|
|
|
|
|
| |
It's 2016 these days (no, not entirely sure how we got here either).
Pointed out by Amit Langote
|
| |
|
|
|
|
|
|
|
|
|
| |
When IF NOT EXISTS was added to CREATE TABLE AS, this logic didn't get
the memo, possibly resulting in an Assert failure. It looks like there
would have been no ill effects in a non-Assert build, though. Back-patch
to 9.5 where the IF NOT EXISTS option was added.
Stas Kelvich
|
|
|
|
|
|
|
|
|
|
|
|
| |
I was initially concerned that the some of the hundreds of
references to BufferGetPage() where the literal
BGP_NO_SNAPSHOT_TEST were passed might not optimize as well as a
macro, leading to some hard-to-find performance regressions in
corner cases. Inspection of disassembled code has shown identical
code at all inspected locations, and the size difference doesn't
amount to even one byte per such call. So make it readable.
Per gripes from Álvaro Herrera and Tom Lane
|
|
|
|
|
|
|
|
|
| |
It was incorrectly declared as a volatile pointer to a non-volatile
structure. Eliminate the OldSnapshotControl struct definition; it
is really not needed. Pointed out by Tom Lane.
While at it, add OldSnapshotControlData to pgindent's list of
structures.
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
The existing code would either Assert or generate an invalid
SyncRepConfig variable, neither of which is desirable. A regular
error should be thrown instead.
This commit silences compiler warning in non assertion-enabled builds.
Per report from Jeff Janes.
Suggested fix by Tom Lane.
|
|
|
|
|
|
|
| |
Coverity complained about an apparent missing "break" in a switch
added by bb140506df605fab. The human-readable comments are pretty
clear that this is intentional, but add a standard /* FALL THRU */
comment to make it clear to tools too.
|
|
|
|
|
|
|
| |
Coverity complained that the code added by 015e88942aa50f0d lacked an
error check for SearchSysCache1 failures, which it should have. But
the code was pretty duff in other ways too, including failure to think
about whether it could really cope with arrays of different lengths.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously we used a spinlock, in adition to the atomically manipulated
->state field, to protect the wait queue. But it's pretty simple to
instead perform the locking using a flag in state.
Due to 6150a1b0 BufferDescs, on platforms (like PPC) with > 1 byte
spinlocks, increased their size above 64byte. As 64 bytes are the size
we pad allocated BufferDescs to, this can increase false sharing;
causing performance problems in turn. Together with the previous commit
this reduces the size to <= 64 bytes on all common platforms.
Author: Andres Freund
Discussion: CAA4eK1+ZeB8PMwwktf+3bRS0Pt4Ux6Rs6Aom0uip8c6shJWmyg@mail.gmail.com
20160327121858.zrmrjegmji2ymnvr@alap3.anarazel.de
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Pinning/Unpinning a buffer is a very frequent operation; especially in
read-mostly cache resident workloads. Benchmarking shows that in various
scenarios the spinlock protecting a buffer header's state becomes a
significant bottleneck. The problem can be reproduced with pgbench -S on
larger machines, but can be considerably worse for queries which touch
the same buffers over and over at a high frequency (e.g. nested loops
over a small inner table).
To allow atomic operations to be used, cram BufferDesc's flags,
usage_count, buf_hdr_lock, refcount into a single 32bit atomic variable;
that allows to manipulate them together using 32bit compare-and-swap
operations. This requires reducing MAX_BACKENDS to 2^18-1 (which could
be lifted by using a 64bit field, but it's not a realistic configuration
atm).
As not all operations can easily implemented in a lockfree manner,
implement the previous buf_hdr_lock via a flag bit in the atomic
variable. That way we can continue to lock the header in places where
it's needed, but can get away without acquiring it in the more frequent
hot-paths. There's some additional operations which can be done without
the lock, but aren't in this patch; but the most important places are
covered.
As bufmgr.c now essentially re-implements spinlocks, abstract the delay
logic from s_lock.c into something more generic. It now has already two
users, and more are coming up; there's a follupw patch for lwlock.c at
least.
This patch is based on a proof-of-concept written by me, which Alexander
Korotkov made into a fully working patch; the committed version is again
revised by me. Benchmarking and testing has, amongst others, been
provided by Dilip Kumar, Alexander Korotkov, Robert Haas.
On a large x86 system improvements for readonly pgbench, with a high
client count, of a factor of 8 have been observed.
Author: Alexander Korotkov and Andres Freund
Discussion: 2400449.GjM57CE0Yg@dinodell
|
|
|
|
|
|
|
|
|
|
| |
I used the wrong variable here. Doesn't make a difference today because
the only plausible caller passes a non-NULL variable, but someday it
will be wrong, and even today's correctness is subtle: the caller that
does pass a NULL is never invoked because of object type constraints.
Surely not a condition to rely on.
Noted by Coverity
|
|
|
|
|
|
|
|
|
|
|
| |
Since we're requiring pages handled by generic_xlog.c to be standard
format, specify REGBUF_STANDARD when doing a full-page image, so that
xloginsert.c can compress out the "hole" between pd_lower and pd_upper.
Given the current API in which this path will be taken only for a newly
initialized page, the hole is likely to be particularly large in such
cases, so that this oversight could easily be performance-significant.
I don't notice any particular change in the runtime of contrib/bloom's
regression test, though.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Make the inner comparison loops of computeDelta() as tight as possible by
pulling considerations of valid and invalid ranges out of the inner loops,
and extending a match or non-match detection as far as possible before
deciding what to do next. To keep this tractable, give up the possibility
of merging fragments across the pd_lower to pd_upper gap. The fraction of
pages where that could happen (ie, there are 4 or fewer bytes in the gap,
*and* data changes immediately adjacent to it on both sides) is too small
to be worth spending cycles on.
Also, avoid two BLCKSZ-length memcpy()s by computing the delta before
moving data into the target buffer, instead of after. This doesn't save
nearly as many cycles as being tenser about computeDelta(), but it still
seems worth doing.
On my machine, this patch cuts a full 40% off the runtime of
contrib/bloom's regression test.
|
|
|
|
|
|
|
|
|
|
|
| |
This routine is unsafe as implemented, because it invalidates the page
image pointers returned by previous GenericXLogRegister() calls.
Rather than complicate the API or the implementation to avoid that,
let's just get rid of it; the use-case for having it seems much
too thin to justify a lot of work here.
While at it, do some wordsmithing on the SGML docs for generic WAL.
|
|
|
|
|
|
|
|
|
| |
Improve commentary, use more specific names for the delta fields,
const-ify pointer arguments where possible, avoid assuming that
initializing only the first element of a local array will guarantee
that the remaining elements end up as we need them. (I think that
code in generic_redo actually worked, but only because InvalidBuffer
is zero; this is a particularly ugly way of depending on that ...)
|
|
|
|
|
| |
This code desperately needs some micro-optimization, and I'd like it
to be formatted a bit more nicely while I work on it.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously bcac23d exposed a subset of support functions, namely the
ones Kaigai found useful. In
20160304193704.elq773pyg5fyl3mi@alap3.anarazel.de I mentioned that
there's some functions missing to use the facility in an external
project.
To avoid having to add functions piecemeal, add all the functions which
are used to define READ_* and WRITE_* macros; users of the extensible
node functionality are likely to need these. Additionally expose
outDatum(), which doesn't have it's own WRITE_ macro, as it needs
information from the embedding struct.
Discussion: 20160304193704.elq773pyg5fyl3mi@alap3.anarazel.de
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This creates an initial set of default roles which administrators may
use to grant access to, historically, superuser-only functions. Using
these roles instead of granting superuser access reduces the number of
superuser roles required for a system. Documention for each of the
default roles has been added to user-manag.sgml.
Bump catversion to 201604082, as we had a commit that bumped it to
201604081 and another that set it back to 201604071...
Reviews by José Luis Tallón and Robert Haas
|
|
|
|
|
|
|
|
|
| |
This will prevent users from creating roles which begin with "pg_" and
will check for those roles before allowing an upgrade using pg_upgrade.
This will allow for default roles to be provided at initdb time.
Reviews by José Luis Tallón and Robert Haas
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This feature is controlled by a new old_snapshot_threshold GUC. A
value of -1 disables the feature, and that is the default. The
value of 0 is just intended for testing. Above that it is the
number of minutes a snapshot can reach before pruning and vacuum
are allowed to remove dead tuples which the snapshot would
otherwise protect. The xmin associated with a transaction ID does
still protect dead tuples. A connection which is using an "old"
snapshot does not get an error unless it accesses a page modified
recently enough that it might not be able to produce accurate
results.
This is similar to the Oracle feature, and we use the same SQLSTATE
and error message for compatibility.
|
|
|
|
|
|
|
|
|
|
|
| |
This patch is a no-op patch which is intended to reduce the chances
of failures of omission once the functional part of the "snapshot
too old" patch goes in. It adds parameters for snapshot, relation,
and an enum to specify whether the snapshot age check needs to be
done for the page at this point. This initial patch passes NULL
for the first two new parameters and BGP_NO_SNAPSHOT_TEST for the
third. The follow-on patch will change the places where the test
needs to be made.
|
|
|
|
|
|
| |
It's not ready yet, revert two commits
690c543550b0d2852060c18d270cdb534d339d9a - unstable test output
386e3d7609c49505e079c40c65919d99feb82505 - patch itself
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These parameters are available for SSPI authentication only, to make
it possible to make it behave more like "normal gssapi", while
making it possible to maintain compatibility.
compat_realm is on by default, but can be turned off to make the
authentication use the full Kerberos realm instead of the NetBIOS name.
upn_username is off by default, and can be turned on to return the users
Kerberos UPN rather than the SAM-compatible name (a user in Active
Directory can have both a legacy SAM-compatible username and a new
Kerberos one. Normally they are the same, but not always)
Author: Christian Ullrich
Reviewed by: Robbie Harwood, Alvaro Herrera, me
|
|
|
|
|
|
|
| |
Found during investigation of failure of skink buildfarm member and its
valgrind report.
Backpatch to all supported branches
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
OpenSSL has an unfortunate tendency to mix per-session state error
handling with per-thread error handling. This can cause problems when
programs that link to libpq with OpenSSL enabled have some other use of
OpenSSL; without care, one caller of OpenSSL may cause problems for the
other caller. Backend code might similarly be affected, for example
when a third party extension independently uses OpenSSL without taking
the appropriate precautions.
To fix, don't trust other users of OpenSSL to clear the per-thread error
queue. Instead, clear the entire per-thread queue ahead of certain I/O
operations when it appears that there might be trouble (these I/O
operations mostly need to call SSL_get_error() to check for success,
which relies on the queue being empty). This is slightly aggressive,
but it's pretty clear that the other callers have a very dubious claim
to ownership of the per-thread queue. Do this is both frontend and
backend code.
Finally, be more careful about clearing our own error queue, so as to
not cause these problems ourself. It's possibly that control previously
did not always reach SSLerrmessage(), where ERR_get_error() was supposed
to be called to clear the queue's earliest code. Make sure
ERR_get_error() is always called, so as to spare other users of OpenSSL
the possibility of similar problems caused by libpq (as opposed to
problems caused by a third party OpenSSL library like PHP's OpenSSL
extension). Again, do this is both frontend and backend code.
See bug #12799 and https://bugs.php.net/bug.php?id=68276
Based on patches by Dave Vitek and Peter Eisentraut.
From: Peter Geoghegan <pg@bowt.ie>
|
|
|
|
|
|
|
|
| |
Create a "bsd" auth method that works the same as "password" so far as
clients are concerned, but calls the BSD Authentication service to
check the password. This is currently only available on OpenBSD.
Marisa Emerson, reviewed by Thomas Munro
|
|
|
|
|
|
|
|
|
| |
This allows parallel aggregation to use them. It may seem surprising
that we use float8_combine for both float4_accum and float8_accum
transition functions, but that's because those functions differ only
in the type of the non-transition-state argument.
Haribabu Kommi, reviewed by David Rowley and Tomas Vondra
|
|
|
|
|
|
|
|
|
| |
As noticed by Tom Lane changing operation's number in commit
bb140506df605fab58f48926ee1db1f80bdafb59 causes on-disk format incompatibility.
Revert to previous numbering, that is reason to add special array to store
priorities of operation. Also it reverts order of tsquery to previous.
Author: Dmitry Ivanov
|
|
|
|
|
|
|
|
|
|
| |
Now indexes (but only B-tree for now) can contain "extra" column(s) which
doesn't participate in index structure, they are just stored in leaf
tuples. It allows to use index only scan by using single index instead
of two or more indexes.
Author: Anastasia Lubennikova with minor editorializing by me
Reviewers: David Rowley, Peter Geoghegan, Jeff Janes
|
|
|
|
| |
see also ce8d7bb6440710058503d213b2aafcdf56a5b481
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Don't try to examine S_ISLNK(st.st_mode) after a failed lstat().
It's undefined.
Also, if the lstat() reported ENOENT, we do not wish that to be a hard
error, but the code might nonetheless treat it as one (giving an entirely
misleading error message, too) depending on luck-of-the-draw as to what
S_ISLNK() returned.
Don't throw error for ENOENT from rmdir(), either. (We're not really
expecting ENOENT because we just stat'd the file successfully; but
if we're going to allow ENOENT in the symlink code path, surely the
directory code path should too.)
Generate an appropriate errcode for its-the-wrong-type-of-file complaints.
(ERRCODE_SYSTEM_ERROR doesn't seem appropriate, and failing to write
errcode() around it certainly doesn't work, and not writing an errcode
at all is not per project policy.)
Valgrind noticed the undefined S_ISLNK result; the other problems emerged
while reading the code in the area.
All of this appears to have been introduced in 8f15f74a44f68f9c.
Back-patch to 9.5 where that commit appeared.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Benchmarking has shown that the current number of clog buffers limits
scalability. We've previously increased the number in 33aaa139, but
that's not sufficient with a large number of clients.
We've benchmarked the cost of increasing the limit by benchmarking worst
case scenarios; testing showed that 128 buffers don't cause a
regression, even in contrived scenarios, whereas 256 does
There are a number of more complex patches flying around to address
various clog scalability problems, but this is simple enough that we can
get it into 9.6; and is beneficial even after those patches have been
applied.
It is a bit unsatisfactory to increase this in small steps every few
releases, but a better solution seems to require a rewrite of slru.c;
not something done quickly.
Author: Amit Kapila and Andres Freund
Discussion: CAA4eK1+-=18HOrdqtLXqOMwZDbC_15WTyHiFruz7BvVArZPaAw@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
| |
The code that estimates what parallel degree should be uesd for the
scan of a relation is currently rather stupid, so add a parallel_degree
reloption that can be used to override the planner's rather limited
judgement.
Julien Rouhaud, reviewed by David Rowley, James Sewell, Amit Kapila,
and me. Some further hacking by me.
|
|
|
|
| |
Per Tom Lane and the buildfarm.
|
|
|
|
|
|
|
|
|
| |
The PAM_RHOST item is set to the remote IP address or host name and can
be used by PAM modules. A pg_hba.conf option is provided to choose
between IP address and resolved host name.
From: Grzegorz Sampolski <grzsmp@gmail.com>
Reviewed-by: Haribabu Kommi <kommi.haribabu@gmail.com>
|
|
|
|
|
|
|
| |
Rename comparePos() to compareWordEntryPos() to prevent export of too
generic name.
Per gripe from Tom Lane.
|
|
|
|
|
|
|
|
|
|
|
| |
We still use replacement selection for the first run of the sort only
and only when the number of tuples is relatively small. Otherwise,
the first run, and subsequent runs in all cases, are produced using
quicksort. This tends to be faster except perhaps for very small
amounts of working memory.
Peter Geoghegan, reviewed by Tomas Vondra, Jeff Janes, Mithun Cy,
Greg Stark, and me.
|
|
|
|
|
|
|
|
|
| |
Contention on the relation extension lock can become quite fierce when
multiple processes are inserting data into the same relation at the same
time at a high rate. Experimentation shows the extending the relation
multiple blocks at a time improves scalability.
Dilip Kumar, reviewed by Petr Jelinek, Amit Kapila, and me.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In cases where joins use multiple columns we currently assess each join
separately causing gross mis-estimates for join cardinality.
This patch adds use of FK information for the first time into the
planner. When FKs are present and we have multi-column join information,
plan estimates will be drastically improved. Cases with multiple FKs
are handled, though partial matches are ignored currently.
Net effect is substantial performance improvements for joins in many
common cases. Additional planning time is isolated to cases that are
currently performing poorly, measured at 0.08 - 0.15 ms.
Please watch for planner performance regressions; circumstances seem
unlikely but the law of unintended consequences may apply somewhen.
Additional complex tests welcome to prove this before release.
Tests can be performed using SET enable_fkey_estimates = on | off
using scripts provided during Hackers discussions, message id:
552335D9.3090707@2ndquadrant.com
Authors: Tomas Vondra and David Rowley
Reviewed and tested by Simon Riggs, adding comments only
|
|
|
|
|
| |
Per investigation failure skink buildfarm member and
RANDOMIZE_ALLOCATED_MEMORY help
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Extracted from pending unique-join patch, since this is a rather large
delta but it's simply moving code out into separately-accessible
subroutines.
I (tgl) did choose to add a bit more logic to rel_supports_distinctness,
so that it verifies that there's at least one potentially usable unique
index rather than just checking indexlist != NIL. Otherwise there's
no functional change here.
David Rowley
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
While prior to this patch the user-visible effect on the database
of any set of successfully committed serializable transactions was
always consistent with some one-at-a-time order of execution of
those transactions, the presence of declarative constraints could
allow errors to occur which were not possible in any such ordering,
and developers had no good workarounds to prevent user-facing
errors where they were not necessary or desired. This patch adds
a check for serialization failure ahead of duplicate key checking
so that if a developer explicitly (redundantly) checks for the
pre-existing value they will get the desired serialization failure
where the problem is caused by a concurrent serializable
transaction; otherwise they will get a duplicate key error.
While it would be better if the reads performed by the constraints
could count as part of the work of the transaction for
serialization failure checking, and we will hopefully get there
some day, this patch allows a clean and reliable way for developers
to work around the issue. In many cases existing code will already
be doing the right thing for this to "just work".
Author: Thomas Munro, with minor editing of docs by me
Reviewed-by: Marko Tiikkaja, Kevin Grittner
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Patch introduces new text search operator (<-> or <DISTANCE>) into tsquery.
On-disk and binary in/out format of tsquery are backward compatible.
It has two side effect:
- change order for tsquery, so, users, who has a btree index over tsquery,
should reindex it
- less number of parenthesis in tsquery output, and tsquery becomes more
readable
Authors: Teodor Sigaev, Oleg Bartunov, Dmitry Ivanov
Reviewers: Alexander Korotkov, Artur Zakirov
|
|
|
|
|
|
|
| |
Fastpath ignores this if no triggers defined.
Author: Tomas Vondra, with fastpath and comments added by me
Reviewers: David Rowley, Simon Riggs
|
|
|
|
|
|
|
|
|
|
|
| |
Commit c22650cd6450854e1a75064b698d7dcbb4a8821a sparked a discussion
about diverse interpretations of "token user" in error messages. Expel
old and new specimens of that phrase by making all GetTokenInformation()
callers report errors the way GetTokenUser() has been reporting them.
These error conditions almost can't happen, so users are unlikely to
observe this change.
Reviewed by Tom Lane and Stephen Frost.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that pg_dump will properly dump out any ACL changes made to
functions which exist in pg_catalog, switch to using the GRANT system
to manage access to those functions.
This means removing 'if (!superuser()) ereport()' checks from the
functions themselves and then REVOKEing EXECUTE right from 'public' for
these functions in system_views.sql.
Reviews by Alexander Korotkov, Jose Luis Tallon
|