| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We have now grown enough registerable syscache-invalidation callback
functions that the original assumption that there would be few of them
is causing performance problems. In particular, let's fix things so that
CallSyscacheCallbacks doesn't have to search the whole array to find
which callback(s) to invoke for a given cache ID. Preserve the original
behavior that callbacks are called in order of registration, just in
case there's someplace that depends on that (which I doubt).
In support of this, export the number of syscaches from syscache.h.
People could have found that out anyway from the enum, but adding a
#define makes that much safer.
This provides a useful additional speedup in Mathieu Fenniak's
logical-decoding test case, although we're reaching the point of
diminishing returns there. I think any further improvement will have
to come from reducing the number of cache invalidations that are
triggered in the first place. Still, we can hope that this change
gives some incremental benefit for all invalidation scenarios.
Back-patch to 9.4 where logical decoding was introduced.
Discussion: https://postgr.es/m/CAHoiPjzea6N0zuCi=+f9v_j94nfsy6y8SU7-=bp4=7qw6_i=Rg@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A test case provided by Mathieu Fenniak shows that the initial search for
the target catcache in CatalogCacheIdInvalidate consumes a very significant
amount of overhead in cases where cache invalidation is triggered but has
little useful work to do. There is no good reason for that search to exist
at all, as the index array maintained by syscache.c allows direct lookup of
the catcache from its ID. We just need a frontend function in syscache.c,
matching the division of labor for most other cache-accessing operations.
While there's more that can be done in this area, this patch alone reduces
the runtime of Mathieu's example by 2X. We can hope that it offers some
useful benefit in other cases too, although usually cache invalidation
overhead is not such a striking fraction of the total runtime.
Back-patch to 9.4 where logical decoding was introduced. It might be
worth going further back, but presently the only case we know of where
cache invalidation is really a significant burden is in logical decoding.
Also, older branches have fewer catcaches, reducing the possible benefit.
(Note: although this nominally changes catcache's API, we have always
documented CatalogCacheIdInvalidate as a private function, so I would
have little sympathy for an external module calling it directly. So
backpatching should be fine.)
Discussion: https://postgr.es/m/CAHoiPjzea6N0zuCi=+f9v_j94nfsy6y8SU7-=bp4=7qw6_i=Rg@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The original implementation of TABLESAMPLE modeled the tablesample method
API on index access methods, which wasn't a good choice because, without
specialized DDL commands, there's no way to build an extension that can
implement a TSM. (Raw inserts into system catalogs are not an acceptable
thing to do, because we can't undo them during DROP EXTENSION, nor will
pg_upgrade behave sanely.) Instead adopt an API more like procedural
language handlers or foreign data wrappers, wherein the only SQL-level
support object needed is a single handler function identified by having
a special return type. This lets us get rid of the supporting catalog
altogether, so that no custom DDL support is needed for the feature.
Adjust the API so that it can support non-constant tablesample arguments
(the original coding assumed we could evaluate the argument expressions at
ExecInitSampleScan time, which is undesirable even if it weren't outright
unsafe), and discourage sampling methods from looking at invisible tuples.
Make sure that the BERNOULLI and SYSTEM methods are genuinely repeatable
within and across queries, as required by the SQL standard, and deal more
honestly with methods that can't support that requirement.
Make a full code-review pass over the tablesample additions, and fix
assorted bugs, omissions, infelicities, and cosmetic issues (such as
failure to put the added code stanzas in a consistent ordering).
Improve EXPLAIN's output of tablesample plans, too.
Back-patch to 9.5 so that we don't have to support the original API
in production.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When we invalidate the relcache entry for a system catalog or index, we
must also delete the relcache "init file" if the init file contains a copy
of that rel's entry. The old way of doing this relied on a specially
maintained list of the OIDs of relations present in the init file: we made
the list either when reading the file in, or when writing the file out.
The problem is that when writing the file out, we included only rels
present in our local relcache, which might have already suffered some
deletions due to relcache inval events. In such cases we correctly decided
not to overwrite the real init file with incomplete data --- but we still
used the incomplete initFileRelationIds list for the rest of the current
session. This could result in wrong decisions about whether the session's
own actions require deletion of the init file, potentially allowing an init
file created by some other concurrent session to be left around even though
it's been made stale.
Since we don't support changing the schema of a system catalog at runtime,
the only likely scenario in which this would cause a problem in the field
involves a "vacuum full" on a catalog concurrently with other activity, and
even then it's far from easy to provoke. Remarkably, this has been broken
since 2002 (in commit 786340441706ac1957a031f11ad1c2e5b6e18314), but we had
never seen a reproducible test case until recently. If it did happen in
the field, the symptoms would probably involve unexpected "cache lookup
failed" errors to begin with, then "could not open file" failures after the
next checkpoint, as all accesses to the affected catalog stopped working.
Recovery would require manually removing the stale "pg_internal.init" file.
To fix, get rid of the initFileRelationIds list, and instead consult
syscache.c's list of relations used in catalog caches to decide whether a
relation is included in the init file. This should be a tad more efficient
anyway, since we're replacing linear search of a list with ~100 entries
with a binary search. It's a bit ugly that the init file contents are now
so directly tied to the catalog caches, but in practice that won't make
much difference.
Back-patch to all supported branches.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add a TABLESAMPLE clause to SELECT statements that allows
user to specify random BERNOULLI sampling or block level
SYSTEM sampling. Implementation allows for extensible
sampling functions to be written, using a standard API.
Basic version follows SQLStandard exactly. Usable
concrete use cases for the sampling API follow in later
commits.
Petr Jelinek
Reviewed by Michael Paquier and Simon Riggs
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When implementing a replication solution ontop of logical decoding, two
related problems exist:
* How to safely keep track of replication progress
* How to change replication behavior, based on the origin of a row;
e.g. to avoid loops in bi-directional replication setups
The solution to these problems, as implemented here, consist out of
three parts:
1) 'replication origins', which identify nodes in a replication setup.
2) 'replication progress tracking', which remembers, for each
replication origin, how far replay has progressed in a efficient and
crash safe manner.
3) The ability to filter out changes performed on the behest of a
replication origin during logical decoding; this allows complex
replication topologies. E.g. by filtering all replayed changes out.
Most of this could also be implemented in "userspace", e.g. by inserting
additional rows contain origin information, but that ends up being much
less efficient and more complicated. We don't want to require various
replication solutions to reimplement logic for this independently. The
infrastructure is intended to be generic enough to be reusable.
This infrastructure also replaces the 'nodeid' infrastructure of commit
timestamps. It is intended to provide all the former capabilities,
except that there's only 2^16 different origins; but now they integrate
with logical decoding. Additionally more functionality is accessible via
SQL. Since the commit timestamp infrastructure has also been introduced
in 9.5 (commit 73c986add) changing the API is not a problem.
For now the number of origins for which the replication progress can be
tracked simultaneously is determined by the max_replication_slots
GUC. That GUC is not a perfect match to configure this, but there
doesn't seem to be sufficient reason to introduce a separate new one.
Bumps both catversion and wal page magic.
Author: Andres Freund, with contributions from Petr Jelinek and Craig Ringer
Reviewed-By: Heikki Linnakangas, Petr Jelinek, Robert Haas, Steve Singer
Discussion: 20150216002155.GI15326@awork2.anarazel.de,
20140923182422.GA15776@alap3.anarazel.de,
20131114172632.GE7522@alap2.anarazel.de
|
|
|
|
|
|
|
|
| |
This provides a mechanism for specifying conversions between SQL data
types and procedural languages. As examples, there are transforms
for hstore and ltree for PL/Perl and PL/Python.
reviews by Pavel Stěhule and Andres Freund
|
|
|
|
| |
Backpatch certain files through 9.0
|
|
|
|
|
| |
Update all files in head, and files COPYRIGHT and legal.sgml in all back
branches.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SnapshotNow scans have the undesirable property that, in the face of
concurrent updates, the scan can fail to see either the old or the new
versions of the row. In many cases, we work around this by requiring
DDL operations to hold AccessExclusiveLock on the object being
modified; in some cases, the existing locking is inadequate and random
failures occur as a result. This commit doesn't change anything
related to locking, but will hopefully pave the way to allowing lock
strength reductions in the future.
The major issue has held us back from making this change in the past
is that taking an MVCC snapshot is significantly more expensive than
using a static special snapshot such as SnapshotNow. However, testing
of various worst-case scenarios reveals that this problem is not
severe except under fairly extreme workloads. To mitigate those
problems, we avoid retaking the MVCC snapshot for each new scan;
instead, we take a new snapshot only when invalidation messages have
been processed. The catcache machinery already requires that
invalidation messages be sent before releasing the related heavyweight
lock; else other backends might rely on locally-cached data rather
than scanning the catalog at all. Thus, making snapshot reuse
dependent on the same guarantees shouldn't break anything that wasn't
already subtly broken.
Patch by me. Review by Michael Paquier and Andres Freund.
|
|
|
|
|
| |
Fully update git head, and update back branches in ./COPYRIGHT and
legal.sgml files.
|
|
|
|
|
| |
Instead, place a forward struct declaration for struct catclist in
syscache.h. This reduces header proliferation somewhat.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
They don't actually do anything yet; that will get fixed in a
follow-on commit. But this gets the basic infrastructure in place,
including CREATE/ALTER/DROP EVENT TRIGGER; support for COMMENT,
SECURITY LABEL, and ALTER EXTENSION .. ADD/DROP EVENT TRIGGER;
pg_dump and psql support; and documentation for the anticipated
initial feature set.
Dimitri Fontaine, with review and a bunch of additional hacking by me.
Thom Brown extensively reviewed earlier versions of this patch set,
but there's not a whole lot of that code left in this commit, as it
turns out.
|
|
|
|
|
|
|
|
|
|
|
| |
Now that cache invalidation callbacks get only a hash value, and not a
tuple TID (per commits 632ae6829f7abda34e15082c91d9dfb3fc0f298b and
b5282aa893e565b7844f8237462cb843438cdd5e), the only way they can restrict
what they invalidate is to know what the hash values mean. setrefs.c was
doing this via a hard-wired assumption but that seems pretty grotty, and
it'll only get worse as more cases come up. So let's expose a calculation
function that takes the same parameters as SearchSysCache. Per complaint
from Marko Kreen.
|
| |
|
|
|
|
|
|
|
| |
Selectivity estimation functions are missing for some range type operators,
which is a TODO.
Jeff Davis
|
|
|
|
|
|
|
|
| |
This adds collation support for columns and domains, a COLLATE clause
to override it per expression, and B-tree index support.
Peter Eisentraut
reviewed by Pavel Stehule, Itagaki Takahiro, Robert Haas, Noah Misch
|
|
|
|
|
|
|
|
|
|
|
| |
Foreign tables are a core component of SQL/MED. This commit does
not provide a working SQL/MED infrastructure, because foreign tables
cannot yet be queried. Support for foreign table scans will need to
be added in a future patch. However, this patch creates the necessary
system catalog structure, syntax support, and support for ancillary
operations such as COMMENT and SECURITY LABEL.
Shigeru Hanada, heavily revised by Robert Haas
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
The purpose of this change is to eliminate the need for every caller
of SearchSysCache, SearchSysCacheCopy, SearchSysCacheExists,
GetSysCacheOid, and SearchSysCacheList to know the maximum number
of allowable keys for a syscache entry (currently 4). This will
make it far easier to increase the maximum number of keys in a
future release should we choose to do so, and it makes the code
shorter, too.
Design and review by Tom Lane.
|
|
|
|
|
|
|
|
|
| |
This patch only supports seq_page_cost and random_page_cost as parameters,
but it provides the infrastructure to scalably support many more.
In particular, we may want to add support for effective_io_concurrency,
but I'm leaving that as future work for now.
Thanks to Tom Lane for design help and Alvaro Herrera for the review.
|
| |
|
|
|
|
|
|
|
|
| |
and teach ANALYZE to compute such stats for tables that have subclasses.
Per my proposal of yesterday.
autovacuum still needs to be taught about running ANALYZE on parent tables
when their subclasses change, but the feature is useful even without that.
|
|
|
|
|
|
|
|
|
|
|
| |
the privileges that will be applied to subsequently-created objects.
Such adjustments are always per owning role, and can be restricted to objects
created in particular schemas too. A notable benefit is that users can
override the traditional default privilege settings, eg, the PUBLIC EXECUTE
privilege traditionally granted by default for functions.
Petr Jelinek
|
| |
|
|
|
|
|
|
|
|
| |
This doesn't do any remote or external things yet, but it gives modules
like plproxy and dblink a standardized and future-proof system for
managing their connection information.
Martin Pihlak and Peter Eisentraut
|
|
|
|
|
|
|
| |
to avoid the pain of manually renumbering them anytime we insert another
name in alphabetical order. An excellent idea from Alex Hunsaker and
NikhilS' inherited-constraints patch --- whether or not the rest of that
gets in, this should. Dunno why we never thought of it before.
|
| |
|
|
|
|
|
|
|
|
| |
Oleg Bartunov and Teodor Sigaev, but I did a lot of editorializing,
so anything that's broken is probably my fault.
Documentation is nonexistent as yet, but let's land the patch so we can
get some portability testing done.
|
|
|
|
|
| |
pg_type.typtype whereever practical. Tom Dunstan, with some kibitzing
from Tom Lane.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
equality checks it applies, instead of a random dependence on whatever
operators might be named "=". The equality operators will now be selected
from the opfamily of the unique index that the FK constraint depends on to
enforce uniqueness of the referenced columns; therefore they are certain to be
consistent with that index's notion of equality. Among other things this
should fix the problem noted awhile back that pg_dump may fail for foreign-key
constraints on user-defined types when the required operators aren't in the
search path. This also means that the former warning condition about "foreign
key constraint will require costly sequential scans" is gone: if the
comparison condition isn't indexable then we'll reject the constraint
entirely. All per past discussions.
Along the way, make the RI triggers look into pg_constraint for their
information, instead of using pg_trigger.tgargs; and get rid of the always
error-prone fixed-size string buffers in ri_triggers.c in favor of building up
the RI queries in StringInfo buffers.
initdb forced due to columns added to pg_constraint and pg_trigger.
|
|
|
|
| |
back-stamped for this.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
cases. Operator classes now exist within "operator families". While most
families are equivalent to a single class, related classes can be grouped
into one family to represent the fact that they are semantically compatible.
Cross-type operators are now naturally adjunct parts of a family, without
having to wedge them into a particular opclass as we had done originally.
This commit restructures the catalogs and cleans up enough of the fallout so
that everything still works at least as well as before, but most of the work
needed to actually improve the planner's behavior will come later. Also,
there are not yet CREATE/DROP/ALTER OPERATOR FAMILY commands; the only way
to create a new family right now is to allow CREATE OPERATOR CLASS to make
one by default. I owe some more documentation work, too. But that can all
be done in smaller pieces once this infrastructure is in place.
|
| |
|
|
|
|
| |
header files.
|
|
|
|
|
|
|
|
|
|
| |
in various places that were previously doing ad hoc pg_database searches.
This may speed up database-related privilege checks a little bit, but
the main motivation is to eliminate the performance reason for having
ReverifyMyDatabase do such a lot of stuff (viz, avoiding repeat scans
of pg_database during backend startup). The locking reason for having
that routine is about to go away, and it'd be good to have the option
to break it up.
|
| |
|
| |
|
|
|
|
|
|
|
|
| |
and pg_auth_members. There are still many loose ends to finish in this
patch (no documentation, no regression tests, no pg_dump support for
instance). But I'm going to commit it now anyway so that Alvaro can
make some progress on shared dependencies. The catalog changes should
be pretty much done.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
change saves a great deal of space in pg_proc and its primary index,
and it eliminates the former requirement that INDEX_MAX_KEYS and
FUNC_MAX_ARGS have the same value. INDEX_MAX_KEYS is still embedded
in the on-disk representation (because it affects index tuple header
size), but FUNC_MAX_ARGS is not. I believe it would now be possible
to increase FUNC_MAX_ARGS at little cost, but haven't experimented yet.
There are still a lot of vestigial references to FUNC_MAX_ARGS, which
I will clean up in a separate pass. However, getting rid of it
altogether would require changing the FunctionCallInfoData struct,
and I'm not sure I want to buy into that.
|
|
|
|
|
|
|
|
| |
Also performed an initial run through of upgrading our Copyright date to
extend to 2005 ... first run here was very simple ... change everything
where: grep 1996-2004 && the word 'Copyright' ... scanned through the
generated list with 'less' first, and after, to make sure that I only
picked up the right entries ...
|
| |
|
| |
|
| |
|
|
|
|
|
| |
CONVERSION code. Still need to figure out what to do about inappropriate
coding in parsing.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
code review by Tom Lane. Remaining issues: functions that take or
return tuple types are likely to break if one drops (or adds!)
a column in the table defining the type. Need to think about what
to do here.
Along the way: some code review for recent COPY changes; mark system
columns attnotnull = true where appropriate, per discussion a month ago.
|
|
|
|
| |
Add regression test
|
|
|
|
|
|
|
|
|
|
|
| |
extension to create binary compatible casts. Includes dependency tracking
as well.
pg_proc.proimplicit is now defunct, but will be removed in a separate
commit.
pg_dump provides a migration path from the previous scheme to declare
casts. Dumping binary compatible casts is currently impossible, though.
|