| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
| |
This saves several catalog lookups per reference. It's not all that
exciting right now, because we'd managed to minimize the number of places
that need to fetch the data; but the upcoming writable-foreign-tables patch
needs this info in a lot more places.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A materialized view has a rule just like a view and a heap and
other physical properties like a table. The rule is only used to
populate the table, references in queries refer to the
materialized data.
This is a minimal implementation, but should still be useful in
many cases. Currently data is only populated "on demand" by the
CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW statements.
It is expected that future releases will add incremental updates
with various timings, and that a more refined concept of defining
what is "fresh" data will be developed. At some point it may even
be possible to have queries use a materialized in place of
references to underlying tables, but that requires the other
above-mentioned features to be working first.
Much of the documentation work by Robert Haas.
Review by Noah Misch, Thom Brown, Robert Haas, Marko Tiikkaja
Security review by KaiGai Kohei, with a decision on how best to
implement sepgsql still pending.
|
|
|
|
|
|
|
| |
This enables non-backend code, such as pg_xlogdump, to use it easily.
The previous location, in src/backend/catalog/catalog.c, made that
essentially impossible because that file depends on many backend-only
facilities; so this needs to live separately.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch addresses the problem that applications currently have to
extract object names from possibly-localized textual error messages,
if they want to know for example which index caused a UNIQUE_VIOLATION
failure. It adds new error message fields to the wire protocol, which
can carry the name of a table, table column, data type, or constraint
associated with the error. (Since the protocol spec has always instructed
clients to ignore unrecognized field types, this should not create any
compatibility problem.)
Support for providing these new fields has been added to just a limited set
of error reports (mainly, those in the "integrity constraint violation"
SQLSTATE class), but we will doubtless add them to more calls in future.
Pavel Stehule, reviewed and extensively revised by Peter Geoghegan, with
additional hacking by Tom Lane.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the initial implementation of plan caching, we saved the active
search_path when a plan was first cached, then reinstalled that path
anytime we needed to reparse or replan. The idea of that was to try to
reselect the same referenced objects, in somewhat the same way that views
continue to refer to the same objects in the face of schema or name
changes. Of course, that analogy doesn't bear close inspection, since
holding the search_path fixed doesn't cope with object drops or renames.
Moreover sticking with the old path seems to create more surprises than
it avoids. So instead of doing that, consider that the cached plan depends
on search_path, and force reparse/replan if the active search_path is
different than it was when we last saved the plan.
This gets us fairly close to having "transparency" of plan caching, in the
sense that the cached statement acts the same as if you'd just resubmitted
the original query text for another execution. There are still some corner
cases where this fails though: a new object added in the search path
schema(s) might capture a reference in the query text, but we'd not realize
that and force a reparse. We might try to fix that in the future, but for
the moment it looks too expensive and complicated.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch introduces two additional lock modes for tuples: "SELECT FOR
KEY SHARE" and "SELECT FOR NO KEY UPDATE". These don't block each
other, in contrast with already existing "SELECT FOR SHARE" and "SELECT
FOR UPDATE". UPDATE commands that do not modify the values stored in
the columns that are part of the key of the tuple now grab a SELECT FOR
NO KEY UPDATE lock on the tuple, allowing them to proceed concurrently
with tuple locks of the FOR KEY SHARE variety.
Foreign key triggers now use FOR KEY SHARE instead of FOR SHARE; this
means the concurrency improvement applies to them, which is the whole
point of this patch.
The added tuple lock semantics require some rejiggering of the multixact
module, so that the locking level that each transaction is holding can
be stored alongside its Xid. Also, multixacts now need to persist
across server restarts and crashes, because they can now represent not
only tuple locks, but also tuple updates. This means we need more
careful tracking of lifetime of pg_multixact SLRU files; since they now
persist longer, we require more infrastructure to figure out when they
can be removed. pg_upgrade also needs to be careful to copy
pg_multixact files over from the old server to the new, or at least part
of multixact.c state, depending on the versions of the old and new
servers.
Tuple time qualification rules (HeapTupleSatisfies routines) need to be
careful not to consider tuples with the "is multi" infomask bit set as
being only locked; they might need to look up MultiXact values (i.e.
possibly do pg_multixact I/O) to find out the Xid that updated a tuple,
whereas they previously were assured to only use information readily
available from the tuple header. This is considered acceptable, because
the extra I/O would involve cases that would previously cause some
commands to block waiting for concurrent transactions to finish.
Another important change is the fact that locking tuples that have
previously been updated causes the future versions to be marked as
locked, too; this is essential for correctness of foreign key checks.
This causes additional WAL-logging, also (there was previously a single
WAL record for a locked tuple; now there are as many as updated copies
of the tuple there exist.)
With all this in place, contention related to tuples being checked by
foreign key rules should be much reduced.
As a bonus, the old behavior that a subtransaction grabbing a stronger
tuple lock than the parent (sub)transaction held on a given tuple and
later aborting caused the weaker lock to be lost, has been fixed.
Many new spec files were added for isolation tester framework, to ensure
overall behavior is sane. There's probably room for several more tests.
There were several reviewers of this patch; in particular, Noah Misch
and Andres Freund spent considerable time in it. Original idea for the
patch came from Simon Riggs, after a problem report by Joel Jacobson.
Most code is from me, with contributions from Marti Raudsepp, Alexander
Shulgin, Noah Misch and Andres Freund.
This patch was discussed in several pgsql-hackers threads; the most
important start at the following message-ids:
AANLkTimo9XVcEzfiBR-ut3KVNDkjm2Vxh+t8kAmWjPuv@mail.gmail.com
1290721684-sup-3951@alvh.no-ip.org
1294953201-sup-2099@alvh.no-ip.org
1320343602-sup-2290@alvh.no-ip.org
1339690386-sup-8927@alvh.no-ip.org
4FE5FF020200002500048A3D@gw.wicourts.gov
4FEAB90A0200002500048B7D@gw.wicourts.gov
|
|
|
|
| |
Dimitri Fontaine, with slight changes by me
|
|
|
|
|
|
| |
The code failed to detect an out-of-memory failure.
Xi Wang
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
AtEOXact_RelationCache() scanned the entire relation cache at the end of
any transaction that created a new relation or assigned a new relfilenode.
Thus, clients such as pg_restore had an O(N^2) performance problem that
would start to be noticeable after creating 10000 or so tables. Since
typically only a small number of relcache entries need any cleanup, we
can fix this by keeping a small list of their OIDs and doing hash_searches
for them. We fall back to the full-table scan if the list overflows.
Ideally, the maximum list length would be set at the point where N
hash_searches would cost just less than the full-table scan. Some quick
experimentation says that point might be around 50-100; I (tgl)
conservatively set MAX_EOXACT_LIST = 32. For the case that we're worried
about here, which is short single-statement transactions, it's unlikely
there would ever be more than about a dozen list entries anyway; so it's
probably not worth being too tense about the value.
We could avoid the hash_searches by instead keeping the target relcache
entries linked into a list, but that would be noticeably more complicated
and bug-prone because of the need to maintain such a list in the face of
relcache entry drops. Since a relcache entry can only need such cleanup
after a somewhat-heavyweight filesystem operation, trying to save a
hash_search per cleanup doesn't seem very useful anyway --- it's the scan
over all the not-needing-cleanup entries that we wish to avoid here.
Jeff Janes, reviewed and tweaked a bit by Tom Lane
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SPI_execute() and related functions create a CachedPlan, execute it once,
and immediately discard it, so that the functionality offered by
plancache.c is of no value in this code path. And performance measurements
show that the extra data copying and invalidation checking done by
plancache.c slows down simple queries by 10% or more compared to 9.1.
However, enough of the SPI code is shared with functions that do need plan
caching that it seems impractical to bypass plancache.c altogether.
Instead, let's invent a variant version of cached plans that preserves
99% of the API but doesn't offer any of the actual functionality, nor the
overhead. This puts SPI_execute() performance back on par, or maybe even
slightly better, than it was before. This change should resolve recent
complaints of performance degradation from Dong Ye, Pavel Stehule, and
others.
By avoiding data copying, this change also reduces the amount of memory
needed to execute many-statement SPI_execute() strings, as for instance in
a recent complaint from Tomas Vondra.
An additional benefit of this change is that multi-statement SPI_execute()
query strings are now processed fully serially, that is we complete
execution of earlier statements before running parse analysis and planning
on following ones. This eliminates a long-standing POLA violation, in that
DDL that affects the behavior of a later statement will now behave as
expected.
Back-patch to 9.2, since this was a performance regression compared to 9.1.
(In 9.2, place the added struct fields so as to avoid changing the offsets
of existing fields.)
Heikki Linnakangas and Tom Lane
|
|
|
|
|
| |
Fully update git head, and update back branches in ./COPYRIGHT and
legal.sgml files.
|
|
|
|
|
|
|
| |
Teach RelationCacheInvalidate() to keep rd_newRelfilenodeSubid across rel cache
message overflows, so that behaviour is now fully deterministic.
Noah Misch
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
During crash recovery, we remove disk files belonging to temporary tables,
but the system catalog entries for such tables are intentionally not
cleaned up right away. Instead, the first backend that uses a temp schema
is expected to clean out any leftover objects therein. This approach
requires that we be careful to ignore leftover temp tables (since any
actual access attempt would fail), *even if their BackendId matches our
session*, if we have not yet established use of the session's corresponding
temp schema. That worked fine in the past, but was broken by commit
debcec7dc31a992703911a9953e299c8d730c778 which incorrectly removed the
rd_islocaltemp relcache flag. Put it back, and undo various changes
that substituted tests like "rel->rd_backend == MyBackendId" for use
of a state-aware flag. Per trouble report from Heikki Linnakangas.
Back-patch to 9.1 where the erroneous change was made. In the back
branches, be careful to add rd_islocaltemp in a spot in the struct that
was alignment padding before, so as not to break existing add-on code.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 8cb53654dbdb4c386369eb988062d0bbb6de725e, which introduced DROP
INDEX CONCURRENTLY, managed to break CREATE INDEX CONCURRENTLY via a poor
choice of catalog state representation. The pg_index state for an index
that's reached the final pre-drop stage was the same as the state for an
index just created by CREATE INDEX CONCURRENTLY. This meant that the
(necessary) change to make RelationGetIndexList ignore about-to-die indexes
also made it ignore freshly-created indexes; which is catastrophic because
the latter do need to be considered in HOT-safety decisions. Failure to
do so leads to incorrect index entries and subsequently wrong results from
queries depending on the concurrently-created index.
To fix, add an additional boolean column "indislive" to pg_index, so that
the freshly-created and about-to-die states can be distinguished. (This
change obviously is only possible in HEAD. This patch will need to be
back-patched, but in 9.2 we'll use a kluge consisting of overloading the
formerly-impossible state of indisvalid = true and indisready = false.)
In addition, change CREATE/DROP INDEX CONCURRENTLY so that the pg_index
flag changes they make without exclusive lock on the index are made via
heap_inplace_update() rather than a normal transactional update. The
latter is not very safe because moving the pg_index tuple could result in
concurrent SnapshotNow scans finding it twice or not at all, thus possibly
resulting in index corruption. This is a pre-existing bug in CREATE INDEX
CONCURRENTLY, which was copied into the DROP code.
In addition, fix various places in the code that ought to check to make
sure that the indexes they are manipulating are valid and/or ready as
appropriate. These represent bugs that have existed since 8.2, since
a failed CREATE INDEX CONCURRENTLY could leave a corrupt or invalid
index behind, and we ought not try to do anything that might fail with
such an index.
Also fix RelationReloadIndexInfo to ensure it copies all the pg_index
columns that are allowed to change after initial creation. Previously we
could have been left with stale values of some fields in an index relcache
entry. It's not clear whether this actually had any user-visible
consequences, but it's at least a bug waiting to happen.
In addition, do some code and docs review for DROP INDEX CONCURRENTLY;
some cosmetic code cleanup but mostly addition and revision of comments.
This will need to be back-patched, but in a noticeably different form,
so I'm committing it to HEAD before working on the back-patch.
Problem reported by Amit Kapila, diagnosis by Pavan Deolassee,
fix by Tom Lane and Andres Freund.
|
|
|
|
|
| |
This is necessary (but not sufficient) to have them compilable outside
of a backend environment.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Files opened with BasicOpenFile or PathNameOpenFile are not automatically
cleaned up on error. That puts unnecessary burden on callers that only want
to keep the file open for a short time. There is AllocateFile, but that
returns a buffered FILE * stream, which in many cases is not the nicest API
to work with. So add function called OpenTransientFile, which returns a
unbuffered fd that's cleaned up like the FILE* returned by AllocateFile().
This plugs a few rare fd leaks in error cases:
1. copy_file() - fixed by by using OpenTransientFile instead of BasicOpenFile
2. XLogFileInit() - fixed by adding close() calls to the error cases. Can't
use OpenTransientFile here because the fd is supposed to persist over
transaction boundaries.
3. lo_import/lo_export - fixed by using OpenTransientFile instead of
PathNameOpenFile.
In addition to plugging those leaks, this replaces many BasicOpenFile() calls
with OpenTransientFile() that were not leaking, because the code meticulously
closed the file on error. That wasn't strictly necessary, but IMHO it's good
for robustness.
The same leaks exist in older versions, but given the rarity of the issues,
I'm not backpatching this. Not yet, anyway - it might be good to backpatch
later, after this mechanism has had some more testing in master branch.
|
|
|
|
|
|
| |
Remove useless duplicate initialization of bucket headers, don't use a
dlist_mutable_iter in a performance-critical path that doesn't need it,
make some other cosmetic changes for consistency's sake.
|
|
|
|
|
|
|
|
| |
dlist_delete, dlist_insert_after, dlist_insert_before, slist_insert_after
do not need access to the list header, and indeed insisting on that negates
one of the main advantages of a doubly-linked list.
In consequence, revert addition of "cache_bucket" field to CatCTup.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Provide a common implementation of embedded singly-linked and
doubly-linked lists. "Embedded" in the sense that the nodes'
next/previous pointers exist within some larger struct; this design
choice reduces memory allocation overhead.
Most of the implementation uses inlineable functions (where supported),
for performance.
Some existing uses of both types of lists have been converted to the new
code, for demonstration purposes. Other uses can (and probably will) be
converted in the future. Since dllist.c is unused after this conversion,
it has been removed.
Author: Andres Freund
Some tweaks by me
Reviewed by Tom Lane, Peter Geoghegan
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change ensures that the planner will see implicit and explicit casts
as equivalent for all purposes, except in the minority of cases where
there's actually a semantic difference (as reflected by having a 3-argument
cast function). In particular, this fixes cases where the EquivalenceClass
machinery failed to consider two references to a varchar column as
equivalent if one was implicitly cast to text but the other was explicitly
cast to text, as seen in bug #7598 from Vaclav Juza. We have had similar
bugs before in other parts of the planner, so I think it's time to fix this
problem at the core instead of continuing to band-aid around it.
Remove set_coercionform_dontcare(), which represents the band-aid
previously in use for allowing matching of index and constraint expressions
with inconsistent cast labeling. (We can probably get rid of
COERCE_DONTCARE altogether, but I don't think removing that enum value in
back branches would be wise; it's possible there's third party code
referring to it.)
Back-patch to 9.2. We could go back further, and might want to once this
has been tested more; but for the moment I won't risk destabilizing plan
choices in long-since-stable branches.
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reduces unnecessary exposure of other headers through htup.h, which
is very widely included by many files.
I have chosen to move the function prototypes to the new file as well,
because that means htup.h no longer needs to include tupdesc.h. In
itself this doesn't have much effect in indirect inclusion of tupdesc.h
throughout the tree, because it's also required by execnodes.h; but it's
something to explore in the future, and it seemed best to do the htup.h
change now while I'm busy with it.
|
|
|
|
|
| |
Instead, place a forward struct declaration for struct catclist in
syscache.h. This reduces header proliferation somewhat.
|
|
|
|
|
| |
This lets files that are mere users of ResourceOwner not automatically
include the headers for stuff that is managed by the resowner mechanism.
|
|
|
|
| |
Bug spotted by Jeff Davis using -DCLOBBER_CACHE_ALWAYS.
|
|
|
|
| |
Spotted by Jeff Davis.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 3855968f328918b6cd1401dd11d109d471a54d40 added syntax, pg_dump,
psql support, and documentation, but the triggers didn't actually fire.
With this commit, they now do. This is still a pretty basic facility
overall because event triggers do not get a whole lot of information
about what the user is trying to do unless you write them in C; and
there's still no option to fire them anywhere except at the very
beginning of the execution sequence, but it's better than nothing,
and a good building block for future work.
Along the way, add a regression test for ALTER LARGE OBJECT, since
testing of event triggers reveals that we haven't got one.
Dimitri Fontaine and Robert Haas
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
They don't actually do anything yet; that will get fixed in a
follow-on commit. But this gets the basic infrastructure in place,
including CREATE/ALTER/DROP EVENT TRIGGER; support for COMMENT,
SECURITY LABEL, and ALTER EXTENSION .. ADD/DROP EVENT TRIGGER;
pg_dump and psql support; and documentation for the anticipated
initial feature set.
Dimitri Fontaine, with review and a bunch of additional hacking by me.
Thom Brown extensively reviewed earlier versions of this patch set,
but there's not a whole lot of that code left in this commit, as it
turns out.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When (re) loading the typcache comparison cache for an enum type's values,
use an up-to-date MVCC snapshot, not the transaction's existing snapshot.
This avoids problems if we encounter an enum OID that was created since our
transaction started. Per report from Andres Freund and diagnosis by Robert
Haas.
To ensure this is safe even if enum comparison manages to get invoked
before we've set a transaction snapshot, tweak GetLatestSnapshot to
redirect to GetTransactionSnapshot instead of throwing error when
FirstSnapshotSet is false. The existing uses of GetLatestSnapshot (in
ri_triggers.c) don't care since they couldn't be invoked except in a
transaction that's already done some work --- but it seems just conceivable
that this might not be true of enums, especially if we ever choose to use
enums in system catalogs.
Note that the comparable coding in enum_endpoint and enum_range_internal
remains GetTransactionSnapshot; this is perhaps debatable, but if we
changed it those functions would have to be marked volatile, which doesn't
seem attractive.
Back-patch to 9.1 where ALTER TYPE ADD VALUE was added.
|
|
|
|
|
|
|
|
| |
By my count there are 18 callers of CacheRegisterSyscacheCallback in the
core code in HEAD, so we are potentially leaving as few as 2 slots for any
add-on code to use (though possibly not all these callers would actually
activate in any particular session). That doesn't seem like a lot of
headroom, so let's pump it up a little.
|
|
|
|
|
|
|
| |
This may have been important at some point in the past, but it no
longer does anything useful.
Review by Tom Lane.
|
|
|
|
| |
commit-fest.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The original syntax wasn't universally loved, and it didn't allow its
usage in CREATE TABLE, only ALTER TABLE. It now works everywhere, and
it also allows using ALTER TABLE ONLY to add an uninherited CHECK
constraint, per discussion.
The pg_constraint column has accordingly been renamed connoinherit.
This commit partly reverts some of the changes in
61d81bd28dbec65a6b144e0cd3d0bfe25913c3ac, particularly some pg_dump and
psql bits, because now pg_get_constraintdef includes the necessary NO
INHERIT within the constraint definition.
Author: Nikhil Sontakke
Some tweaks by me
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This restores the pre-9.0 situation that it's possible to add new indexes
on pg_class and other mapped-but-not-shared catalogs, so long as you broke
the glass and flipped the big red Dont-Touch-Me switch. As before, there
are a lot of gotchas, and you'd have to be pretty desperate to try this
on a production database; but there doesn't seem to be a reason for
relmapper.c to be preventing such things all by itself. Per
experimentation with a case suggested by Cody Cutrer.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Making this operation look like a utility statement seems generally a good
idea, and particularly so in light of the desire to provide command
triggers for utility statements. The original choice of representing it as
SELECT with an IntoClause appendage had metastasized into rather a lot of
places, unfortunately, so that this patch is a great deal more complicated
than one might at first expect.
In particular, keeping EXPLAIN working for SELECT INTO and CREATE TABLE AS
subcommands required restructuring some EXPLAIN-related APIs. Add-on code
that calls ExplainOnePlan or ExplainOneUtility, or uses
ExplainOneQuery_hook, will need adjustment.
Also, the cases PREPARE ... SELECT INTO and CREATE RULE ... SELECT INTO,
which formerly were accepted though undocumented, are no longer accepted.
The PREPARE case can be replaced with use of CREATE TABLE AS EXECUTE.
The CREATE RULE case doesn't seem to have much real-world use (since the
rule would work only once before failing with "table already exists"),
so we'll not bother with that one.
Both SELECT INTO and CREATE TABLE AS still return a command tag of
"SELECT nnnn". There was some discussion of returning "CREATE TABLE nnnn",
but for the moment backwards compatibility wins the day.
Andres Freund and Tom Lane
|
|
|
|
|
|
|
|
|
|
|
| |
Now that cache invalidation callbacks get only a hash value, and not a
tuple TID (per commits 632ae6829f7abda34e15082c91d9dfb3fc0f298b and
b5282aa893e565b7844f8237462cb843438cdd5e), the only way they can restrict
what they invalidate is to know what the hash values mean. setrefs.c was
doing this via a hard-wired assumption but that seems pretty grotty, and
it'll only get worse as more cases come up. So let's expose a calculation
function that takes the same parameters as SearchSysCache. Per complaint
from Marko Kreen.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We don't normally allow quals to be pushed down into a view created
with the security_barrier option, but functions without side effects
are an exception: they're OK. This allows much better performance in
common cases, such as when using an equality operator (that might
even be indexable).
There is an outstanding issue here with the CREATE FUNCTION / ALTER
FUNCTION syntax: there's no way to use ALTER FUNCTION to unset the
leakproof flag. But I'm committing this as-is so that it doesn't
have to be rebased again; we can fix up the grammar in a future
commit.
KaiGai Kohei, with some wordsmithing by me.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When default_text_search_config, default_tablespace, or temp_tablespaces
setting is set per-user or per-database, with an "ALTER USER/DATABASE SET
..." statement, don't throw an error if the text search configuration or
tablespace does not exist. In case of text search configuration, even if
it doesn't exist in the current database, it might exist in another
database, where the setting is intended to have its effect. This behavior
is now the same as search_path's.
Tablespaces are cluster-wide, so the same argument doesn't hold for
tablespaces, but there's a problem with pg_dumpall: it dumps "ALTER USER
SET ..." statements before the "CREATE TABLESPACE" statements. Arguably
that's pg_dumpall's fault - it should dump the statements in such an order
that the tablespace is created first and then the "ALTER USER SET
default_tablespace ..." statements after that - but it seems better to be
consistent with search_path and default_text_search_config anyway. Besides,
you could still create a dump that throws an error, by creating the
tablespace, running "ALTER USER SET default_tablespace", then dropping the
tablespace and running pg_dumpall on that.
Backpatch to all supported versions.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Normally, accessing variable-length members of catalog structures past
the first one doesn't work at all. Here, it happened to work because
indnatts was checked to be 1, and so the defined FormData_pg_index
layout, using int2vector[1] and oidvector[1] for variable-length
arrays, happened to match the actual memory layout. But it's a very
fragile assumption, and it's not in a performance-critical path, so
code it properly using heap_getattr() instead.
bug analysis by Tom Lane
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a view is marked as a security barrier, it will not be pulled up
into the containing query, and no quals will be pushed down into it,
so that no function or operator chosen by the user can be applied to
rows not exposed by the view. Views not configured with this
option cannot provide robust row-level security, but will perform far
better.
Patch by KaiGai Kohei; original problem report by Heikki Linnakangas
(in October 2009!). Review (in earlier versions) by Noah Misch and
others. Design advice by Tom Lane and myself. Further review and
cleanup by me.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This makes them enforceable only on the parent table, not on children
tables. This is useful in various situations, per discussion involving
people bitten by the restrictive behavior introduced in 8.4.
Message-Id:
8762mp93iw.fsf@comcast.net
CAFaPBrSMMpubkGf4zcRL_YL-AERUbYF_-ZNNYfb3CVwwEqc9TQ@mail.gmail.com
Authors: Nikhil Sontakke, Alex Hunsaker
Reviewed by Robert Haas and myself
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch creates an API whereby a btree index opclass can optionally
provide non-SQL-callable support functions for sorting. In the initial
patch, we only use this to provide a directly-callable comparator function,
which can be invoked with a bit less overhead than the traditional
SQL-callable comparator. While that should be of value in itself, the real
reason for doing this is to provide a datatype-extensible framework for
more aggressive optimizations, as in Peter Geoghegan's recent work.
Robert Haas and Tom Lane
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since record[] uses array_in, it needs to have its element type passed
as typioparam. In HEAD and 9.1, this fix essentially reverts commit
9bc933b2125a5358722490acbc50889887bf7680, which was a hack that is no
longer needed since domains don't set their typelem anymore. Before
that, adjust the logic so that only domains are excluded from being
treated like arrays, rather than assuming that only base types should
be included. Add a regression test to demonstrate the need for this.
Per report from Maxim Boguk.
Back-patch to 8.4, where type record[] was added.
|
|
|
|
|
| |
Fix some bugs in coercion logic and pg_dump; more comment cleanup;
minor cosmetic improvements.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Move the responsibility for caching specialized information about range
types into the type cache, so that the catalog lookups only have to occur
once per session. Rearrange APIs a bit so that fn_extra caching is
actually effective in the GiST support code. (Use of OidFunctionCallN is
bad enough for performance in itself, but it also prevents the function
from exploiting fn_extra caching.)
The range I/O functions are still not very bright about caching repeated
lookups, but that seems like material for a separate patch.
Also, avoid unnecessary use of memcpy to fetch/store the range type OID and
flags, and don't use the full range_deserialize machinery when all we need
to see is the flags value.
Also fix API error in range_gist_penalty --- it was failing to set *penalty
for any case involving an empty range.
|
|
|
|
|
|
|
| |
Selectivity estimation functions are missing for some range type operators,
which is a TODO.
Jeff Davis
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a tuple in a syscache contains an out-of-line toasted field, and we
try to fetch that field shortly after some other transaction has committed
an update or deletion of the tuple, there is a race condition: vacuum
could come along and remove the toast tuples before we can fetch them.
This leads to transient failures like "missing chunk number 0 for toast
value NNNNN in pg_toast_2619", as seen in recent reports from Andrew
Hammond and Tim Uckun.
The design idea of syscache is that access to stale syscache entries
should be prevented by relation-level locks, but that fails for at least
two cases where toasted fields are possible: ANALYZE updates pg_statistic
rows without locking out sessions that might want to plan queries on the
same table, and CREATE OR REPLACE FUNCTION updates pg_proc rows without
any meaningful lock at all.
The least risky fix seems to be an idea that Heikki suggested when we
were dealing with a related problem back in August: forcibly detoast any
out-of-line fields before putting a tuple into syscache in the first place.
This avoids the problem because at the time we fetch the parent tuple from
the catalog, we should be holding an MVCC snapshot that will prevent
removal of the toast tuples, even if the parent tuple is outdated
immediately after we fetch it. (Note: I'm not convinced that this
statement holds true at every instant where we could be fetching a syscache
entry at all, but it does appear to hold true at the times where we could
fetch an entry that could have a toasted field. We will need to be a bit
wary of adding toast tables to low-level catalogs that don't have them
already.) An additional benefit is that subsequent uses of the syscache
entry should be faster, since they won't have to detoast the field.
Back-patch to all supported versions. The problem is significantly harder
to reproduce in pre-9.0 releases, because of their willingness to flush
every entry in a syscache whenever the underlying catalog is vacuumed
(cf CatalogCacheFlushRelation); but there is still a window for trouble.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add a column pg_class.relallvisible to remember the number of pages that
were all-visible according to the visibility map as of the last VACUUM
(or ANALYZE, or some other operations that update pg_class.relpages).
Use relallvisible/relpages, instead of an arbitrary constant, to estimate
how many heap page fetches can be avoided during an index-only scan.
This is pretty primitive and will no doubt see refinements once we've
acquired more field experience with the index-only scan mechanism, but
it's way better than using a constant.
Note: I had to adjust an underspecified query in the window.sql regression
test, because it was changing answers when the plan changed to use an
index-only scan. Some of the adjacent tests perhaps should be adjusted
as well, but I didn't do that here.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a btree index contains all columns required by the query, and the
visibility map shows that all tuples on a target heap page are
visible-to-all, we don't need to fetch that heap page. This patch depends
on the previous patches that made the visibility map reliable.
There's a fair amount left to do here, notably trying to figure out a less
chintzy way of estimating the cost of an index-only scan, but the core
functionality seems ready to commit.
Robert Haas and Ibrar Ahmed, with some previous work by Heikki Linnakangas.
|