| Commit message (Collapse) | Author | Age |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It's been true for a long time that we expect names of table and domain
constraints to be unique among the constraints of that table or domain.
However, the enforcement of that has been pretty haphazard, and it missed
some corner cases such as creating a CHECK constraint and then an index
constraint of the same name (as per recent report from André Hänsel).
Also, due to the lack of an actual unique index enforcing this, duplicates
could be created through race conditions.
Moreover, the code that searches pg_constraint has been quite inconsistent
about how to handle duplicate names if one did occur: some places checked
and threw errors if there was more than one match, while others just
processed the first match they came to.
To fix, create a unique index on (conrelid, contypid, conname). Since
either conrelid or contypid is zero, this will separately enforce
uniqueness of constraint names among constraints of any one table and any
one domain. (If we ever implement SQL assertions, and put them into this
catalog, more thought might be needed. But it'd be at least as reasonable
to put them into a new catalog; having overloaded this one catalog with
two kinds of constraints was a mistake already IMO.) This index can replace
the existing non-unique index on conrelid, though we need to keep the one
on contypid for query performance reasons.
Having done that, we can simplify the logic in various places that either
coped with duplicates or neglected to, as well as potentially improve
lookup performance when searching for a constraint by name.
Also, as per our usual practice, install a preliminary check so that you
get something more friendly than a unique-index violation report in the
case complained of by André. And teach ChooseIndexName to avoid choosing
autogenerated names that would draw such a failure.
While it's not possible to make such a change in the back branches,
it doesn't seem quite too late to put this into v11, so do so.
Discussion: https://postgr.es/m/0c1001d4428f$0942b430$1bc81c90$@webkr.de
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 9da0cc35284, which introduced parallel CREATE INDEX, failed to
propagate relmapper.c backend local cache state to parallel worker
processes. This could result in parallel index builds against mapped
catalog relations where the leader process (participating as a worker)
scans the new, pristine relfilenode, while worker processes scan the
obsolescent relfilenode. When this happened, the final index structure
was typically not consistent with the owning table's structure. The
final index structure could contain entries formed from both heap
relfilenodes. Only rebuilds on mapped catalog relations that occur as
part of a VACUUM FULL or CLUSTER could become corrupt in practice, since
their mapped relation relfilenode swap is what allows the inconsistency
to arise.
On master, fix the problem by propagating the required relmapper.c
backend state as part of standard parallel initialization (Cf. commit
29d58fd3). On v11, simply disallow builds against mapped catalog
relations by deeming them parallel unsafe.
Author: Peter Geoghegan
Reported-By: "death lock"
Reviewed-By: Tom Lane, Amit Kapila
Bug: #15309
Discussion: https://postgr.es/m/153329671686.1405.18298309097348420351@wrigleys.postgresql.org
Backpatch: 11-, where parallel CREATE INDEX was introduced.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Those would use the default ERRCODE_INTERNAL_ERROR, but for foreseeable
failures an errcode ought to be set, ERRCODE_DATA_CORRUPTED making the
most sense here.
While on the way, fix one errcode_for_file_access missing in origin.c
since the code has been created, and remove one assignment of errno to 0
before calling read(), as this was around to fit with what was present
before 811b6e36 where errno would not be set when not enough bytes are
read. I have noticed the first one, and Tom has pinged me about the
second one.
Author: Michael Paquier
Reported-by: Tom Lane
Discussion: https://postgr.es/m/27265.1531925836@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some error messages related to file handling are using the code path
context to define their state. For example, 2PC-related errors are
referring to "two-phase status files", or "relation mapping file" is
used for catalog-to-filenode mapping, however those prove to be
difficult to translate, and are not more helpful than just referring to
the path of the file being worked on. So simplify all those error
messages by just referring to files with their path used. In some
cases, like the manipulation of WAL segments, the context is actually
helpful so those are kept.
Calls to the system function read() have also been rather inconsistent
with their error handling sometimes not reporting the number of bytes
read, and some other code paths trying to use an errno which has not
been set. The in-core functions are using a more consistent pattern
with this patch, which checks for both errno if set or if an
inconsistent read is happening.
So as to care about pluralization when reading an unexpected number of
byte(s), "could not read: read %d of %zu" is used as error message, with
%d field being the output result of read() and %zu the expected size.
This simplifies the work of translators with less variations of the same
message.
Author: Michael Paquier
Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/20180520000522.GB1603@paquier.xyz
|
|
|
|
|
|
|
| |
This allows overriding the choice of custom or generic plan.
Author: Pavel Stehule <pavel.stehule@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAFj8pRAGLaiEm8ur5DWEBo7qHRWTk9HxkuUAz00CZZtJj-LkCA%40mail.gmail.com
|
|
|
|
|
|
|
| |
It appears that there are more files, whose header comment paths are
wrong. So, fix those paths. No backpatching per proposal of Tom Lane.
Discussion: https://postgr.es/m/CAPpHfdsJyYbOj59MOQL%2B4XxdcomLSLfLqBtAvwR%2BpsCqj3ELdQ%40mail.gmail.com
|
|
|
|
|
|
|
|
|
|
| |
Changed the name of few structure members for the sake of clarity and
removed spurious whitespace.
Reported-by: Amit Kapila
Author: Amit Kapila, based on suggestion by Andrew Dunstan
Reviewed-by: Alvaro Herrera
Discussion: https://postgr.es/m/CAA4eK1K2znsFpC+NQ9A4vxT4uDxADN4RmvHX0L6Y=aHVo9gB4Q@mail.gmail.com
|
|
|
|
|
| |
"compute_hash_value" is particularly gratuitously generic, but IMO
all of these ought to have names clearly related to partitioning.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The previous coding saved pointers into the partitioned table's relcache
entry, but then closed the relcache entry, causing those pointers to
nominally become dangling. Actual trouble would be seen in the field
only if a relcache flush occurred mid-query, but that's hardly out of
the question.
While we could fix this by copying all the data in question at query
start, it seems better to just hold the relcache entry open for the
whole query.
While at it, improve the handling of support-function lookups: do that
once per query not once per pruning test. There's still something to be
desired here, in that we fail to exploit the possibility of caching data
across queries in the fn_extra fields of the relcache's FmgrInfo structs,
which could happen if we just used those structs in-place rather than
copying them. However, combining that with the possibility of per-query
lookups of cross-type comparison functions seems to require changes in the
APIs of a lot of the pruning support functions, so it's too invasive to
consider as part of this patch. A win would ensue only for complex
partition key data types (e.g. arrays), so it may not be worth the
trouble.
David Rowley and Tom Lane
Discussion: https://postgr.es/m/17850.1528755844@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When vacuum processes a relation it uses the corresponding relcache
entry's relfrozenxid / relminmxid as a cutoff for when to remove
tuples etc. Unfortunately for nailed relations (i.e. critical system
catalogs) bugs could frequently lead to the corresponding relcache
entry being stale.
This set of bugs could cause actual data corruption as vacuum would
potentially not remove the correct row versions, potentially reviving
them at a later point. After 699bf7d05c some corruptions in this vein
were prevented, but the additional error checks could also trigger
spuriously. Examples of such errors are:
ERROR: found xmin ... from before relfrozenxid ...
and
ERROR: found multixact ... from before relminmxid ...
To be caused by this bug the errors have to occur on system catalog
tables.
The two bugs are:
1) Invalidations for nailed relations were ignored, based on the
theory that the relcache entry for such tables doesn't
change. Which is largely true, except for fields like relfrozenxid
etc. This means that changes to relations vacuumed in other
sessions weren't picked up by already existing sessions. Luckily
autovacuum doesn't have particularly longrunning sessions.
2) For shared *and* nailed relations, the shared relcache init file
was never invalidated while running. That means that for such
tables (e.g. pg_authid, pg_database) it's not just already existing
sessions that are affected, but even new connections are as well.
That explains why the reports usually were about pg_authid et. al.
To fix 1), revalidate the rd_rel portion of a relcache entry when
invalid. This implies a bit of extra complexity to deal with
bootstrapping, but it's not too bad. The fix for 2) is simpler,
simply always remove both the shared and local init files.
Author: Andres Freund
Reviewed-By: Alvaro Herrera
Discussion:
https://postgr.es/m/20180525203736.crkbg36muzxrjj5e@alap3.anarazel.de
https://postgr.es/m/CAMa1XUhKSJd98JW4o9StWPrfS=11bPgG+_GDMxe25TvUY4Sugg@mail.gmail.com
https://postgr.es/m/CAKMFJucqbuoDRfxPDX39WhA3vJyxweRg_zDVXzncr6+5wOguWA@mail.gmail.com
https://postgr.es/m/CAGewt-ujGpMLQ09gXcUFMZaZsGJC98VXHEFbF-tpPB0fB13K+A@mail.gmail.com
Backpatch: 9.3-
|
|
|
|
| |
Discussion: https://postgr.es/m/15719.1523984266@sss.pgh.pa.us
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There's been a massive addition of partitioning code in PostgreSQL 11,
with little oversight on its placement, resulting in a
catalog/partition.c with poorly defined boundaries and responsibilities.
This commit tries to set a couple of distinct modules to separate things
a little bit. There are no code changes here, only code movement.
There are three new files:
src/backend/utils/cache/partcache.c
src/include/partitioning/partdefs.h
src/include/utils/partcache.h
The previous arrangement of #including catalog/partition.h almost
everywhere is no more.
Authors: Amit Langote and Álvaro Herrera
Discussion: https://postgr.es/m/98e8d509-790a-128c-be7f-e48a5b2d8d97@lab.ntt.co.jp
https://postgr.es/m/11aa0c50-316b-18bb-722d-c23814f39059@lab.ntt.co.jp
https://postgr.es/m/143ed9a4-6038-76d4-9a55-502035815e68@lab.ntt.co.jp
https://postgr.es/m/20180413193503.nynq7bnmgh6vs5vm@alvherre.pgsql
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We were using CurrentMemoryContext to put the partsupfunc fmgr_info
into, which isn't right, because we want the PartitionKey as a whole to
be in the isolated Relation->rd_partkeycxt context. This can cause a
crash with user-defined support functions in the operator classes used
by partitioning keys. (Maybe this can cause problems with core-supplied
opclasses too, not sure.)
This is demonstrably broken in Postgres 10, too, but the initial
proposed fix runs afoul of a problem discussed back when 8a0596cb656e
("Get rid of copy_partition_key") reorganized that code: namely that it
is possible to jump out of RelationBuildPartitionKey because of some
error and leave a dangling memory context child of CacheMemoryContext.
Also, while reviewing this I noticed that the removed-in-pg11
copy_partition_key was doing something wrong, unfixed in pg10, namely
doing memcpy() on the FmgrInfo, which is bogus (should be doing
fmgr_info_copy). Therefore, in branch pg10, the sane fix seems to be to
backpatch both the aforementioned 8a0596cb656e and its followup
be2343221fb7 ("Protect against hypothetical memory leaks in
RelationGetPartitionKey"), so do that, then apply the fmgr_info memcxt
bugfix on top.
Add a test case exercising btree-based custom operator classes, which
causes a crash prior to this fix. This is not a security problem,
because in order to create an operator class you need superuser
privileges anyway.
Authors: Álvaro Herrera and Amit Langote
Reported and diagnosed by: Amit Langote
Discussion: https://postgr.es/m/3041e853-b1dd-a0c6-ff21-7cc5633bffd0@lab.ntt.co.jp
|
|
|
|
|
|
|
|
|
|
|
| |
- Explicitly forbids opclass, collation and indoptions (like DESC/ASC etc) for
including columns. Throw an error if user points that.
- Truncated storage arrays for such attributes to store only key atrributes,
added assertion checks.
- Do not check opfamily and collation for including columns in
CompareIndexInfo()
Discussion: https://www.postgresql.org/message-id/5ee72852-3c4e-ee35-e2ed-c1d053d45c08@sigaev.ru
|
|
|
|
|
|
|
|
| |
Rename ii_KeyAttrNumbers to ii_IndexAttrNumbers to prevent confusion with
ii_NumIndexAttrs/ii_NumIndexKeyAttrs. ii_IndexAttrNumbers contains
all attributes including "including" columns, not only key attribute.
Discussion: https://www.postgresql.org/message-id/13123421-1d52-d0e4-c95c-6d69011e0595%40sigaev.ru
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch introduces INCLUDE clause to index definition. This clause
specifies a list of columns which will be included as a non-key part in
the index. The INCLUDE columns exist solely to allow more queries to
benefit from index-only scans. Also, such columns don't need to have
appropriate operator classes. Expressions are not supported as INCLUDE
columns since they cannot be used in index-only scans.
Index access methods supporting INCLUDE are indicated by amcaninclude flag
in IndexAmRoutine. For now, only B-tree indexes support INCLUDE clause.
In B-tree indexes INCLUDE columns are truncated from pivot index tuples
(tuples located in non-leaf pages and high keys). Therefore, B-tree indexes
now might have variable number of attributes. This patch also provides
generic facility to support that: pivot tuples contain number of their
attributes in t_tid.ip_posid. Free 13th bit of t_info is used for indicating
that. This facility will simplify further support of index suffix truncation.
The changes of above are backward-compatible, pg_upgrade doesn't need special
handling of B-tree indexes for that.
Bump catalog version
Author: Anastasia Lubennikova with contribition by Alexander Korotkov and me
Reviewed by: Peter Geoghegan, Tomas Vondra, Antonin Houska, Jeff Janes,
David Rowley, Alexander Korotkov
Discussion: https://www.postgresql.org/message-id/flat/56168952.4010101@postgrespro.ru
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Update the built-in logical replication system to make use of the
previously added logical decoding for TRUNCATE support. Add the
required truncate callback to pgoutput and a new logical replication
protocol message.
Publications get a new attribute to determine whether to replicate
truncate actions. When updating a publication via pg_dump from an older
version, this is not set, thus preserving the previous behavior.
Author: Simon Riggs <simon@2ndquadrant.com>
Author: Marco Nenciarini <marco.nenciarini@2ndquadrant.it>
Author: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>
Reviewed-by: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
|
|
|
|
|
|
| |
MemoryContextCopySetIdentifier -> MemoryContextCopyAndSetIdentifier
Discussion: https://www.postgresql.org/message-id/6421.1522194949@sss.pgh.pa.us
|
|
|
|
|
|
|
|
| |
Reported-by: Michael Paquier, Abhijit Menon-Sen
Discussion: https://postgr.es/m/20180305045854.GB2266@paquier.xyz
Author: Michael Paquier, Abhijit Menon-Sen, me
|
|
|
|
| |
Arthur Zakirov, confirmed by Thomas Munro
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently adding a column to a table with a non-NULL default results in
a rewrite of the table. For large tables this can be both expensive and
disruptive. This patch removes the need for the rewrite as long as the
default value is not volatile. The default expression is evaluated at
the time of the ALTER TABLE and the result stored in a new column
(attmissingval) in pg_attribute, and a new column (atthasmissing) is set
to true. Any existing row when fetched will be supplied with the
attmissingval. New rows will have the supplied value or the default and
so will never need the attmissingval.
Any time the table is rewritten all the atthasmissing and attmissingval
settings for the attributes are cleared, as they are no longer needed.
The most visible code change from this is in heap_attisnull, which
acquires a third TupleDesc argument, allowing it to detect a missing
value if there is one. In many cases where it is known that there will
not be any (e.g. catalog relations) NULL can be passed for this
argument.
Andrew Dunstan, heavily modified from an original patch from Serge
Rielau.
Reviewed by Tom Lane, Andres Freund, Tomas Vondra and David Rowley.
Discussion: https://postgr.es/m/31e2e921-7002-4c27-59f5-51f08404c858@2ndQuadrant.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Originally, we treated memory context names as potentially variable in
all cases, and therefore always copied them into the context header.
Commit 9fa6f00b1 rethought this a little bit and invented a distinction
between fixed and variable names, skipping the copy step for the former.
But we can make things both simpler and more useful by instead allowing
there to be two parts to a context's identification, a fixed "name" and
an optional, variable "ident". The name supplied in the context create
call is now required to be a compile-time-constant string in all cases,
as it is never copied but just pointed to. The "ident" string, if
wanted, is supplied later. This is needed because typically we want
the ident to be stored inside the context so that it's cleaned up
automatically on context deletion; that means it has to be copied into
the context before we can set the pointer.
The cost of this approach is basically just an additional pointer field
in struct MemoryContextData, which isn't much overhead, and is bought
back entirely in the AllocSet case by not needing a headerSize field
anymore, since we no longer have to cope with variable header length.
In addition, we can simplify the internal interfaces for memory context
creation still further, saving a few cycles there. And it's no longer
true that a custom identifier disqualifies a context from participating
in aset.c's freelist scheme, so possibly there's some win on that end.
All the places that were using non-compile-time-constant context names
are adjusted to put the variable info into the "ident" instead. This
allows more effective identification of those contexts in many cases;
for example, subsidary contexts of relcache entries are now identified
by both type (e.g. "index info") and relname, where before you got only
one or the other. Contexts associated with PL function cache entries
are now identified more fully and uniformly, too.
I also arranged for plancache contexts to use the query source string
as their identifier. This is basically free for CachedPlanSources, as
they contained a copy of that string already. We pay an extra pstrdup
to do it for CachedPlans. That could perhaps be avoided, but it would
make things more fragile (since the CachedPlanSource is sometimes
destroyed first). I suspect future improvements in error reporting will
require CachedPlans to have a copy of that string anyway, so it's not
clear that it's worth moving mountains to avoid it now.
This also changes the APIs for context statistics routines so that the
context-specific routines no longer assume that output goes straight
to stderr, nor do they know all details of the output format. This
is useful immediately to reduce code duplication, and it also allows
for external code to do something with stats output that's different
from printing to stderr.
The reason for pushing this now rather than waiting for v12 is that
it rethinks some of the API changes made by commit 9fa6f00b1. Seems
better for extension authors to endure just one round of API changes
not two.
Discussion: https://postgr.es/m/CAB=Je-FdtmFZ9y9REHD7VsSrnCkiBhsA4mdsLKSPauwXtQBeNA@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the value of an index expression is unchanged after UPDATE,
allow HOT updates where previously we disallowed them, giving
a significant performance boost in those cases.
Particularly useful for indexes such as JSON->>field where the
JSON value changes but the indexed value does not.
Submitted as "surjective indexes" patch, now enabled by use
of new "recheck_on_update" parameter.
Author: Konstantin Knizhnik
Reviewer: Simon Riggs, with much wordsmithing and some cleanup
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
One of the things canonicalize_qual() does is to remove constant-NULL
subexpressions of top-level AND/OR clauses. It does that on the assumption
that what it's given is a top-level WHERE clause, so that NULL can be
treated like FALSE. Although this is documented down inside a subroutine
of canonicalize_qual(), it wasn't mentioned in the documentation of that
function itself, and some callers hadn't gotten that memo.
Notably, commit d007a9505 caused get_relation_constraints() to apply
canonicalize_qual() to CHECK constraints. That allowed constraint
exclusion to misoptimize situations in which a CHECK constraint had a
provably-NULL subclause, as seen in the regression test case added here,
in which a child table that should be scanned is not. (Although this
thinko is ancient, the test case doesn't fail before 9.2, for reasons
I've not bothered to track down in detail. There may be related cases
that do fail before that.)
More recently, commit f0e44751d added an independent bug by applying
canonicalize_qual() to index expressions, which is even sillier since
those might not even be boolean. If they are, though, I think this
could lead to making incorrect index entries for affected index
expressions in v10. I haven't attempted to prove that though.
To fix, add an "is_check" parameter to canonicalize_qual() to specify
whether it should assume WHERE or CHECK semantics, and make it perform
NULL-elimination accordingly. Adjust the callers to apply the right
semantics, or remove the call entirely in cases where it's not known
that the expression has one or the other semantics. I also removed
the call in some cases involving partition expressions, where it should
be a no-op because such expressions should be canonical already ...
and was a no-op, independently of whether it could in principle have
done something, because it was being handed the qual in implicit-AND
format which isn't what it expects. In HEAD, add an Assert to catch
that type of mistake in future.
This represents an API break for external callers of canonicalize_qual().
While that's intentional in HEAD to make such callers think about which
case applies to them, it seems like something we probably wouldn't be
thanked for in released branches. Hence, in released branches, the
extra parameter is added to a new function canonicalize_qual_ext(),
and canonicalize_qual() is a wrapper that retains its old behavior.
Patch by me with suggestions from Dean Rasheed. Back-patch to all
supported branches.
Discussion: https://postgr.es/m/24475.1520635069@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
|
| |
The new column distinguishes normal functions, procedures, aggregates,
and window functions. This replaces the existing columns proisagg and
proiswindow, and replaces the convention that procedures are indicated
by prorettype == 0. Also change prorettype to be VOIDOID for procedures.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
|
|
|
|
|
|
|
|
|
|
|
|
| |
formrdesc's comment listed the specific catalogs it is called for,
but the list was out of date. Rather than jumping back onto that
maintenance treadmill, let's just remove the list. It tells the
reader nothing that can't be learned quickly and more reliably by
searching relcache.c for callers of formrdesc().
Oversight noted by Kyotaro Horiguchi.
Discussion: https://postgr.es/m/20180214.105314.138966434.horiguchi.kyotaro@lab.ntt.co.jp
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Formerly, DTYPE_REC was used only for variables declared as "record";
variables of named composite types used DTYPE_ROW, which is faster for
some purposes but much less flexible. In particular, the ROW code paths
are entirely incapable of dealing with DDL-caused changes to the number
or data types of the columns of a row variable, once a particular plpgsql
function has been parsed for the first time in a session. And, since the
stored representation of a ROW isn't a tuple, there wasn't any easy way
to deal with variables of domain-over-composite types, since the domain
constraint checking code would expect the value to be checked to be a
tuple. A lesser, but still real, annoyance is that ROW format cannot
represent a true NULL composite value, only a row of per-field NULL
values, which is not exactly the same thing.
Hence, switch to using DTYPE_REC for all composite-typed variables,
whether "record", named composite type, or domain over named composite
type. DTYPE_ROW remains but is used only for its native purpose, to
represent a fixed-at-compile-time list of variables, for instance the
targets of an INTO clause.
To accomplish this without taking significant performance losses, introduce
infrastructure that allows storing composite-type variables as "expanded
objects", similar to the "expanded array" infrastructure introduced in
commit 1dc5ebc90. A composite variable's value is thereby kept (most of
the time) in the form of separate Datums, so that field accesses and
updates are not much more expensive than they were in the ROW format.
This holds the line, more or less, on performance of variables of named
composite types in field-access-intensive microbenchmarks, and makes
variables declared "record" perform much better than before in similar
tests. In addition, the logic involved with enforcing composite-domain
constraints against updates of individual fields is in the expanded
record infrastructure not plpgsql proper, so that it might be reusable
for other purposes.
In further support of this, introduce a typcache feature for assigning a
unique-within-process identifier to each distinct tuple descriptor of
interest; in particular, DDL alterations on composite types result in a new
identifier for that type. This allows very cheap detection of the need to
refresh tupdesc-dependent data. This improves on the "tupDescSeqNo" idea
I had in commit 687f096ea: that assigned identifying sequence numbers to
successive versions of individual composite types, but the numbers were not
unique across different types, nor was there support for assigning numbers
to registered record types.
In passing, allow plpgsql functions to accept as well as return type
"record". There was no good reason for the old restriction, and it
was out of step with most of the other PLs.
Tom Lane, reviewed by Pavel Stehule
Discussion: https://postgr.es/m/8962.1514399547@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
| |
The modern way is to use a missing_ok argument instead of two separate
almost-identical routines, so do that.
Author: Michaël Paquier
Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/20180201063212.GE6398@paquier.xyz
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Failure to advance the list pointer while reading partition expressions
from a list results in invoking an input function with inappropriate data,
possibly leading to crashes or, with carefully crafted input, disclosure
of arbitrary backend memory.
Bug discovered independently by Álvaro Herrera and David Rowley.
This patch is by Álvaro but owes something to David's proposed fix.
Back-patch to v10 where the issue was introduced.
Security: CVE-2018-1052
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There's never any value in giving a fully specified cache key to
SearchCatCacheList: you might as well call SearchCatCache instead,
since there could be only one match. So the maximum useful number of
key arguments is one less than the supported number of key columns.
We might as well remove the useless extra argument and save some few
bytes per call site, as well as a cycle or so per call.
I believe the reason it was coded like this is that originally, callers
had to write out all the dummy arguments in each call, and so it seemed
less confusing if SearchCatCache and SearchCatCacheList took the same
number of key arguments. But since commit e26c539e9, callers only write
their live arguments explicitly, making that a non-factor; and there's
surely been enough time for third-party modules to adapt to that coding
style. So this is only an ABI break not an API break for callers.
Per discussion with Oliver Ford, this might also make it less confusing
how to use SearchCatCacheList correctly.
Discussion: https://postgr.es/m/27788.1517069693@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When CREATE INDEX is run on a partitioned table, create catalog entries
for an index on the partitioned table (which is just a placeholder since
the table proper has no data of its own), and recurse to create actual
indexes on the existing partitions; create them in future partitions
also.
As a convenience gadget, if the new index definition matches some
existing index in partitions, these are picked up and used instead of
creating new ones. Whichever way these indexes come about, they become
attached to the index on the parent table and are dropped alongside it,
and cannot be dropped on isolation unless they are detached first.
To support pg_dump'ing these indexes, add commands
CREATE INDEX ON ONLY <table>
(which creates the index on the parent partitioned table, without
recursing) and
ALTER INDEX ATTACH PARTITION
(which is used after the indexes have been created individually on each
partition, to attach them to the parent index). These reconstruct prior
database state exactly.
Reviewed-by: (in alphabetical order) Peter Eisentraut, Robert Haas, Amit
Langote, Jesper Pedersen, Simon Riggs, David Rowley
Discussion: https://postgr.es/m/20171113170646.gzweigyrgg6pwsg4@alvherre.pgsql
|
|
|
|
| |
"c.f." should be "cf.".
|
|
|
|
|
| |
Author: Dagfinn Ilmari Mannsåker
Discussion: https://postgr.es/m/d8jefpk4jtd.fsf@dalvik.ping.uio.no
|
|
|
|
| |
Backpatch-through: certain files through 9.3
|
|
|
|
|
|
|
| |
Also, fix a comment that commit 8a0596cb656e made obsolete.
Reported-by: Robert Haas
Discussion: http://postgr.es/m/CA+TgmoYbpuUUUp2GhYNwWm0qkah39spiU7uOiNXLz20ASfKYoA@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
| |
That function currently exists to avoid leaking memory in
CacheMemoryContext in case of trouble while the partition key is being
built, but there's a better way: allocate everything in a memcxt that
goes away if the current (sub)transaction fails, and once the partition
key is built and no further errors can occur, make the memcxt permanent
by making it a child of CacheMemoryContext.
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/20171027172730.eh2domlkpn4ja62m@alvherre.pgsql
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch makes a number of interrelated changes to reduce the overhead
involved in creating/deleting memory contexts. The key ideas are:
* Include the AllocSetContext header of an aset.c context in its first
malloc request, rather than allocating it separately in TopMemoryContext.
This means that we now always create an initial or "keeper" block in an
aset, even if it never receives any allocation requests.
* Create freelists in which we can save and recycle recently-destroyed
asets (this idea is due to Robert Haas).
* In the common case where the name of a context is a constant string,
just store a pointer to it in the context header, rather than copying
the string.
The first change eliminates a palloc/pfree cycle per context, and
also avoids bloat in TopMemoryContext, at the price that creating
a context now involves a malloc/free cycle even if the context never
receives any allocations. That would be a loser for some common
usage patterns, but recycling short-lived contexts via the freelist
eliminates that pain.
Avoiding copying constant strings not only saves strlen() and strcpy()
overhead, but is an essential part of the freelist optimization because
it makes the context header size constant. Currently we make no
attempt to use the freelist for contexts with non-constant names.
(Perhaps someday we'll need to think harder about that, but in current
usage, most contexts with custom names are long-lived anyway.)
The freelist management in this initial commit is pretty simplistic,
and we might want to refine it later --- but in common workloads that
will never matter because the freelists will never get full anyway.
To create a context with a non-constant name, one is now required to
call AllocSetContextCreateExtended and specify the MEMCONTEXT_COPY_NAME
option. AllocSetContextCreate becomes a wrapper macro, and it includes
a test that will complain about non-string-literal context name
parameters on gcc and similar compilers.
An unfortunate side effect of making AllocSetContextCreate a macro is
that one is now *required* to use the size parameter abstraction macros
(ALLOCSET_DEFAULT_SIZES and friends) with it; the pre-9.6 habit of
writing out individual size parameters no longer works unless you
switch to AllocSetContextCreateExtended.
Internally to the memory-context-related modules, the context creation
APIs are simplified, removing the rather baroque original design whereby
a context-type module called mcxt.c which then called back into the
context-type module. That saved a bit of code duplication, but not much,
and it prevented context-type modules from exercising control over the
allocation of context headers.
In passing, I converted the test-and-elog validation of aset size
parameters into Asserts to save a few more cycles. The original thought
was that callers might compute size parameters on the fly, but in practice
nobody does that, so it's useless to expend cycles on checking those
numbers in production builds.
Also, mark the memory context method-pointer structs "const",
just for cleanliness.
Discussion: https://postgr.es/m/2264.1512870796@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch fixes a couple of low-probability bugs that could lead to
reporting an irrelevant errno value (and hence possibly a wrong SQLSTATE)
concerning directory-open or file-open failures. It also fixes places
where we took shortcuts in reporting such errors, either by using elog
instead of ereport or by using ereport but forgetting to specify an
errcode. And it eliminates a lot of just plain redundant error-handling
code.
In service of all this, export fd.c's formerly-static function
ReadDirExtended, so that external callers can make use of the coding
pattern
dir = AllocateDir(path);
while ((de = ReadDirExtended(dir, path, LOG)) != NULL)
if they'd like to treat directory-open failures as mere LOG conditions
rather than errors. Also fix FreeDir to be a no-op if we reach it
with dir == NULL, as such a coding pattern would cause.
Then, remove code at many call sites that was throwing an error or log
message for AllocateDir failure, as ReadDir or ReadDirExtended can handle
that job just fine. Aside from being a net code savings, this gets rid of
a lot of not-quite-up-to-snuff reports, as mentioned above. (In some
places these changes result in replacing a custom error message such as
"could not open tablespace directory" with more generic wording "could not
open directory", but it was agreed that the custom wording buys little as
long as we report the directory name.) In some other call sites where we
can't just remove code, change the error reports to be fully
project-style-compliant.
Also reorder code in restoreTwoPhaseData that was acquiring a lock
between AllocateDir and ReadDir; in the unlikely but surely not
impossible case that LWLockAcquire changes errno, AllocateDir failures
would be misreported. There is no great value in opening the directory
before acquiring TwoPhaseStateLock, so just do it in the other order.
Also fix CheckXLogRemoved to guarantee that it preserves errno,
as quite a number of call sites are implicitly assuming. (Again,
it's unlikely but I think not impossible that errno could change
during a SpinLockAcquire. If so, this function was broken for its
own purposes as well as breaking callers.)
And change a few places that were using not-per-project-style messages,
such as "could not read directory" when "could not open directory" is
more correct.
Back-patch the exporting of ReadDirExtended, in case we have occasion
to back-patch some fix that makes use of it; it's not needed right now
but surely making it global is pretty harmless. Also back-patch the
restoreTwoPhaseData and CheckXLogRemoved fixes. The rest of this is
essentially cosmetic and need not get back-patched.
Michael Paquier, with a bit of additional work by me
Discussion: https://postgr.es/m/CAB7nPqRpOCxjiirHmebEFhXVTK7V5Jvw4bz82p7Oimtsm3TyZA@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds a new object type "procedure" that is similar to a function
but does not have a return type and is invoked by the new CALL statement
instead of SELECT or similar. This implementation is aligned with the
SQL standard and compatible with or similar to other SQL implementations.
This commit adds new commands CALL, CREATE/ALTER/DROP PROCEDURE, as well
as ALTER/DROP ROUTINE that can refer to either a function or a
procedure (or an aggregate function, as an extension to SQL). There is
also support for procedures in various utility commands such as COMMENT
and GRANT, as well as support in pg_dump and psql. Support for defining
procedures is available in all the languages supplied by the core
distribution.
While this commit is mainly syntax sugar around existing functionality,
future features will rely on having procedures as a separate object
type.
Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fix PL/Python so that it can handle domains over composite, and so that
it enforces domain constraints correctly in other cases that were not
always done properly before. Notably, it didn't do arrays of domains
right (oversight in commit c12d570fa), and it failed to enforce domain
constraints when returning a composite type containing a domain field,
and if a transform function is being used for a domain's base type then
it failed to enforce domain constraints on the result. Also, in many
places it missed checking domain constraints on null values, because
the plpy_typeio code simply wasn't called for Py_None.
Rather than try to band-aid these problems, I made a significant
refactoring of the plpy_typeio logic. The existing design of recursing
for array and composite members is extended to also treat domains as
containers requiring recursion, and the APIs for the module are cleaned
up and simplified.
The patch also modifies plpy_typeio to rely on the typcache more than
it did before (which was pretty much not at all). This reduces the
need for repetitive lookups, and lets us get rid of an ad-hoc scheme
for detecting changes in composite types. I added a couple of small
features to typcache to help with that.
Although some of this is fixing bugs that long predate v11, I don't
think we should risk a back-patch: it's a significant amount of code
churn, and there've been no complaints from the field about the bugs.
Tom Lane, reviewed by Anthony Bykov
Discussion: https://postgr.es/m/24449.1509393613@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Hash partitioning is useful when you want to partition a growing data
set evenly. This can be useful to keep table sizes reasonable, which
makes maintenance operations such as VACUUM faster, or to enable
partition-wise join.
At present, we still depend on constraint exclusion for partitioning
pruning, and the shape of the partition constraints for hash
partitioning is such that that doesn't work. Work is underway to fix
that, which should both improve performance and make partitioning
pruning work with hash partitioning.
Amul Sul, reviewed and tested by Dilip Kumar, Ashutosh Bapat, Yugo
Nagata, Rajkumar Raghuwanshi, Jesper Pedersen, and by me. A few
final tweaks also by me.
Discussion: http://postgr.es/m/CAAJ_b96fhpJAP=ALbETmeLk1Uni_GFZD938zgenhF49qgDTjaQ@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The lower case spellings are C and C++ standard and are used in most
parts of the PostgreSQL sources. The upper case spellings are only used
in some files/modules. So standardize on the standard spellings.
The APIs for ICU, Perl, and Windows define their own TRUE and FALSE, so
those are left as is when using those APIs.
In code comments, we use the lower-case spelling for the C concepts and
keep the upper-case spelling for the SQL concepts.
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
|
|
|
|
|
|
|
| |
Add docs to explain this for other backup mechanisms
Author: David Steele <david@pgmasters.net>
Reviewed-by: Petr Jelinek <petr.jelinek@2ndQuadrant.com> et al
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is the last major omission in our domains feature: you can now
make a domain over anything that's not a pseudotype.
The major complication from an implementation standpoint is that places
that might be creating tuples of a domain type now need to be prepared
to apply domain_check(). It seems better that unprepared code fail
with an error like "<type> is not composite" than that it silently fail
to apply domain constraints. Therefore, relevant infrastructure like
get_func_result_type() and lookup_rowtype_tupdesc() has been adjusted
to treat domain-over-composite as a distinct case that unprepared code
won't recognize, rather than just transparently treating it the same
as plain composite. This isn't a 100% solution to the possibility of
overlooked domain checks, but it catches most places.
In passing, improve typcache.c's support for domains (it can now cache
the identity of a domain's base type), and rewrite the argument handling
logic in jsonfuncs.c's populate_record[set]_worker to reduce duplicative
per-call lookups.
I believe this is code-complete so far as the core and contrib code go.
The PLs need varying amounts of work, which will be tackled in followup
patches.
Discussion: https://postgr.es/m/4206.1499798337@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Like the similar logic for arrays and records, it's necessary to examine
the range's subtype to decide whether the range type can support hashing.
We can omit checking the subtype for btree-defined operations, though,
since range subtypes are required to have those operations. (Possibly
that simplification for btree cases led us to overlook that it does
not apply for hash cases.)
This is only an issue if the subtype lacks hash support, which is not
true of any built-in range type, but it's easy to demonstrate a problem
with a range type over, eg, money: you can get a "could not identify
a hash function" failure when the planner is misled into thinking that
hash join or aggregation would work.
This was born broken, so back-patch to all supported branches.
|
|
|
|
|
|
|
|
|
|
|
| |
The previous coding would report that an array type supports extended
hashing if its element type supports regular hashing. This bug is
only latent at the moment, since AFAICS there is not yet any code
that depends on checking presence of extended-hashing support to make
any decisions. (And in any case it wouldn't matter unless the element
type has only regular hashing, which isn't true of any core data type.)
But that doesn't make it less broken. Extend the
cache_array_element_properties infrastructure to check this properly.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The following are the individual improvements:
1) Avoidance of FunctionCallInfo based function calls, replaced by
more efficient functions with a native C argument interface.
2) Don't extract columns from a cache entry's tuple whenever matching
entries - instead store them as a Datum array. This also allows to
get rid of having to build dummy tuples for negative & list
entries, and of a hack for dealing with cstring vs. text weirdness.
3) Reorder members of catcache.h struct, so imortant entries are more
likely to be on one cacheline.
4) Allowing the compiler to specialize critical SearchCatCache for a
specific number of attributes allows to unroll loops and avoid
other nkeys dependant initialization.
5) Only initializing the ScanKey when necessary, i.e. catcache misses,
greatly reduces cache unnecessary cpu cache misses.
6) Split of the cache-miss case from the hash lookup, reducing stack
allocations etc in the common case.
7) CatCTup and their corresponding heaptuple are allocated in one
piece.
This results in making cache lookups themselves roughly three times as
fast - full-system benchmarks obviously improve less than that.
I've also evaluated further techniques:
- replace open coded hash with simplehash - the list walk right now
shows up in profiles. Unfortunately it's not easy to do so safely as
an entry's memory location can change at various times, which
doesn't work well with the refcounting and cache invalidation.
- Cacheline-aligning CatCTup entries - helps some with performance,
but the win isn't big and the code for it is ugly, because the
tuples have to be freed as well.
- add more proper functions, rather than macros for
SearchSysCacheCopyN etc., but right now they don't show up in
profiles.
The reason the macro wrapper for syscache.c/h have to be changed,
rather than just catcache, is that doing otherwise would require
exposing the SysCache array to the outside. That might be a good idea
anyway, but it's for another day.
Author: Andres Freund
Reviewed-By: Robert Haas
Discussion: https://postgr.es/m/20170914061207.zxotvyopetm7lrrp@alap3.anarazel.de
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The file handling functions from fd.c were called with a diverse mix of
notations for the file permissions when they were opening new files.
Almost all files created by the server should have the same permissions
set. So change the API so that e.g. OpenTransientFile() automatically
uses the standard permissions set, and OpenTransientFilePerm() is a new
function that takes an explicit permissions set for the few cases where
it is needed. This also saves an unnecessary argument for call sites
that are just opening an existing file.
While we're reviewing these APIs, get rid of the FileName typedef and
use the standard const char * for the file name and mode_t for the file
mode. This makes these functions match other file handling functions
and removes an unnecessary layer of mysteriousness. We can also get rid
of a few casts that way.
Author: David Steele <david@pgmasters.net>
|
|
|
|
|
|
|
|
|
| |
The elements of RecordCacheArray are TupleDesc, not TupleDesc *.
Those are actually the same size, so that this error is harmless,
but it's still wrong --- and it might bite us someday, if TupleDesc
ever became a struct, say.
Per Coverity.
|