aboutsummaryrefslogtreecommitdiff
path: root/src/backend/utils/cache
Commit message (Collapse)AuthorAge
...
* Fully enforce uniqueness of constraint names.Tom Lane2018-09-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's been true for a long time that we expect names of table and domain constraints to be unique among the constraints of that table or domain. However, the enforcement of that has been pretty haphazard, and it missed some corner cases such as creating a CHECK constraint and then an index constraint of the same name (as per recent report from André Hänsel). Also, due to the lack of an actual unique index enforcing this, duplicates could be created through race conditions. Moreover, the code that searches pg_constraint has been quite inconsistent about how to handle duplicate names if one did occur: some places checked and threw errors if there was more than one match, while others just processed the first match they came to. To fix, create a unique index on (conrelid, contypid, conname). Since either conrelid or contypid is zero, this will separately enforce uniqueness of constraint names among constraints of any one table and any one domain. (If we ever implement SQL assertions, and put them into this catalog, more thought might be needed. But it'd be at least as reasonable to put them into a new catalog; having overloaded this one catalog with two kinds of constraints was a mistake already IMO.) This index can replace the existing non-unique index on conrelid, though we need to keep the one on contypid for query performance reasons. Having done that, we can simplify the logic in various places that either coped with duplicates or neglected to, as well as potentially improve lookup performance when searching for a constraint by name. Also, as per our usual practice, install a preliminary check so that you get something more friendly than a unique-index violation report in the case complained of by André. And teach ChooseIndexName to avoid choosing autogenerated names that would draw such a failure. While it's not possible to make such a change in the back branches, it doesn't seem quite too late to put this into v11, so do so. Discussion: https://postgr.es/m/0c1001d4428f$0942b430$1bc81c90$@webkr.de
* Handle parallel index builds on mapped relations.Peter Geoghegan2018-08-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 9da0cc35284, which introduced parallel CREATE INDEX, failed to propagate relmapper.c backend local cache state to parallel worker processes. This could result in parallel index builds against mapped catalog relations where the leader process (participating as a worker) scans the new, pristine relfilenode, while worker processes scan the obsolescent relfilenode. When this happened, the final index structure was typically not consistent with the owning table's structure. The final index structure could contain entries formed from both heap relfilenodes. Only rebuilds on mapped catalog relations that occur as part of a VACUUM FULL or CLUSTER could become corrupt in practice, since their mapped relation relfilenode swap is what allows the inconsistency to arise. On master, fix the problem by propagating the required relmapper.c backend state as part of standard parallel initialization (Cf. commit 29d58fd3). On v11, simply disallow builds against mapped catalog relations by deeming them parallel unsafe. Author: Peter Geoghegan Reported-By: "death lock" Reviewed-By: Tom Lane, Amit Kapila Bug: #15309 Discussion: https://postgr.es/m/153329671686.1405.18298309097348420351@wrigleys.postgresql.org Backpatch: 11-, where parallel CREATE INDEX was introduced.
* Add proper errcodes to new error messages for read() failuresMichael Paquier2018-07-23
| | | | | | | | | | | | | | | | | Those would use the default ERRCODE_INTERNAL_ERROR, but for foreseeable failures an errcode ought to be set, ERRCODE_DATA_CORRUPTED making the most sense here. While on the way, fix one errcode_for_file_access missing in origin.c since the code has been created, and remove one assignment of errno to 0 before calling read(), as this was around to fit with what was present before 811b6e36 where errno would not be set when not enough bytes are read. I have noticed the first one, and Tom has pinged me about the second one. Author: Michael Paquier Reported-by: Tom Lane Discussion: https://postgr.es/m/27265.1531925836@sss.pgh.pa.us
* Rework error messages around file handlingMichael Paquier2018-07-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some error messages related to file handling are using the code path context to define their state. For example, 2PC-related errors are referring to "two-phase status files", or "relation mapping file" is used for catalog-to-filenode mapping, however those prove to be difficult to translate, and are not more helpful than just referring to the path of the file being worked on. So simplify all those error messages by just referring to files with their path used. In some cases, like the manipulation of WAL segments, the context is actually helpful so those are kept. Calls to the system function read() have also been rather inconsistent with their error handling sometimes not reporting the number of bytes read, and some other code paths trying to use an errno which has not been set. The in-core functions are using a more consistent pattern with this patch, which checks for both errno if set or if an inconsistent read is happening. So as to care about pluralization when reading an unexpected number of byte(s), "could not read: read %d of %zu" is used as error message, with %d field being the output result of read() and %zu the expected size. This simplifies the work of translators with less variations of the same message. Author: Michael Paquier Reviewed-by: Álvaro Herrera Discussion: https://postgr.es/m/20180520000522.GB1603@paquier.xyz
* Add plan_cache_mode settingPeter Eisentraut2018-07-16
| | | | | | | This allows overriding the choice of custom or generic plan. Author: Pavel Stehule <pavel.stehule@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAFj8pRAGLaiEm8ur5DWEBo7qHRWTk9HxkuUAz00CZZtJj-LkCA%40mail.gmail.com
* Fix more wrong paths in header commentsAlexander Korotkov2018-07-11
| | | | | | | It appears that there are more files, whose header comment paths are wrong. So, fix those paths. No backpatching per proposal of Tom Lane. Discussion: https://postgr.es/m/CAPpHfdsJyYbOj59MOQL%2B4XxdcomLSLfLqBtAvwR%2BpsCqj3ELdQ%40mail.gmail.com
* Cosmetic improvements for faster column addition.Amit Kapila2018-06-27
| | | | | | | | | | Changed the name of few structure members for the sake of clarity and removed spurious whitespace. Reported-by: Amit Kapila Author: Amit Kapila, based on suggestion by Andrew Dunstan Reviewed-by: Alvaro Herrera Discussion: https://postgr.es/m/CAA4eK1K2znsFpC+NQ9A4vxT4uDxADN4RmvHX0L6Y=aHVo9gB4Q@mail.gmail.com
* Fix some ill-chosen names for globally-visible partition support functions.Tom Lane2018-06-13
| | | | | "compute_hash_value" is particularly gratuitously generic, but IMO all of these ought to have names clearly related to partitioning.
* Fix up run-time partition pruning's use of relcache's partition data.Tom Lane2018-06-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The previous coding saved pointers into the partitioned table's relcache entry, but then closed the relcache entry, causing those pointers to nominally become dangling. Actual trouble would be seen in the field only if a relcache flush occurred mid-query, but that's hardly out of the question. While we could fix this by copying all the data in question at query start, it seems better to just hold the relcache entry open for the whole query. While at it, improve the handling of support-function lookups: do that once per query not once per pruning test. There's still something to be desired here, in that we fail to exploit the possibility of caching data across queries in the fn_extra fields of the relcache's FmgrInfo structs, which could happen if we just used those structs in-place rather than copying them. However, combining that with the possibility of per-query lookups of cross-type comparison functions seems to require changes in the APIs of a lot of the pruning support functions, so it's too invasive to consider as part of this patch. A win would ensue only for complex partition key data types (e.g. arrays), so it may not be worth the trouble. David Rowley and Tom Lane Discussion: https://postgr.es/m/17850.1528755844@sss.pgh.pa.us
* Fix bugs in vacuum of shared rels, by keeping their relcache entries current.Andres Freund2018-06-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When vacuum processes a relation it uses the corresponding relcache entry's relfrozenxid / relminmxid as a cutoff for when to remove tuples etc. Unfortunately for nailed relations (i.e. critical system catalogs) bugs could frequently lead to the corresponding relcache entry being stale. This set of bugs could cause actual data corruption as vacuum would potentially not remove the correct row versions, potentially reviving them at a later point. After 699bf7d05c some corruptions in this vein were prevented, but the additional error checks could also trigger spuriously. Examples of such errors are: ERROR: found xmin ... from before relfrozenxid ... and ERROR: found multixact ... from before relminmxid ... To be caused by this bug the errors have to occur on system catalog tables. The two bugs are: 1) Invalidations for nailed relations were ignored, based on the theory that the relcache entry for such tables doesn't change. Which is largely true, except for fields like relfrozenxid etc. This means that changes to relations vacuumed in other sessions weren't picked up by already existing sessions. Luckily autovacuum doesn't have particularly longrunning sessions. 2) For shared *and* nailed relations, the shared relcache init file was never invalidated while running. That means that for such tables (e.g. pg_authid, pg_database) it's not just already existing sessions that are affected, but even new connections are as well. That explains why the reports usually were about pg_authid et. al. To fix 1), revalidate the rd_rel portion of a relcache entry when invalid. This implies a bit of extra complexity to deal with bootstrapping, but it's not too bad. The fix for 2) is simpler, simply always remove both the shared and local init files. Author: Andres Freund Reviewed-By: Alvaro Herrera Discussion: https://postgr.es/m/20180525203736.crkbg36muzxrjj5e@alap3.anarazel.de https://postgr.es/m/CAMa1XUhKSJd98JW4o9StWPrfS=11bPgG+_GDMxe25TvUY4Sugg@mail.gmail.com https://postgr.es/m/CAKMFJucqbuoDRfxPDX39WhA3vJyxweRg_zDVXzncr6+5wOguWA@mail.gmail.com https://postgr.es/m/CAGewt-ujGpMLQ09gXcUFMZaZsGJC98VXHEFbF-tpPB0fB13K+A@mail.gmail.com Backpatch: 9.3-
* Post-feature-freeze pgindent run.Tom Lane2018-04-26
| | | | Discussion: https://postgr.es/m/15719.1523984266@sss.pgh.pa.us
* Plural of modulus is moduliAlvaro Herrera2018-04-19
|
* Reorganize partitioning codeAlvaro Herrera2018-04-14
| | | | | | | | | | | | | | | | | | | | | | There's been a massive addition of partitioning code in PostgreSQL 11, with little oversight on its placement, resulting in a catalog/partition.c with poorly defined boundaries and responsibilities. This commit tries to set a couple of distinct modules to separate things a little bit. There are no code changes here, only code movement. There are three new files: src/backend/utils/cache/partcache.c src/include/partitioning/partdefs.h src/include/utils/partcache.h The previous arrangement of #including catalog/partition.h almost everywhere is no more. Authors: Amit Langote and Álvaro Herrera Discussion: https://postgr.es/m/98e8d509-790a-128c-be7f-e48a5b2d8d97@lab.ntt.co.jp https://postgr.es/m/11aa0c50-316b-18bb-722d-c23814f39059@lab.ntt.co.jp https://postgr.es/m/143ed9a4-6038-76d4-9a55-502035815e68@lab.ntt.co.jp https://postgr.es/m/20180413193503.nynq7bnmgh6vs5vm@alvherre.pgsql
* Use the right memory context for partkey's FmgrInfoAlvaro Herrera2018-04-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We were using CurrentMemoryContext to put the partsupfunc fmgr_info into, which isn't right, because we want the PartitionKey as a whole to be in the isolated Relation->rd_partkeycxt context. This can cause a crash with user-defined support functions in the operator classes used by partitioning keys. (Maybe this can cause problems with core-supplied opclasses too, not sure.) This is demonstrably broken in Postgres 10, too, but the initial proposed fix runs afoul of a problem discussed back when 8a0596cb656e ("Get rid of copy_partition_key") reorganized that code: namely that it is possible to jump out of RelationBuildPartitionKey because of some error and leave a dangling memory context child of CacheMemoryContext. Also, while reviewing this I noticed that the removed-in-pg11 copy_partition_key was doing something wrong, unfixed in pg10, namely doing memcpy() on the FmgrInfo, which is bogus (should be doing fmgr_info_copy). Therefore, in branch pg10, the sane fix seems to be to backpatch both the aforementioned 8a0596cb656e and its followup be2343221fb7 ("Protect against hypothetical memory leaks in RelationGetPartitionKey"), so do that, then apply the fmgr_info memcxt bugfix on top. Add a test case exercising btree-based custom operator classes, which causes a crash prior to this fix. This is not a security problem, because in order to create an operator class you need superuser privileges anyway. Authors: Álvaro Herrera and Amit Langote Reported and diagnosed by: Amit Langote Discussion: https://postgr.es/m/3041e853-b1dd-a0c6-ff21-7cc5633bffd0@lab.ntt.co.jp
* Cleanup covering infrastructureTeodor Sigaev2018-04-12
| | | | | | | | | | | - Explicitly forbids opclass, collation and indoptions (like DESC/ASC etc) for including columns. Throw an error if user points that. - Truncated storage arrays for such attributes to store only key atrributes, added assertion checks. - Do not check opfamily and collation for including columns in CompareIndexInfo() Discussion: https://www.postgresql.org/message-id/5ee72852-3c4e-ee35-e2ed-c1d053d45c08@sigaev.ru
* Rename IndexInfo.ii_KeyAttrNumbers arrayTeodor Sigaev2018-04-12
| | | | | | | | Rename ii_KeyAttrNumbers to ii_IndexAttrNumbers to prevent confusion with ii_NumIndexAttrs/ii_NumIndexKeyAttrs. ii_IndexAttrNumbers contains all attributes including "including" columns, not only key attribute. Discussion: https://www.postgresql.org/message-id/13123421-1d52-d0e4-c95c-6d69011e0595%40sigaev.ru
* Indexes with INCLUDE columns and their support in B-treeTeodor Sigaev2018-04-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces INCLUDE clause to index definition. This clause specifies a list of columns which will be included as a non-key part in the index. The INCLUDE columns exist solely to allow more queries to benefit from index-only scans. Also, such columns don't need to have appropriate operator classes. Expressions are not supported as INCLUDE columns since they cannot be used in index-only scans. Index access methods supporting INCLUDE are indicated by amcaninclude flag in IndexAmRoutine. For now, only B-tree indexes support INCLUDE clause. In B-tree indexes INCLUDE columns are truncated from pivot index tuples (tuples located in non-leaf pages and high keys). Therefore, B-tree indexes now might have variable number of attributes. This patch also provides generic facility to support that: pivot tuples contain number of their attributes in t_tid.ip_posid. Free 13th bit of t_info is used for indicating that. This facility will simplify further support of index suffix truncation. The changes of above are backward-compatible, pg_upgrade doesn't need special handling of B-tree indexes for that. Bump catalog version Author: Anastasia Lubennikova with contribition by Alexander Korotkov and me Reviewed by: Peter Geoghegan, Tomas Vondra, Antonin Houska, Jeff Janes, David Rowley, Alexander Korotkov Discussion: https://www.postgresql.org/message-id/flat/56168952.4010101@postgrespro.ru
* Logical replication support for TRUNCATEPeter Eisentraut2018-04-07
| | | | | | | | | | | | | | | | | | Update the built-in logical replication system to make use of the previously added logical decoding for TRUNCATE support. Add the required truncate callback to pgoutput and a new logical replication protocol message. Publications get a new attribute to determine whether to replicate truncate actions. When updating a publication via pg_dump from an older version, this is not set, thus preserving the previous behavior. Author: Simon Riggs <simon@2ndquadrant.com> Author: Marco Nenciarini <marco.nenciarini@2ndquadrant.it> Author: Peter Eisentraut <peter.eisentraut@2ndquadrant.com> Reviewed-by: Petr Jelinek <petr.jelinek@2ndquadrant.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
* Rename MemoryContextCopySetIdentifier() for clarityPeter Eisentraut2018-04-06
| | | | | | MemoryContextCopySetIdentifier -> MemoryContextCopyAndSetIdentifier Discussion: https://www.postgresql.org/message-id/6421.1522194949@sss.pgh.pa.us
* C comments: "a" <--> "an" correctionsBruce Momjian2018-03-29
| | | | | | | | Reported-by: Michael Paquier, Abhijit Menon-Sen Discussion: https://postgr.es/m/20180305045854.GB2266@paquier.xyz Author: Michael Paquier, Abhijit Menon-Sen, me
* Fix typo in commentMagnus Hagander2018-03-29
| | | | Arthur Zakirov, confirmed by Thomas Munro
* Fast ALTER TABLE ADD COLUMN with a non-NULL defaultAndrew Dunstan2018-03-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | Currently adding a column to a table with a non-NULL default results in a rewrite of the table. For large tables this can be both expensive and disruptive. This patch removes the need for the rewrite as long as the default value is not volatile. The default expression is evaluated at the time of the ALTER TABLE and the result stored in a new column (attmissingval) in pg_attribute, and a new column (atthasmissing) is set to true. Any existing row when fetched will be supplied with the attmissingval. New rows will have the supplied value or the default and so will never need the attmissingval. Any time the table is rewritten all the atthasmissing and attmissingval settings for the attributes are cleared, as they are no longer needed. The most visible code change from this is in heap_attisnull, which acquires a third TupleDesc argument, allowing it to detect a missing value if there is one. In many cases where it is known that there will not be any (e.g. catalog relations) NULL can be passed for this argument. Andrew Dunstan, heavily modified from an original patch from Serge Rielau. Reviewed by Tom Lane, Andres Freund, Tomas Vondra and David Rowley. Discussion: https://postgr.es/m/31e2e921-7002-4c27-59f5-51f08404c858@2ndQuadrant.com
* Allow memory contexts to have both fixed and variable ident strings.Tom Lane2018-03-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Originally, we treated memory context names as potentially variable in all cases, and therefore always copied them into the context header. Commit 9fa6f00b1 rethought this a little bit and invented a distinction between fixed and variable names, skipping the copy step for the former. But we can make things both simpler and more useful by instead allowing there to be two parts to a context's identification, a fixed "name" and an optional, variable "ident". The name supplied in the context create call is now required to be a compile-time-constant string in all cases, as it is never copied but just pointed to. The "ident" string, if wanted, is supplied later. This is needed because typically we want the ident to be stored inside the context so that it's cleaned up automatically on context deletion; that means it has to be copied into the context before we can set the pointer. The cost of this approach is basically just an additional pointer field in struct MemoryContextData, which isn't much overhead, and is bought back entirely in the AllocSet case by not needing a headerSize field anymore, since we no longer have to cope with variable header length. In addition, we can simplify the internal interfaces for memory context creation still further, saving a few cycles there. And it's no longer true that a custom identifier disqualifies a context from participating in aset.c's freelist scheme, so possibly there's some win on that end. All the places that were using non-compile-time-constant context names are adjusted to put the variable info into the "ident" instead. This allows more effective identification of those contexts in many cases; for example, subsidary contexts of relcache entries are now identified by both type (e.g. "index info") and relname, where before you got only one or the other. Contexts associated with PL function cache entries are now identified more fully and uniformly, too. I also arranged for plancache contexts to use the query source string as their identifier. This is basically free for CachedPlanSources, as they contained a copy of that string already. We pay an extra pstrdup to do it for CachedPlans. That could perhaps be avoided, but it would make things more fragile (since the CachedPlanSource is sometimes destroyed first). I suspect future improvements in error reporting will require CachedPlans to have a copy of that string anyway, so it's not clear that it's worth moving mountains to avoid it now. This also changes the APIs for context statistics routines so that the context-specific routines no longer assume that output goes straight to stderr, nor do they know all details of the output format. This is useful immediately to reduce code duplication, and it also allows for external code to do something with stats output that's different from printing to stderr. The reason for pushing this now rather than waiting for v12 is that it rethinks some of the API changes made by commit 9fa6f00b1. Seems better for extension authors to endure just one round of API changes not two. Discussion: https://postgr.es/m/CAB=Je-FdtmFZ9y9REHD7VsSrnCkiBhsA4mdsLKSPauwXtQBeNA@mail.gmail.com
* Allow HOT updates for some expression indexesSimon Riggs2018-03-27
| | | | | | | | | | | | | | | If the value of an index expression is unchanged after UPDATE, allow HOT updates where previously we disallowed them, giving a significant performance boost in those cases. Particularly useful for indexes such as JSON->>field where the JSON value changes but the indexed value does not. Submitted as "surjective indexes" patch, now enabled by use of new "recheck_on_update" parameter. Author: Konstantin Knizhnik Reviewer: Simon Riggs, with much wordsmithing and some cleanup
* Fix improper uses of canonicalize_qual().Tom Lane2018-03-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | One of the things canonicalize_qual() does is to remove constant-NULL subexpressions of top-level AND/OR clauses. It does that on the assumption that what it's given is a top-level WHERE clause, so that NULL can be treated like FALSE. Although this is documented down inside a subroutine of canonicalize_qual(), it wasn't mentioned in the documentation of that function itself, and some callers hadn't gotten that memo. Notably, commit d007a9505 caused get_relation_constraints() to apply canonicalize_qual() to CHECK constraints. That allowed constraint exclusion to misoptimize situations in which a CHECK constraint had a provably-NULL subclause, as seen in the regression test case added here, in which a child table that should be scanned is not. (Although this thinko is ancient, the test case doesn't fail before 9.2, for reasons I've not bothered to track down in detail. There may be related cases that do fail before that.) More recently, commit f0e44751d added an independent bug by applying canonicalize_qual() to index expressions, which is even sillier since those might not even be boolean. If they are, though, I think this could lead to making incorrect index entries for affected index expressions in v10. I haven't attempted to prove that though. To fix, add an "is_check" parameter to canonicalize_qual() to specify whether it should assume WHERE or CHECK semantics, and make it perform NULL-elimination accordingly. Adjust the callers to apply the right semantics, or remove the call entirely in cases where it's not known that the expression has one or the other semantics. I also removed the call in some cases involving partition expressions, where it should be a no-op because such expressions should be canonical already ... and was a no-op, independently of whether it could in principle have done something, because it was being handed the qual in implicit-AND format which isn't what it expects. In HEAD, add an Assert to catch that type of mistake in future. This represents an API break for external callers of canonicalize_qual(). While that's intentional in HEAD to make such callers think about which case applies to them, it seems like something we probably wouldn't be thanked for in released branches. Hence, in released branches, the extra parameter is added to a new function canonicalize_qual_ext(), and canonicalize_qual() is a wrapper that retains its old behavior. Patch by me with suggestions from Dean Rasheed. Back-patch to all supported branches. Discussion: https://postgr.es/m/24475.1520635069@sss.pgh.pa.us
* Add prokind column, replacing proisagg and proiswindowPeter Eisentraut2018-03-02
| | | | | | | | | | The new column distinguishes normal functions, procedures, aggregates, and window functions. This replaces the existing columns proisagg and proiswindow, and replaces the convention that procedures are indicated by prorettype == 0. Also change prorettype to be VOIDOID for procedures. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Michael Paquier <michael@paquier.xyz>
* Remove out-of-date comment about formrdesc().Tom Lane2018-03-01
| | | | | | | | | | | | formrdesc's comment listed the specific catalogs it is called for, but the list was out of date. Rather than jumping back onto that maintenance treadmill, let's just remove the list. It tells the reader nothing that can't be learned quickly and more reliably by searching relcache.c for callers of formrdesc(). Oversight noted by Kyotaro Horiguchi. Discussion: https://postgr.es/m/20180214.105314.138966434.horiguchi.kyotaro@lab.ntt.co.jp
* Make plpgsql use its DTYPE_REC code paths for composite-type variables.Tom Lane2018-02-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Formerly, DTYPE_REC was used only for variables declared as "record"; variables of named composite types used DTYPE_ROW, which is faster for some purposes but much less flexible. In particular, the ROW code paths are entirely incapable of dealing with DDL-caused changes to the number or data types of the columns of a row variable, once a particular plpgsql function has been parsed for the first time in a session. And, since the stored representation of a ROW isn't a tuple, there wasn't any easy way to deal with variables of domain-over-composite types, since the domain constraint checking code would expect the value to be checked to be a tuple. A lesser, but still real, annoyance is that ROW format cannot represent a true NULL composite value, only a row of per-field NULL values, which is not exactly the same thing. Hence, switch to using DTYPE_REC for all composite-typed variables, whether "record", named composite type, or domain over named composite type. DTYPE_ROW remains but is used only for its native purpose, to represent a fixed-at-compile-time list of variables, for instance the targets of an INTO clause. To accomplish this without taking significant performance losses, introduce infrastructure that allows storing composite-type variables as "expanded objects", similar to the "expanded array" infrastructure introduced in commit 1dc5ebc90. A composite variable's value is thereby kept (most of the time) in the form of separate Datums, so that field accesses and updates are not much more expensive than they were in the ROW format. This holds the line, more or less, on performance of variables of named composite types in field-access-intensive microbenchmarks, and makes variables declared "record" perform much better than before in similar tests. In addition, the logic involved with enforcing composite-domain constraints against updates of individual fields is in the expanded record infrastructure not plpgsql proper, so that it might be reusable for other purposes. In further support of this, introduce a typcache feature for assigning a unique-within-process identifier to each distinct tuple descriptor of interest; in particular, DDL alterations on composite types result in a new identifier for that type. This allows very cheap detection of the need to refresh tupdesc-dependent data. This improves on the "tupDescSeqNo" idea I had in commit 687f096ea: that assigned identifying sequence numbers to successive versions of individual composite types, but the numbers were not unique across different types, nor was there support for assigning numbers to registered record types. In passing, allow plpgsql functions to accept as well as return type "record". There was no good reason for the old restriction, and it was out of step with most of the other PLs. Tom Lane, reviewed by Pavel Stehule Discussion: https://postgr.es/m/8962.1514399547@sss.pgh.pa.us
* get_relid_attribute_name is dead, long live get_attnameAlvaro Herrera2018-02-12
| | | | | | | | | The modern way is to use a missing_ok argument instead of two separate almost-identical routines, so do that. Author: Michaël Paquier Reviewed-by: Álvaro Herrera Discussion: https://postgr.es/m/20180201063212.GE6398@paquier.xyz
* Fix RelationBuildPartitionKey's processing of partition key expressions.Tom Lane2018-02-05
| | | | | | | | | | | | | Failure to advance the list pointer while reading partition expressions from a list results in invoking an input function with inappropriate data, possibly leading to crashes or, with carefully crafted input, disclosure of arbitrary backend memory. Bug discovered independently by Álvaro Herrera and David Rowley. This patch is by Álvaro but owes something to David's proposed fix. Back-patch to v10 where the issue was introduced. Security: CVE-2018-1052
* Save a few bytes by removing useless last argument to SearchCatCacheList.Tom Lane2018-01-29
| | | | | | | | | | | | | | | | | | | | | | There's never any value in giving a fully specified cache key to SearchCatCacheList: you might as well call SearchCatCache instead, since there could be only one match. So the maximum useful number of key arguments is one less than the supported number of key columns. We might as well remove the useless extra argument and save some few bytes per call site, as well as a cycle or so per call. I believe the reason it was coded like this is that originally, callers had to write out all the dummy arguments in each call, and so it seemed less confusing if SearchCatCache and SearchCatCacheList took the same number of key arguments. But since commit e26c539e9, callers only write their live arguments explicitly, making that a non-factor; and there's surely been enough time for third-party modules to adapt to that coding style. So this is only an ABI break not an API break for callers. Per discussion with Oliver Ford, this might also make it less confusing how to use SearchCatCacheList correctly. Discussion: https://postgr.es/m/27788.1517069693@sss.pgh.pa.us
* Local partitioned indexesAlvaro Herrera2018-01-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | When CREATE INDEX is run on a partitioned table, create catalog entries for an index on the partitioned table (which is just a placeholder since the table proper has no data of its own), and recurse to create actual indexes on the existing partitions; create them in future partitions also. As a convenience gadget, if the new index definition matches some existing index in partitions, these are picked up and used instead of creating new ones. Whichever way these indexes come about, they become attached to the index on the parent table and are dropped alongside it, and cannot be dropped on isolation unless they are detached first. To support pg_dump'ing these indexes, add commands CREATE INDEX ON ONLY <table> (which creates the index on the parent partitioned table, without recursing) and ALTER INDEX ATTACH PARTITION (which is used after the indexes have been created individually on each partition, to attach them to the parent index). These reconstruct prior database state exactly. Reviewed-by: (in alphabetical order) Peter Eisentraut, Robert Haas, Amit Langote, Jesper Pedersen, Simon Riggs, David Rowley Discussion: https://postgr.es/m/20171113170646.gzweigyrgg6pwsg4@alvherre.pgsql
* Fix Latin spellingPeter Eisentraut2018-01-11
| | | | "c.f." should be "cf.".
* Fix typoAlvaro Herrera2018-01-03
| | | | | Author: Dagfinn Ilmari Mannsåker Discussion: https://postgr.es/m/d8jefpk4jtd.fsf@dalvik.ping.uio.no
* Update copyright for 2018Bruce Momjian2018-01-02
| | | | Backpatch-through: certain files through 9.3
* Protect against hypothetical memory leaks in RelationGetPartitionKeyAlvaro Herrera2017-12-27
| | | | | | | Also, fix a comment that commit 8a0596cb656e made obsolete. Reported-by: Robert Haas Discussion: http://postgr.es/m/CA+TgmoYbpuUUUp2GhYNwWm0qkah39spiU7uOiNXLz20ASfKYoA@mail.gmail.com
* Get rid of copy_partition_keyAlvaro Herrera2017-12-21
| | | | | | | | | | | | That function currently exists to avoid leaking memory in CacheMemoryContext in case of trouble while the partition key is being built, but there's a better way: allocate everything in a memcxt that goes away if the current (sub)transaction fails, and once the partition key is built and no further errors can occur, make the memcxt permanent by making it a child of CacheMemoryContext. Reviewed-by: Tom Lane Discussion: https://postgr.es/m/20171027172730.eh2domlkpn4ja62m@alvherre.pgsql
* Rethink MemoryContext creation to improve performance.Tom Lane2017-12-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch makes a number of interrelated changes to reduce the overhead involved in creating/deleting memory contexts. The key ideas are: * Include the AllocSetContext header of an aset.c context in its first malloc request, rather than allocating it separately in TopMemoryContext. This means that we now always create an initial or "keeper" block in an aset, even if it never receives any allocation requests. * Create freelists in which we can save and recycle recently-destroyed asets (this idea is due to Robert Haas). * In the common case where the name of a context is a constant string, just store a pointer to it in the context header, rather than copying the string. The first change eliminates a palloc/pfree cycle per context, and also avoids bloat in TopMemoryContext, at the price that creating a context now involves a malloc/free cycle even if the context never receives any allocations. That would be a loser for some common usage patterns, but recycling short-lived contexts via the freelist eliminates that pain. Avoiding copying constant strings not only saves strlen() and strcpy() overhead, but is an essential part of the freelist optimization because it makes the context header size constant. Currently we make no attempt to use the freelist for contexts with non-constant names. (Perhaps someday we'll need to think harder about that, but in current usage, most contexts with custom names are long-lived anyway.) The freelist management in this initial commit is pretty simplistic, and we might want to refine it later --- but in common workloads that will never matter because the freelists will never get full anyway. To create a context with a non-constant name, one is now required to call AllocSetContextCreateExtended and specify the MEMCONTEXT_COPY_NAME option. AllocSetContextCreate becomes a wrapper macro, and it includes a test that will complain about non-string-literal context name parameters on gcc and similar compilers. An unfortunate side effect of making AllocSetContextCreate a macro is that one is now *required* to use the size parameter abstraction macros (ALLOCSET_DEFAULT_SIZES and friends) with it; the pre-9.6 habit of writing out individual size parameters no longer works unless you switch to AllocSetContextCreateExtended. Internally to the memory-context-related modules, the context creation APIs are simplified, removing the rather baroque original design whereby a context-type module called mcxt.c which then called back into the context-type module. That saved a bit of code duplication, but not much, and it prevented context-type modules from exercising control over the allocation of context headers. In passing, I converted the test-and-elog validation of aset size parameters into Asserts to save a few more cycles. The original thought was that callers might compute size parameters on the fly, but in practice nobody does that, so it's useless to expend cycles on checking those numbers in production builds. Also, mark the memory context method-pointer structs "const", just for cleanliness. Discussion: https://postgr.es/m/2264.1512870796@sss.pgh.pa.us
* Clean up assorted messiness around AllocateDir() usage.Tom Lane2017-12-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes a couple of low-probability bugs that could lead to reporting an irrelevant errno value (and hence possibly a wrong SQLSTATE) concerning directory-open or file-open failures. It also fixes places where we took shortcuts in reporting such errors, either by using elog instead of ereport or by using ereport but forgetting to specify an errcode. And it eliminates a lot of just plain redundant error-handling code. In service of all this, export fd.c's formerly-static function ReadDirExtended, so that external callers can make use of the coding pattern dir = AllocateDir(path); while ((de = ReadDirExtended(dir, path, LOG)) != NULL) if they'd like to treat directory-open failures as mere LOG conditions rather than errors. Also fix FreeDir to be a no-op if we reach it with dir == NULL, as such a coding pattern would cause. Then, remove code at many call sites that was throwing an error or log message for AllocateDir failure, as ReadDir or ReadDirExtended can handle that job just fine. Aside from being a net code savings, this gets rid of a lot of not-quite-up-to-snuff reports, as mentioned above. (In some places these changes result in replacing a custom error message such as "could not open tablespace directory" with more generic wording "could not open directory", but it was agreed that the custom wording buys little as long as we report the directory name.) In some other call sites where we can't just remove code, change the error reports to be fully project-style-compliant. Also reorder code in restoreTwoPhaseData that was acquiring a lock between AllocateDir and ReadDir; in the unlikely but surely not impossible case that LWLockAcquire changes errno, AllocateDir failures would be misreported. There is no great value in opening the directory before acquiring TwoPhaseStateLock, so just do it in the other order. Also fix CheckXLogRemoved to guarantee that it preserves errno, as quite a number of call sites are implicitly assuming. (Again, it's unlikely but I think not impossible that errno could change during a SpinLockAcquire. If so, this function was broken for its own purposes as well as breaking callers.) And change a few places that were using not-per-project-style messages, such as "could not read directory" when "could not open directory" is more correct. Back-patch the exporting of ReadDirExtended, in case we have occasion to back-patch some fix that makes use of it; it's not needed right now but surely making it global is pretty harmless. Also back-patch the restoreTwoPhaseData and CheckXLogRemoved fixes. The rest of this is essentially cosmetic and need not get back-patched. Michael Paquier, with a bit of additional work by me Discussion: https://postgr.es/m/CAB7nPqRpOCxjiirHmebEFhXVTK7V5Jvw4bz82p7Oimtsm3TyZA@mail.gmail.com
* SQL proceduresPeter Eisentraut2017-11-30
| | | | | | | | | | | | | | | | | | | | | This adds a new object type "procedure" that is similar to a function but does not have a return type and is invoked by the new CALL statement instead of SELECT or similar. This implementation is aligned with the SQL standard and compatible with or similar to other SQL implementations. This commit adds new commands CALL, CREATE/ALTER/DROP PROCEDURE, as well as ALTER/DROP ROUTINE that can refer to either a function or a procedure (or an aggregate function, as an extension to SQL). There is also support for procedures in various utility commands such as COMMENT and GRANT, as well as support in pg_dump and psql. Support for defining procedures is available in all the languages supplied by the core distribution. While this commit is mainly syntax sugar around existing functionality, future features will rely on having procedures as a separate object type. Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com>
* Make PL/Python handle domain-type conversions correctly.Tom Lane2017-11-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix PL/Python so that it can handle domains over composite, and so that it enforces domain constraints correctly in other cases that were not always done properly before. Notably, it didn't do arrays of domains right (oversight in commit c12d570fa), and it failed to enforce domain constraints when returning a composite type containing a domain field, and if a transform function is being used for a domain's base type then it failed to enforce domain constraints on the result. Also, in many places it missed checking domain constraints on null values, because the plpy_typeio code simply wasn't called for Py_None. Rather than try to band-aid these problems, I made a significant refactoring of the plpy_typeio logic. The existing design of recursing for array and composite members is extended to also treat domains as containers requiring recursion, and the APIs for the module are cleaned up and simplified. The patch also modifies plpy_typeio to rely on the typcache more than it did before (which was pretty much not at all). This reduces the need for repetitive lookups, and lets us get rid of an ad-hoc scheme for detecting changes in composite types. I added a couple of small features to typcache to help with that. Although some of this is fixing bugs that long predate v11, I don't think we should risk a back-patch: it's a significant amount of code churn, and there've been no complaints from the field about the bugs. Tom Lane, reviewed by Anthony Bykov Discussion: https://postgr.es/m/24449.1509393613@sss.pgh.pa.us
* Add hash partitioning.Robert Haas2017-11-09
| | | | | | | | | | | | | | | | | | | Hash partitioning is useful when you want to partition a growing data set evenly. This can be useful to keep table sizes reasonable, which makes maintenance operations such as VACUUM faster, or to enable partition-wise join. At present, we still depend on constraint exclusion for partitioning pruning, and the shape of the partition constraints for hash partitioning is such that that doesn't work. Work is underway to fix that, which should both improve performance and make partitioning pruning work with hash partitioning. Amul Sul, reviewed and tested by Dilip Kumar, Ashutosh Bapat, Yugo Nagata, Rajkumar Raghuwanshi, Jesper Pedersen, and by me. A few final tweaks also by me. Discussion: http://postgr.es/m/CAAJ_b96fhpJAP=ALbETmeLk1Uni_GFZD938zgenhF49qgDTjaQ@mail.gmail.com
* Change TRUE/FALSE to true/falsePeter Eisentraut2017-11-08
| | | | | | | | | | | | | | The lower case spellings are C and C++ standard and are used in most parts of the PostgreSQL sources. The upper case spellings are only used in some files/modules. So standardize on the standard spellings. The APIs for ICU, Perl, and Windows define their own TRUE and FALSE, so those are left as is when using those APIs. In code comments, we use the lower-case spelling for the C concepts and keep the upper-case spelling for the SQL concepts. Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
* Exclude pg_internal.init from BASE_BACKUPSimon Riggs2017-11-07
| | | | | | | Add docs to explain this for other backup mechanisms Author: David Steele <david@pgmasters.net> Reviewed-by: Petr Jelinek <petr.jelinek@2ndQuadrant.com> et al
* Support domains over composite types.Tom Lane2017-10-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | This is the last major omission in our domains feature: you can now make a domain over anything that's not a pseudotype. The major complication from an implementation standpoint is that places that might be creating tuples of a domain type now need to be prepared to apply domain_check(). It seems better that unprepared code fail with an error like "<type> is not composite" than that it silently fail to apply domain constraints. Therefore, relevant infrastructure like get_func_result_type() and lookup_rowtype_tupdesc() has been adjusted to treat domain-over-composite as a distinct case that unprepared code won't recognize, rather than just transparently treating it the same as plain composite. This isn't a 100% solution to the possibility of overlooked domain checks, but it catches most places. In passing, improve typcache.c's support for domains (it can now cache the identity of a domain's base type), and rewrite the argument handling logic in jsonfuncs.c's populate_record[set]_worker to reduce duplicative per-call lookups. I believe this is code-complete so far as the core and contrib code go. The PLs need varying amounts of work, which will be tackled in followup patches. Discussion: https://postgr.es/m/4206.1499798337@sss.pgh.pa.us
* Fix typcache's failure to treat ranges as container types.Tom Lane2017-10-20
| | | | | | | | | | | | | | | | | Like the similar logic for arrays and records, it's necessary to examine the range's subtype to decide whether the range type can support hashing. We can omit checking the subtype for btree-defined operations, though, since range subtypes are required to have those operations. (Possibly that simplification for btree cases led us to overlook that it does not apply for hash cases.) This is only an issue if the subtype lacks hash support, which is not true of any built-in range type, but it's easy to demonstrate a problem with a range type over, eg, money: you can get a "could not identify a hash function" failure when the planner is misled into thinking that hash join or aggregation would work. This was born broken, so back-patch to all supported branches.
* Fix misimplementation of typcache logic for extended hashing.Tom Lane2017-10-20
| | | | | | | | | | | The previous coding would report that an array type supports extended hashing if its element type supports regular hashing. This bug is only latent at the moment, since AFAICS there is not yet any code that depends on checking presence of extended-hashing support to make any decisions. (And in any case it wouldn't matter unless the element type has only regular hashing, which isn't true of any core data type.) But that doesn't make it less broken. Extend the cache_array_element_properties infrastructure to check this properly.
* Improve sys/catcache performance.Andres Freund2017-10-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following are the individual improvements: 1) Avoidance of FunctionCallInfo based function calls, replaced by more efficient functions with a native C argument interface. 2) Don't extract columns from a cache entry's tuple whenever matching entries - instead store them as a Datum array. This also allows to get rid of having to build dummy tuples for negative & list entries, and of a hack for dealing with cstring vs. text weirdness. 3) Reorder members of catcache.h struct, so imortant entries are more likely to be on one cacheline. 4) Allowing the compiler to specialize critical SearchCatCache for a specific number of attributes allows to unroll loops and avoid other nkeys dependant initialization. 5) Only initializing the ScanKey when necessary, i.e. catcache misses, greatly reduces cache unnecessary cpu cache misses. 6) Split of the cache-miss case from the hash lookup, reducing stack allocations etc in the common case. 7) CatCTup and their corresponding heaptuple are allocated in one piece. This results in making cache lookups themselves roughly three times as fast - full-system benchmarks obviously improve less than that. I've also evaluated further techniques: - replace open coded hash with simplehash - the list walk right now shows up in profiles. Unfortunately it's not easy to do so safely as an entry's memory location can change at various times, which doesn't work well with the refcounting and cache invalidation. - Cacheline-aligning CatCTup entries - helps some with performance, but the win isn't big and the code for it is ugly, because the tuples have to be freed as well. - add more proper functions, rather than macros for SearchSysCacheCopyN etc., but right now they don't show up in profiles. The reason the macro wrapper for syscache.c/h have to be changed, rather than just catcache, is that doing otherwise would require exposing the SysCache array to the outside. That might be a good idea anyway, but it's for another day. Author: Andres Freund Reviewed-By: Robert Haas Discussion: https://postgr.es/m/20170914061207.zxotvyopetm7lrrp@alap3.anarazel.de
* Refactor new file permission handlingPeter Eisentraut2017-09-23
| | | | | | | | | | | | | | | | | | | The file handling functions from fd.c were called with a diverse mix of notations for the file permissions when they were opening new files. Almost all files created by the server should have the same permissions set. So change the API so that e.g. OpenTransientFile() automatically uses the standard permissions set, and OpenTransientFilePerm() is a new function that takes an explicit permissions set for the few cases where it is needed. This also saves an unnecessary argument for call sites that are just opening an existing file. While we're reviewing these APIs, get rid of the FileName typedef and use the standard const char * for the file name and mode_t for the file mode. This makes these functions match other file handling functions and removes an unnecessary layer of mysteriousness. We can also get rid of a few casts that way. Author: David Steele <david@pgmasters.net>
* Fix bogus size calculation introduced by commit cc5f81366.Tom Lane2017-09-17
| | | | | | | | | The elements of RecordCacheArray are TupleDesc, not TupleDesc *. Those are actually the same size, so that this error is harmless, but it's still wrong --- and it might bite us someday, if TupleDesc ever became a struct, say. Per Coverity.