aboutsummaryrefslogtreecommitdiff
path: root/src/include
Commit message (Collapse)AuthorAge
...
* Rename macro to RELKIND_HAS_STORAGEAlvaro Herrera2019-01-04
| | | | | | The original name was an unfortunate choice. Discussion: https://postgr.es/m/20181218.145600.172055615.horiguchi.kyotaro@lab.ntt.co.jp
* Move the built-in conversions into the initial catalog data.Tom Lane2019-01-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of running a SQL script to create the standard conversion functions and pg_conversion entries, put those entries into the initial data in postgres.bki. This shaves a few percent off the runtime of initdb, and also allows accurate comments to be attached to the conversion functions; the previous script labeled them with machine-generated comments that were not quite right for multi-purpose conversion functions. Also, we can get rid of the duplicative Makefile and MSVC perl implementations of the generation code for that SQL script. A functional change is that these pg_proc and pg_conversion entries are now "pinned" by initdb. Leaving them unpinned was perhaps a good thing back while the conversions feature was under development, but there seems no valid reason for it now. Also, the conversion functions are now marked as immutable, where before they were volatile by virtue of lacking any explicit specification. That seems like it was just an oversight. To avoid using magic constants in pg_conversion.dat, extend genbki.pl to allow encoding names to be converted, much as it does for language, access method, etc names. John Naylor Discussion: https://postgr.es/m/CAJVSVGWtUqxpfAaxS88vEGvi+jKzWZb2EStu5io-UPc4p9rSJg@mail.gmail.com
* Use symbolic references for pg_language OIDs in the bootstrap data.Tom Lane2019-01-03
| | | | | | | | | | | | | | | | | | | This patch teaches genbki.pl to replace pg_language names by OIDs in much the same way as it already does for pg_am names etc, and converts pg_proc.dat to use such symbolic references in the prolang column. Aside from getting rid of a few more magic numbers in the initial catalog data, this means that Gen_fmgrtab.pl no longer needs to read pg_language.dat, since it doesn't have to know the OID of the "internal" language; now it's just looking for the string "internal". No need for a catversion bump, since the contents of postgres.bki don't actually change at all. John Naylor Discussion: https://postgr.es/m/CAJVSVGWtUqxpfAaxS88vEGvi+jKzWZb2EStu5io-UPc4p9rSJg@mail.gmail.com
* Update copyright for 2019Bruce Momjian2019-01-02
| | | | Backpatch-through: certain files through 9.4
* Remove configure switch --disable-strong-randomMichael Paquier2019-01-01
| | | | | | | | | | | | | | | | This removes a portion of infrastructure introduced by fe0a0b5 to allow compilation of Postgres in environments where no strong random source is available, meaning that there is no linking to OpenSSL and no /dev/urandom (Windows having its own CryptoAPI). No systems shipped this century lack /dev/urandom, and the buildfarm is actually not testing this switch at all, so just remove it. This simplifies particularly some backend code which included a fallback implementation using shared memory, and removes a set of alternate regression output files from pgcrypto. Author: Michael Paquier Reviewed-by: Tom Lane Discussion: https://postgr.es/m/20181230063219.GG608@paquier.xyz
* Update leakproofness markings on some btree comparison functions.Tom Lane2018-12-31
| | | | | | | | | | | | | | | | | | | | Mark pg_lsn and oidvector comparison functions as leakproof. Per discussion, these clearly are leakproof so we might as well mark them so. On the other hand, remove leakproof markings from name comparison functions other than equal/not-equal. Now that these depend on varstr_cmp, they can't be considered leakproof if text comparison isn't. (This was my error in commit 586b98fdf.) While at it, add some opr_sanity queries to catch cases where related functions do not have the same volatility and leakproof markings. This would clearly be bogus for commutator or negator pairs. In the domain of btree comparison functions, we do have some exceptions, because text equality is leakproof but inequality comparisons are not. That's odd on first glance but is reasonable (for now anyway) given the much greater complexity of the inequality code paths. Discussion: https://postgr.es/m/20181231172551.GA206480@gust.leadboat.com
* Remove some useless codeAlvaro Herrera2018-12-31
| | | | | | | | | | | | | In commit 8b08f7d4820f I added member relationId to IndexStmt struct. I'm now not sure why; DefineIndex doesn't need it, since the relation OID is passed as a separate argument anyway. Remove it. Also remove a redundant assignment to the relationId argument (it wasn't redundant when added by commit e093dcdd285, but should have been removed in commit 5f173040e3), and use relationId instead of stmt->relation when locking the relation in the second phase of CREATE INDEX CONCURRENTLY, which is not only confusing but it means we resolve the name twice for no reason.
* Add a hash opclass for type "tid".Tom Lane2018-12-30
| | | | | | | | | | | | | | | | | Up to now we've not worried much about joins where the join key is a relation's CTID column, reasoning that storing a table's CTIDs in some other table would be pretty useless. However, there are use-cases for this sort of query involving self-joins, so that argument doesn't really hold water. With larger relations, a merge or hash join is desirable. We had a btree opclass for type "tid", allowing merge joins on CTID, but no hash opclass so that hash joins weren't possible. Add the missing infrastructure. This also potentially enables hash aggregation on "tid", though the use-cases for that aren't too clear. Discussion: https://postgr.es/m/1853.1545453106@sss.pgh.pa.us
* Support parameterized TidPaths.Tom Lane2018-12-30
| | | | | | | | | | | | | | | | | | | Up to now we've not worried much about joins where the join key is a relation's CTID column, reasoning that storing a table's CTIDs in some other table would be pretty useless. However, there are use-cases for this sort of query involving self-joins, so that argument doesn't really hold water. This patch allows generating plans for joins on CTID that use a nestloop with inner TidScan, similar to what we might do with an index on the join column. This is the most efficient way to join when the outer side of the nestloop is expected to yield relatively few rows. This change requires upgrading tidpath.c and the generated TidPaths to work with RestrictInfos instead of bare qual clauses, but that's long-postponed technical debt anyway. Discussion: https://postgr.es/m/17443.1545435266@sss.pgh.pa.us
* Improve description of DEFAULT_XLOG_SEG_SIZE in pg_config.hMichael Paquier2018-12-29
| | | | | | | | | This was incorrectly referring to --walsegsize, and its description is rewritten in a clearer way. Author: Ian Barwick, Tom Lane Reviewed-by: Álvaro Herrera, Michael Paquier Discussion: https://postgr.es/m/08534fc6-119a-c498-254e-d5acc4e6bf85@2ndquadrant.com
* Remove obsolete IndexIs* macrosPeter Eisentraut2018-12-27
| | | | | | | | | Remove IndexIsValid(), IndexIsReady(), IndexIsLive() in favor of accessing the index structure directly. These macros haven't been used consistently, and the original reason of maintaining source compatibility with PostgreSQL 9.2 is gone. Discussion: https://www.postgresql.org/message-id/flat/d419147c-09d4-6196-5d9d-0234b230880a%402ndquadrant.com
* Remove entry tree root conflict checking from GIN predicate lockingAlexander Korotkov2018-12-27
| | | | | | | | | | | | | | According to README we acquire predicate locks on entry tree leafs and posting tree roots. However, when ginFindLeafPage() is going to lock leaf in exclusive mode, then it checks root for conflicts regardless whether it's a entry or posting tree. Assuming that we never place predicate lock on entry tree root (excluding corner case when root is leaf), this check is redundant. This commit removes this check. Now, root conflict checking is controlled by separate argument of ginFindLeafPage(). Discussion: https://postgr.es/m/CAPpHfdv7rrDyy%3DMgsaK-L9kk0AH7az0B-mdC3w3p0FSb9uoyEg%40mail.gmail.com Author: Alexander Korotkov Backpatch-through: 11
* Add some const decorationsPeter Eisentraut2018-12-22
| | | | These mainly help understanding the function signatures better.
* Check for conflicting queries during replay of gistvacuumpage()Alexander Korotkov2018-12-21
| | | | | | | | | | | | | | | | | | | | | | | 013ebc0a7b implements so-called GiST microvacuum. That is gistgettuple() marks index tuples as dead when kill_prior_tuple is set. Later, when new tuple insertion claims page space, those dead index tuples are physically deleted from page. When this deletion is replayed on standby, it might conflict with read-only queries. But 013ebc0a7b doesn't handle this. That may lead to disappearance of some tuples from read-only snapshots on standby. This commit implements resolving of conflicts between replay of GiST microvacuum and standby queries. On the master we implement new WAL record type XLOG_GIST_DELETE, which comprises necessary information. On stable releases we've to be tricky to keep WAL compatibility. Information required for conflict processing is just appended to data of XLOG_GIST_PAGE_UPDATE record. So, PostgreSQL version, which doesn't know about conflict processing, will just ignore that. Reported-by: Andres Freund Diagnosed-by: Andres Freund Discussion: https://postgr.es/m/20181212224524.scafnlyjindmrbe6%40alap3.anarazel.de Author: Alexander Korotkov Backpatch-through: 9.6
* Base information_schema.sql_identifier domain on name, not varchar.Tom Lane2018-12-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The SQL spec says that sql_identifier is a domain over varchar, but it also says that that domain is supposed to represent the set of valid identifiers for the implementation, in particular applying a length limit matching the implementation's identifier length limit. We were declaring sql_identifier as just "character varying", thus duplicating what the spec says about base type, but entirely failing at the rest of it. Instead, let's declare sql_identifier as a domain over type "name". (We can drop the COLLATE "C" added by commit 6b0faf723, since that's now implicit in "name".) With the recent improvements to name's comparison support, there's not a lot of functional difference between name and varchar. So although in principle this is a spec deviation, it's a pretty minor one. And correctly enforcing PG's name length limit is a good thing; on balance this seems closer to the intent of the spec than what we had. But that's all just language-lawyering. The *real* reason to do this is that it makes sql_identifier columns exposed by information_schema views be just direct representations of the underlying "name" catalog columns, eliminating a semantic mismatch that was disastrous for performance of typical queries on the information_schema. In combination with the recent change to allow dropping no-op CoerceToDomain nodes, this allows (for example) queries such as select ... from information_schema.tables where table_name = 'foo'; to produce an indexscan rather than a seqscan on pg_class. Discussion: https://postgr.es/m/CAFj8pRBUCX4LZ2rA2BbEkdD6NN59mgx+BLo1gO08Wod4RLtcTg@mail.gmail.com
* Avoid producing over-length specific_name outputs in information_schema.Tom Lane2018-12-20
| | | | | | | | | | | | | | | | | | | | | | | | information_schema output columns that are declared as being type sql_identifier are supposed to conform to the implementation's rules for valid identifiers, in particular the identifier length limit. Several places potentially violated this limit by concatenating a function's name and OID. (The OID is added to ensure name uniqueness within a schema, since the spec doesn't expect function name overloading.) Simply truncating the concatenation result to fit in "name" won't do, since losing part of the OID might wind up giving non-unique results. Instead, let's truncate the function name as necessary. The most practical way to do that is to do it in a C function; the information_schema.sql script doesn't have easy access to the value of NAMEDATALEN, nor does it have an easy way to truncate on the basis of resulting byte-length rather than number of characters. (There are still a couple of places that cast concatenation results to sql_identifier, but as far as I can see they are guaranteed not to produce over-length strings, at least with the normal value of NAMEDATALEN.) Discussion: https://postgr.es/m/23817.1545283477@sss.pgh.pa.us
* Make bitmapset.c use 64-bit bitmap words on 64-bit machines.Tom Lane2018-12-20
| | | | | | | | | | | | Using the full width of the CPU's native word size shouldn't cost anything in typical cases. When working with large bitmapsets, this halves the number of operations needed for many common BMS operations. On the right sort of test case, a measurable improvement is obtained. David Rowley Discussion: https://postgr.es/m/CAKJS1f9EGBd2h-VkXvb=51tf+X46zMX5T8h-KYgXEV_u2zmLUw@mail.gmail.com
* Add text-vs-name cross-type operators, and unify name_ops with text_ops.Tom Lane2018-12-19
| | | | | | | | | | | | | | | | | | | Now that name comparison has effectively the same behavior as text comparison, we might as well merge the name_ops opfamily into text_ops, allowing cross-type comparisons to be processed without forcing a datatype coercion first. We need do little more than add cross-type operators to make the opfamily complete, and fix one or two places in the planner that assumed text_ops was a single-datatype opfamily. I chose to unify hash name_ops into hash text_ops as well, since the types have compatible hashing semantics. This allows marking the new cross-type equality operators as oprcanhash. (Note: this doesn't remove the name_ops opclasses, so there's no breakage of index definitions. Those opclasses are just reparented into the text_ops opfamily.) Discussion: https://postgr.es/m/15938.1544377821@sss.pgh.pa.us
* Make type "name" collation-aware.Tom Lane2018-12-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The "name" comparison operators now all support collations, making them functionally equivalent to "text" comparisons, except for the different physical representation of the datatype. They do, in fact, mostly share the varstr_cmp and varstr_sortsupport infrastructure, which has been slightly enlarged to handle the case. To avoid changes in the default behavior of the datatype, set name's typcollation to C_COLLATION_OID not DEFAULT_COLLATION_OID, so that by default comparisons to a name value will continue to use strcmp semantics. (This would have been the case for system catalog columns anyway, because of commit 6b0faf723, but doing this makes it true for user-created name columns as well. In particular, this avoids locale-dependent changes in our regression test results.) In consequence, tweak a couple of places that made assumptions about collatable base types always having typcollation DEFAULT_COLLATION_OID. I have not, however, attempted to relax the restriction that user- defined collatable types must have that. Hence, "name" doesn't behave quite like a user-defined type; it acts more like a domain with COLLATE "C". (Conceivably, if we ever get rid of the need for catalog name columns to be fixed-length, "name" could actually become such a domain over text. But that'd be a pretty massive undertaking, and I'm not volunteering.) Discussion: https://postgr.es/m/15938.1544377821@sss.pgh.pa.us
* Make collation-aware system catalog columns use "C" collation.Tom Lane2018-12-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Up to now we allowed text columns in system catalogs to use collation "default", but that isn't really safe because it might mean something different in template0 than it means in a database cloned from template0. In particular, this could mean that cloned pg_statistic entries for such columns weren't entirely valid, possibly leading to bogus planner estimates, though (probably) not any outright failures. In the wake of commit 5e0928005, a better solution is available: if we label such columns with "C" collation, then their pg_statistic entries will also use that collation and hence will be valid independently of the database collation. This also provides a cleaner solution for indexes on such columns than the hack added by commit 0b28ea79c: the indexes will naturally inherit "C" collation and don't have to be forced to use text_pattern_ops. Also, with the planned improvement of type "name" to be collation-aware, this policy will apply cleanly to both text and name columns. Because of the pg_statistic angle, we should also apply this policy to the tables in information_schema. This patch does that by adjusting information_schema's textual domain types to specify "C" collation. That has the user-visible effect that order-sensitive comparisons to textual information_schema view columns will now use "C" collation by default. The SQL standard says that the collation of those view columns is implementation-defined, so I think this is legal per spec. At some point this might allow for translation of such comparisons into indexable conditions on the underlying "name" columns, although additional work will be needed before that can happen. Discussion: https://postgr.es/m/19346.1544895309@sss.pgh.pa.us
* Include partitioned indexes to system view pg_indexesMichael Paquier2018-12-18
| | | | | | | | | pg_tables already includes partitioned tables, so for consistency pg_indexes should show partitioned indexes. Author: Suraj Kharage Reviewed-by: Amit Langote, Michael Paquier Discussion: https://postgr.es/m/CAF1DzPVrYo4XNTEnc=PqVp6aLJc7LFYpYR4rX=_5pV=wJ2KdZg@mail.gmail.com
* Drop support for getting signal descriptions from sys_siglist[].Tom Lane2018-12-17
| | | | | | | | | | It appears that all platforms that have sys_siglist[] also have strsignal(), making that fallback case in pg_strsignal() dead code. Getting rid of it allows dropping a configure test, which seems worth more than providing textual signal descriptions on whatever platforms might still hypothetically have use for the fallback case. Discussion: https://postgr.es/m/25758.1544983503@sss.pgh.pa.us
* Fix tablespace handling for partitioned tablesAlvaro Herrera2018-12-17
| | | | | | | | | | | | | | | | | | When partitioned tables were introduced, we failed to realize that by copying the tablespace handling for other relation kinds with no physical storage we were causing the secondary effect that their partitions would not automatically inherit the tablespace setting. This is surprising and unhelpful, so change it to adopt the behavior introduced in pg11 (commit 33e6c34c3267) for partitioned indexes: the parent relation remembers the tablespace specification, which is then used for any new partitions that don't declare one. Because this commit changes behavior of the TABLESPACE clause for partitioned tables (it's no longer a no-op), it is not backpatched. Author: David Rowley, Álvaro Herrera Reviewed-by: Michael Paquier Discussion: https://postgr.es/m/CAKJS1f9SxVzqDrGD1teosFd6jBMM0UEaa14_8mRvcWE19Tu0hA@mail.gmail.com
* Modernize our code for looking up descriptive strings for Unix signals.Tom Lane2018-12-16
| | | | | | | | | | | | | | | | | | | | | | At least as far back as the 2008 spec, POSIX has defined strsignal(3) for looking up descriptive strings for signal numbers. We hadn't gotten the word though, and were still using the crufty old sys_siglist array, which is in no standard even though most Unixen provide it. Aside from not being formally standards-compliant, this was just plain ugly because it involved #ifdef's at every place using the code. To eliminate the #ifdef's, create a portability function pg_strsignal, which wraps strsignal(3) if available and otherwise falls back to sys_siglist[] if available. The set of Unixen with neither API is probably empty these days, but on any platform with neither, you'll just get "unrecognized signal". All extant callers print the numeric signal number too, so no need to work harder than that. Along the way, upgrade pg_basebackup's child-error-exit reporting to match the rest of the system. Discussion: https://postgr.es/m/25758.1544983503@sss.pgh.pa.us
* Improve detection of child-process SIGPIPE failures.Tom Lane2018-12-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit ffa4cbd62 added logic to detect SIGPIPE failure of a COPY child process, but it only worked correctly if the SIGPIPE occurred in the immediate child process. Depending on the shell in use and the complexity of the shell command string, we might instead get back an exit code of 128 + SIGPIPE, representing a shell error exit reporting SIGPIPE in the child process. We could just hack up ClosePipeToProgram() to add the extra case, but it seems like this is a fairly general issue deserving a more general and better-documented solution. I chose to add a couple of functions in src/common/wait_error.c, which is a natural place to know about wait-result encodings, that will test for either a specific child-process signal type or any child-process signal failure. Then, adjust other places that were doing ad-hoc tests of this type to use the common functions. In RestoreArchivedFile, this fixes a race condition affecting whether the process will report an error or just silently proc_exit(1): before, that depended on whether the intermediate shell got SIGTERM'd itself or reported a child process failing on SIGTERM. Like the previous patch, back-patch to v10; we could go further but there seems no real need to. Per report from Erik Rijkers. Discussion: https://postgr.es/m/f3683f87ab1701bea5d86a7742b22432@xs4all.nl
* Make pg_statistic and related code account more honestly for collations.Tom Lane2018-12-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we first put in collations support, we basically punted on teaching pg_statistic, ANALYZE, and the planner selectivity functions about that. They've just used DEFAULT_COLLATION_OID independently of the actual collation of the data. It's time to improve that, so: * Add columns to pg_statistic that record the specific collation associated with each statistics slot. * Teach ANALYZE to use the column's actual collation when comparing values for statistical purposes, and record this in the appropriate slot. (Note that type-specific typanalyze functions are now expected to fill stats->stacoll with the appropriate collation, too.) * Teach assorted selectivity functions to use the actual collation of the stats they are looking at, instead of just assuming it's DEFAULT_COLLATION_OID. This should give noticeably better results in selectivity estimates for columns with nondefault collations, at least for query clauses that use that same collation (which would be the default behavior in most cases). It's still true that comparisons with explicit COLLATE clauses different from the stored data's collation won't be well-estimated, but that's no worse than before. Also, this patch does make the first step towards doing better with that, which is that it's now theoretically possible to collect stats for a collation other than the column's own collation. Patch by me; thanks to Peter Eisentraut for review. Discussion: https://postgr.es/m/14706.1544630227@sss.pgh.pa.us
* Introduce new extended routines for FDW and foreign server lookupsMichael Paquier2018-12-14
| | | | | | | | | | | | | | The cache lookup routines for foreign-data wrappers and foreign servers are extended with an extra argument to handle a set of flags. The only value which can be used now is to indicate if a missing object should result in an error or not, and are designed to be extensible on need. Those new routines are added into the existing set of user-visible FDW APIs and documented in consequence. They will be used for future patches to improve the SQL interface for object addresses. Author: Michael Paquier Reviewed-by: Álvaro Herrera Discussion: https://postgr.es/m/CAB7nPqSZxrSmdHK-rny7z8mi=EAFXJ5J-0RbzDw6aus=wB5azQ@mail.gmail.com
* Create a separate oid range for oids assigned by genbki.pl.Andres Freund2018-12-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | The changes I made in 578b229718e assigned oids below FirstBootstrapObjectId to objects in include/catalog/*.dat files that did not have an oid assigned, starting at the max oid explicitly assigned. Tom criticized that for mainly two reasons: 1) It's not clear which values are manually and which explicitly assigned. 2) The space below FirstBootstrapObjectId gets pretty crowded, and some PostgreSQL forks have used oids >= 9000 for their own objects, to avoid conflicting. Thus create a new range for objects not assigned explicit oids, but assigned by genbki.pl. For now 1-9999 is for explicitly assigned oids, FirstGenbkiObjectId (10000) to FirstBootstrapObjectId (1200) -1 is for genbki.pl assigned oids, and < FirstNormalObjectId (16384) is for oids assigned during bootstrap. It's possible that we'll have to adjust these boundaries, but there's some headroom for now. Add a note suggesting that oids in forks should be assigned in the 9000-9999 range. Catversion bump for obvious reasons. Per complaint from Tom Lane. Author: Andres Freund Discussion: https://postgr.es/m/16845.1544393682@sss.pgh.pa.us
* Drop no-op CoerceToDomain nodes from expressions at planning time.Tom Lane2018-12-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a domain has no constraints, then CoerceToDomain doesn't really do anything and can be simplified to a RelabelType. This not only eliminates cycles at execution, but allows the planner to optimize better (for instance, match the coerced expression to an index on the underlying column). However, we do have to support invalidating the plan later if a constraint gets added to the domain. That's comparable to the case of a change to a SQL function that had been inlined into a plan, so all the necessary logic already exists for plans depending on functions. We need only duplicate or share that logic for domains. ALTER DOMAIN ADD/DROP CONSTRAINT need to be taught to send out sinval messages for the domain's pg_type entry, since those operations don't update that row. (ALTER DOMAIN SET/DROP NOT NULL do update that row, so no code change is needed for them.) Testing this revealed what's really a pre-existing bug in plpgsql: it caches the SQL-expression-tree expansion of type coercions and had no provision for invalidating entries in that cache. Up to now that was only a problem if such an expression had inlined a SQL function that got changed, which is unlikely though not impossible. But failing to track changes of domain constraints breaks an existing regression test case and would likely cause practical problems too. We could fix that locally in plpgsql, but what seems like a better idea is to build some generic infrastructure in plancache.c to store standalone expressions and track invalidation events for them. (It's tempting to wonder whether plpgsql's "simple expression" stuff could use this code with lower overhead than its current use of the heavyweight plancache APIs. But I've left that idea for later.) Other stuff fixed in passing: * Allow estimate_expression_value() to drop CoerceToDomain unconditionally, effectively assuming that the coercion will succeed. This will improve planner selectivity estimates for cases involving estimatable expressions that are coerced to domains. We could have done this independently of everything else here, but there wasn't previously any need for eval_const_expressions_mutator to know about CoerceToDomain at all. * Use a dlist for plancache.c's list of cached plans, rather than a manually threaded singly-linked list. That eliminates a potential performance problem in DropCachedPlan. * Fix a couple of inconsistencies in typecmds.c about whether operations on domains drop RowExclusiveLock on pg_type. Our common practice is that DDL operations do drop catalog locks, so standardize on that choice. Discussion: https://postgr.es/m/19958.1544122124@sss.pgh.pa.us
* Prevent GIN deleted pages from being reclaimed too earlyAlexander Korotkov2018-12-13
| | | | | | | | | | | | | | | | | | When GIN vacuum deletes a posting tree page, it assumes that no concurrent searchers can access it, thanks to ginStepRight() locking two pages at once. However, since 9.4 searches can skip parts of posting trees descending from the root. That leads to the risk that page is deleted and reclaimed before concurrent search can access it. This commit prevents the risk of above by waiting for every transaction, which might wait to reference this page, to finish. Due to binary compatibility we can't change GinPageOpaqueData to store corresponding transaction id. Instead we reuse page header pd_prune_xid field, which is unused in index pages. Discussion: https://postgr.es/m/31a702a.14dd.166c1366ac1.Coremail.chjischj%40163.com Author: Andrey Borodin, Alexander Korotkov Reviewed-by: Alexander Korotkov Backpatch-through: 9.4
* Repair bogus EPQ plans generated for postgres_fdw foreign joins.Tom Lane2018-12-12
| | | | | | | | | | | | | | | | | | | | | | | | | postgres_fdw's postgresGetForeignPlan() assumes without checking that the outer_plan it's given for a join relation must have a NestLoop, MergeJoin, or HashJoin node at the top. That's been wrong at least since commit 4bbf6edfb (which could cause insertion of a Sort node on top) and it seems like a pretty unsafe thing to Just Assume even without that. Through blind good fortune, this doesn't seem to have any worse consequences today than strange EXPLAIN output, but it's clearly trouble waiting to happen. To fix, test the node type explicitly before touching Join-specific fields, and avoid jamming the new tlist into a node type that can't do projection. Export a new support function from createplan.c to avoid building low-level knowledge about the latter into FDWs. Back-patch to 9.6 where the faulty coding was added. Note that the associated regression test cases don't show any changes before v11, apparently because the tests back-patched with 4bbf6edfb don't actually exercise the problem case before then (there's no top-level Sort in those plans). Discussion: https://postgr.es/m/8946.1544644803@sss.pgh.pa.us
* Add timestamp of last received message from standby to pg_stat_replicationMichael Paquier2018-12-09
| | | | | | | | | | | | | | The timestamp generated by the standby at message transmission has been included in the protocol since its introduction for both the status update message and hot standby feedback message, but it has never appeared in pg_stat_replication. Seeing this timestamp does not matter much with a cluster which has a lot of activity, but on a mostly-idle cluster, this makes monitoring able to react faster than the configured timeouts. Author: MyungKyu LIM Reviewed-by: Michael Paquier, Masahiko Sawada Discussion: https://postgr.es/m/1657809367.407321.1533027417725.JavaMail.jboss@ep2ml404
* Cleanup comments in xlog compressionStephen Frost2018-12-06
| | | | | | | | | | | | Skipping over the "hole" in full page images in the XLOG code was described as being a form of compression, but this got a bit confusing since we now have PGLZ-based compression happening, so adjust the wording to discuss "removing" the "hole" and keeping the talk about compression to where we're talking about using PGLZ-based compression of the full page images. Reviewed-By: Kyotaro Horiguchi Discussion: https://postgr.es/m/20181127234341.GM3415@tamriel.snowman.net
* Add log_statement_sample_rate parameterAlvaro Herrera2018-11-29
| | | | | | | | | | This allows to set a lower log_min_duration_statement value without incurring excessive log traffic (which reduces performance). This can be useful to analyze workloads with lots of short queries. Author: Adrien Nayrat Reviewed-by: David Rowley, Vik Fearing Discussion: https://postgr.es/m/c30ee535-ee1e-db9f-fa97-146b9f62caed@anayrat.info
* Add CSV table output mode in psql.Tom Lane2018-11-26
| | | | | | | | | | | | | | | | | | "\pset format csv", or --csv, selects comma-separated values table format. This is compliant with RFC 4180, except that we aren't too picky about whether the record separator is LF or CRLF; also, the user may choose a field separator other than comma. This output format is directly compatible with the server's COPY CSV format, and will also be useful as input to other programs. It's considerably safer for that purpose than the old recommendation to use "unaligned" format, since the latter couldn't handle data containing the field separator character. Daniel Vérité, reviewed by Fabien Coelho and David Fetter, some tweaking by me Discussion: https://postgr.es/m/a8de371e-006f-4f92-ab72-2bbe3ee78f03@manitou-mail.org
* Integrate recovery.conf into postgresql.confPeter Eisentraut2018-11-25
| | | | | | | | | | | | | | | | | | | | | | | | | recovery.conf settings are now set in postgresql.conf (or other GUC sources). Currently, all the affected settings are PGC_POSTMASTER; this could be refined in the future case by case. Recovery is now initiated by a file recovery.signal. Standby mode is initiated by a file standby.signal. The standby_mode setting is gone. If a recovery.conf file is found, an error is issued. The trigger_file setting has been renamed to promote_trigger_file as part of the move. The documentation chapter "Recovery Configuration" has been integrated into "Server Configuration". pg_basebackup -R now appends settings to postgresql.auto.conf and creates a standby.signal file. Author: Fujii Masao <masao.fujii@gmail.com> Author: Simon Riggs <simon@2ndquadrant.com> Author: Abhijit Menon-Sen <ams@2ndquadrant.com> Author: Sergei Kornilov <sk@zsrv.org> Discussion: https://www.postgresql.org/message-id/flat/607741529606767@web3g.yandex.ru/
* Add WL_EXIT_ON_PM_DEATH pseudo-event.Thomas Munro2018-11-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Users of the WaitEventSet and WaitLatch() APIs can now choose between asking for WL_POSTMASTER_DEATH and then handling it explicitly, or asking for WL_EXIT_ON_PM_DEATH to trigger immediate exit on postmaster death. This reduces code duplication, since almost all callers want the latter. Repair all code that was previously ignoring postmaster death completely, or requesting the event but ignoring it, or requesting the event but then doing an unconditional PostmasterIsAlive() call every time through its event loop (which is an expensive syscall on platforms for which we don't have USE_POSTMASTER_DEATH_SIGNAL support). Assert that callers of WaitLatchXXX() under the postmaster remember to ask for either WL_POSTMASTER_DEATH or WL_EXIT_ON_PM_DEATH, to prevent future bugs. The only process that doesn't handle postmaster death is syslogger. It waits until all backends holding the write end of the syslog pipe (including the postmaster) have closed it by exiting, to be sure to capture any parting messages. By using the WaitEventSet API directly it avoids the new assertion, and as a by-product it may be slightly more efficient on platforms that have epoll(). Author: Thomas Munro Reviewed-by: Kyotaro Horiguchi, Heikki Linnakangas, Tom Lane Discussion: https://postgr.es/m/CAEepm%3D1TCviRykkUb69ppWLr_V697rzd1j3eZsRMmbXvETfqbQ%40mail.gmail.com, https://postgr.es/m/CAEepm=2LqHzizbe7muD7-2yHUbTOoF7Q+qkSD5Q41kuhttRTwA@mail.gmail.com
* instr_time.h: add INSTR_TIME_SET_CURRENT_LAZYAlvaro Herrera2018-11-21
| | | | | | | | Sets the timestamp to current if not already set. Will acquire more callers momentarily. Author: Fabien Coelho Discussion: https://postgr.es/m/alpine.DEB.2.21.1808111104320.1705@lancre
* Remove WITH OIDS support, change oid catalog column visibility.Andres Freund2018-11-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously tables declared WITH OIDS, including a significant fraction of the catalog tables, stored the oid column not as a normal column, but as part of the tuple header. This special column was not shown by default, which was somewhat odd, as it's often (consider e.g. pg_class.oid) one of the more important parts of a row. Neither pg_dump nor COPY included the contents of the oid column by default. The fact that the oid column was not an ordinary column necessitated a significant amount of special case code to support oid columns. That already was painful for the existing, but upcoming work aiming to make table storage pluggable, would have required expanding and duplicating that "specialness" significantly. WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0). Remove it. Removing includes: - CREATE TABLE and ALTER TABLE syntax for declaring the table to be WITH OIDS has been removed (WITH (oids[ = true]) will error out) - pg_dump does not support dumping tables declared WITH OIDS and will issue a warning when dumping one (and ignore the oid column). - restoring an pg_dump archive with pg_restore will warn when restoring a table with oid contents (and ignore the oid column) - COPY will refuse to load binary dump that includes oids. - pg_upgrade will error out when encountering tables declared WITH OIDS, they have to be altered to remove the oid column first. - Functionality to access the oid of the last inserted row (like plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed. The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false) for CREATE TABLE) is still supported. While that requires a bit of support code, it seems unnecessary to break applications / dumps that do not use oids, and are explicit about not using them. The biggest user of WITH OID columns was postgres' catalog. This commit changes all 'magic' oid columns to be columns that are normally declared and stored. To reduce unnecessary query breakage all the newly added columns are still named 'oid', even if a table's column naming scheme would indicate 'reloid' or such. This obviously requires adapting a lot code, mostly replacing oid access via HeapTupleGetOid() with access to the underlying Form_pg_*->oid column. The bootstrap process now assigns oids for all oid columns in genbki.pl that do not have an explicit value (starting at the largest oid previously used), only oids assigned later by oids will be above FirstBootstrapObjectId. As the oid column now is a normal column the special bootstrap syntax for oids has been removed. Oids are not automatically assigned during insertion anymore, all backend code explicitly assigns oids with GetNewOidWithIndex(). For the rare case that insertions into the catalog via SQL are called for the new pg_nextoid() function can be used (which only works on catalog tables). The fact that oid columns on system tables are now normal columns means that they will be included in the set of columns expanded by * (i.e. SELECT * FROM pg_class will now include the table's oid, previously it did not). It'd not technically be hard to hide oid column by default, but that'd mean confusing behavior would either have to be carried forward forever, or it'd cause breakage down the line. While it's not unlikely that further adjustments are needed, the scope/invasiveness of the patch makes it worthwhile to get merge this now. It's painful to maintain externally, too complicated to commit after the code code freeze, and a dependency of a number of other patches. Catversion bump, for obvious reasons. Author: Andres Freund, with contributions by John Naylor Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
* Improve description of buffer used to store records in WAL readerMichael Paquier2018-11-21
| | | | | | | | | | The dedicated private buffer to store records is used only for these crossing a page boundary since 285bd0ac, but its description did not match completely the reality. Reported-by: Andrey Lepikhov Author: Michael Paquier Discussion: https://postgr.es/m/49518b48-2036-5e43-1818-0f594e375e76@postgrespro.ru
* Add settings to control SSL/TLS protocol versionPeter Eisentraut2018-11-20
| | | | | | | | | | For example: ssl_min_protocol_version = 'TLSv1.1' ssl_max_protocol_version = 'TLSv1.2' Reviewed-by: Steve Singer <steve@ssinger.info> Discussion: https://www.postgresql.org/message-id/flat/1822da87-b862-041a-9fc2-d0310c3da173@2ndquadrant.com
* Reduce unnecessary list construction in RelationBuildPartitionDesc.Robert Haas2018-11-19
| | | | | | | | | | | | | | | The 'partoids' list which was constructed by the previous version of this code was necessarily identical to 'inhoids'. There's no point to duplicating the list, so avoid that. Instead, construct the array representation directly from the original 'inhoids' list. Also, use an array rather than a list for 'boundspecs'. We know exactly how many items we need to store, so there's really no reason to use a list. Using an array instead reduces the number of memory allocations we perform. Patch by me, reviewed by Michael Paquier and Amit Langote, the latter of whom also helped with rebasing.
* PANIC on fsync() failure.Thomas Munro2018-11-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On some operating systems, it doesn't make sense to retry fsync(), because dirty data cached by the kernel may have been dropped on write-back failure. In that case the only remaining copy of the data is in the WAL. A subsequent fsync() could appear to succeed, but not have flushed the data. That means that a future checkpoint could apparently complete successfully but have lost data. Therefore, violently prevent any future checkpoint attempts by panicking on the first fsync() failure. Note that we already did the same for WAL data; this change extends that behavior to non-temporary data files. Provide a GUC data_sync_retry to control this new behavior, for users of operating systems that don't eject dirty data, and possibly forensic/testing uses. If it is set to on and the write-back error was transient, a later checkpoint might genuinely succeed (on a system that does not throw away buffers on failure); if the error is permanent, later checkpoints will continue to fail. The GUC defaults to off, meaning that we panic. Back-patch to all supported releases. There is still a narrow window for error-loss on some operating systems: if the file is closed and later reopened and a write-back error occurs in the intervening time, but the inode has the bad luck to be evicted due to memory pressure before we reopen, we could miss the error. A later patch will address that with a scheme for keeping files with dirty data open at all times, but we judge that to be too complicated to back-patch. Author: Craig Ringer, with some adjustments by Thomas Munro Reported-by: Craig Ringer Reviewed-by: Robert Haas, Thomas Munro, Andres Freund Discussion: https://postgr.es/m/20180427222842.in2e4mibx45zdth5%40alap3.anarazel.de
* Avoid defining SIGTTIN/SIGTTOU on Windows.Tom Lane2018-11-17
| | | | | | | | | | | Setting them to SIG_IGN seems unlikely to have any beneficial effect on that platform, and given the signal numbering collision with SIGABRT, it could easily have bad effects. Given the lack of field complaints that can be traced to this, I don't presently feel a need to back-patch. Discussion: https://postgr.es/m/5627.1542477392@sss.pgh.pa.us
* Make TupleTableSlots extensible, finish split of existing slot type.Andres Freund2018-11-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit completes the work prepared in 1a0586de36, splitting the old TupleTableSlot implementation (which could store buffer, heap, minimal and virtual slots) into four different slot types. As described in the aforementioned commit, this is done with the goal of making tuple table slots extensible, to allow for pluggable table access methods. To achieve runtime extensibility for TupleTableSlots, operations on slots that can differ between types of slots are performed using the TupleTableSlotOps struct provided at slot creation time. That includes information from the size of TupleTableSlot struct to be allocated, initialization, deforming etc. See the struct's definition for more detailed information about callbacks TupleTableSlotOps. I decided to rename TTSOpsBufferTuple to TTSOpsBufferHeapTuple and ExecCopySlotTuple to ExecCopySlotHeapTuple, as that seems more consistent with other naming introduced in recent patches. There's plenty optimization potential in the slot implementation, but according to benchmarking the state after this commit has similar performance characteristics to before this set of changes, which seems sufficient. There's a few changes in execReplication.c that currently need to poke through the slot abstraction, that'll be repaired once the pluggable storage patchset provides the necessary infrastructure. Author: Andres Freund and Ashutosh Bapat, with changes by Amit Khandekar Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de
* Inline hot path of slot_getsomeattrs().Andres Freund2018-11-16
| | | | | | | | This yields a minor speedup, which roughly balances the loss from the upcoming introduction of callbacks to do some operations on slots. Author: Andres Freund Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de
* Redesign initialization of partition routing structuresAlvaro Herrera2018-11-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This speeds up write operations (INSERT, UPDATE, DELETE, COPY, as well as the future MERGE) on partitioned tables. This changes the setup for tuple routing so that it does far less work during the initial setup and pushes more work out to when partitions receive tuples. PartitionDispatchData structs for sub-partitioned tables are only created when a tuple gets routed through it. The possibly large arrays in the PartitionTupleRouting struct have largely been removed. The partitions[] array remains but now never contains any NULL gaps. Previously the NULLs had to be skipped during ExecCleanupTupleRouting(), which could add a large overhead to the cleanup when the number of partitions was large. The partitions[] array is allocated small to start with and only enlarged when we route tuples to enough partitions that it runs out of space. This allows us to keep simple single-row partition INSERTs running quickly. Redesign The arrays in PartitionTupleRouting which stored the tuple translation maps have now been removed. These have been moved out into a PartitionRoutingInfo struct which is an additional field in ResultRelInfo. The find_all_inheritors() call still remains by far the slowest part of ExecSetupPartitionTupleRouting(). This commit just removes the other slow parts. In passing also rename the tuple translation maps from being ParentToChild and ChildToParent to being RootToPartition and PartitionToRoot. The old names mislead you into thinking that a partition of some sub-partitioned table would translate to the rowtype of the sub-partitioned table rather than the root partitioned table. Authors: David Rowley and Amit Langote, heavily revised by Álvaro Herrera Testing help from Jesper Pedersen and Kato Sho. Discussion: https://postgr.es/m/CAKJS1f_1RJyFquuCKRFHTdcXqoPX-PYqAd7nz=GVBwvGh4a6xA@mail.gmail.com
* Add dummy field to currently empty struct TupleTableSlotOps.Andres Freund2018-11-15
| | | | Per MSVC complaint on buildfarm member dory.
* Don't generate tuple deforming functions for virtual slots.Andres Freund2018-11-15
| | | | | | | | | | Virtual tuple table slots never need tuple deforming. Therefore, if we know at expression compilation time, that a certain slot will always be virtual, there's no need to create a tuple deforming routine for it. Author: Andres Freund Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de
* Compute information about EEOP_*_FETCHSOME at expression init time.Andres Freund2018-11-15
| | | | | | | | | | | | | | Previously this information was computed when JIT compiling an expression. But the information is useful for assertions in the non-JIT case too (for assertions), therefore it makes sense to move it. This will, in a followup commit, allow to treat different slot types differently. E.g. for virtual slots there's no need to generate a JIT function to deform the slot. Author: Andres Freund Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de