aboutsummaryrefslogtreecommitdiff
path: root/src/backend
Commit message (Collapse)AuthorAge
...
* Add infrastructure to track WAL usage.Amit Kapila2020-04-04
| | | | | | | | | | | | | | | | | | | | | | This allows gathering the WAL generation statistics for each statement execution. The three statistics that we collect are the number of WAL records, the number of full page writes and the amount of WAL bytes generated. This helps the users who have write-intensive workload to see the impact of I/O due to WAL. This further enables us to see approximately what percentage of overall WAL is due to full page writes. In the future, we can extend this functionality to allow us to compute the the exact amount of WAL data due to full page writes. This patch in itself is just an infrastructure to compute WAL usage data. The upcoming patches will expose this data via explain, auto_explain, pg_stat_statements and verbose (auto)vacuum output. Author: Kirill Bychik, Julien Rouhaud Reviewed-by: Dilip Kumar, Fujii Masao and Amit Kapila Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
* Include chunk overhead in hash table entry size estimate.Jeff Davis2020-04-03
| | | | | | | | Don't try to be precise about it, just use a constant 16 bytes of chunk overhead. Being smarter would require knowing the memory context where the chunk will be allocated, which is not known by all callers. Discussion: https://postgr.es/m/20200325220936.il3ni2fj2j2b45y5@alap3.anarazel.de
* Fix resource management bug with replication=database.Robert Haas2020-04-03
| | | | | | | | | | | | | | | Commit 0d8c9c1210c44b36ec2efcb223a1dfbe897a3661 allowed BASE_BACKUP to acquire a ResourceOwner without a transaction so that the backup manifest functionality could use a BufFile, but it overlooked the fact that when a walsender is used with replication=database, it might have a transaction in progress, because in that mode, SQL and replication commands can be mixed. Try to fix things up so that the two cleanup mechanisms don't conflict. Per buildfarm member serinus, which triggered the problem when CREATE_REPLICATION_SLOT failed from inside a transaction. It passed on the subsequent run, so evidently the failure doesn't happen every time.
* Be more careful about time_t vs. pg_time_t in basebackup.c.Robert Haas2020-04-03
| | | | | | | | | | | | | | | lapwing is complaining that about a call to pg_gmtime, saying that it "expected 'const pg_time_t *' but argument is of type 'time_t *'". I at first thought that the problem had someting to do with const, but Thomas Munro suggested that it might be just because time_t and pg_time_t are different identifers. lapwing is i686 rather than x86_64, and pg_time_t is always int64, so that seems like a good guess. There is other code that just casts time_t to pg_time_t without any conversion function, so try that approach here. Introduced in commit 0d8c9c1210c44b36ec2efcb223a1dfbe897a3661.
* Cosmetic improvements for code related to partitionwise join.Tom Lane2020-04-03
| | | | | | | | | | | | | Move have_partkey_equi_join and match_expr_to_partition_keys to relnode.c, since they're used only there. Refactor build_joinrel_partition_info to split out the code that fills the joinrel's partition key lists; this doesn't have any non-cosmetic impact, but it seems like a useful separation of concerns. Improve assorted nearby comments. Amit Langote, with a little further editorialization by me Discussion: https://postgr.es/m/CA+HiwqG2WVUGmLJqtR0tPFhniO=H=9qQ+Z3L_ZC+Y3-EVQHFGg@mail.gmail.com
* Generate backup manifests for base backups, and validate them.Robert Haas2020-04-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A manifest is a JSON document which includes (1) the file name, size, last modification time, and an optional checksum for each file backed up, (2) timelines and LSNs for whatever WAL will need to be replayed to make the backup consistent, and (3) a checksum for the manifest itself. By default, we use CRC-32C when checksumming data files, because we are trying to detect corruption and user error, not foil an adversary. However, pg_basebackup and the server-side BASE_BACKUP command now have options to select a different algorithm, so users wanting a cryptographic hash function can select SHA-224, SHA-256, SHA-384, or SHA-512. Users not wanting file checksums at all can disable them, or disable generating of the backup manifest altogether. Using a cryptographic hash function in place of CRC-32C consumes significantly more CPU cycles, which may slow down backups in some cases. A new tool called pg_validatebackup can validate a backup against the manifest. If no checksums are present, it can still check that the right files exist and that they have the expected sizes. If checksums are present, it can also verify that each file has the expected checksum. Additionally, it calls pg_waldump to verify that the expected WAL files are present and parseable. Only plain format backups can be validated directly, but tar format backups can be validated after extracting them. Robert Haas, with help, ideas, review, and testing from David Steele, Stephen Frost, Andrew Dunstan, Rushabh Lathia, Suraj Kharage, Tushar Ahuja, Rajkumar Raghuwanshi, Mark Dilger, Davinder Singh, Jeevan Chalke, Amit Kapila, Andres Freund, and Noah Misch. Discussion: http://postgr.es/m/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=F7heO2zna5T6qg@mail.gmail.com
* Include information on buffer usage during planning phase, in EXPLAIN ↵Fujii Masao2020-04-04
| | | | | | | | | | | | | | | | | | | | | output, take two. When BUFFERS option is enabled, EXPLAIN command includes the information on buffer usage during each plan node, in its output. In addition to that, this commit makes EXPLAIN command include also the information on buffer usage during planning phase, in its output. This feature makes it easier to discern the cases where lots of buffer access happen during planning. This commit revives the original commit ed7a509571 that was reverted by commit 19db23bcbd. The original commit had to be reverted because it caused the regression test failure on the buildfarm members prion and dory. But since commit c0885c4c30 got rid of the caues of the test failure, the original commit can be safely introduced again. Author: Julien Rouhaud, slightly revised by Fujii Masao Reviewed-by: Justin Pryzby Discussion: https://postgr.es/m/16109-26a1a88651e90608@postgresql.org
* Fix bugs in gin_fuzzy_search_limit processing.Tom Lane2020-04-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | entryGetItem()'s three code paths each contained bugs associated with filtering the entries for gin_fuzzy_search_limit. The posting-tree path failed to advance "advancePast" after having decided to filter an item. If we ran out of items on the current page and needed to advance to the next, what would actually happen is that entryLoadMoreItems() would re-load the same page. Eventually, the random dropItem() test would accept one of the same items it'd previously rejected, and we'd move on --- but it could take awhile with small gin_fuzzy_search_limit. To add insult to injury, this case would inevitably cause entryLoadMoreItems() to decide it needed to re-descend from the root, making things even slower. The posting-list path failed to implement gin_fuzzy_search_limit filtering at all, so that all entries in the posting list would be returned. The bitmap-result path used a "gotitem" variable that it failed to update in the one place where it'd actually make a difference, ie at the one "continue" statement. I think this was unreachable in practice, because if we'd looped around then it shouldn't be the case that the entries on the new page are before advancePast. Still, the "gotitem" variable was contributing nothing to either clarity or correctness, so get rid of it. Refactor all three loops so that the termination conditions are more alike and less unreadable. The code coverage report showed that we had no coverage at all for the re-descend-from-root code path in entryLoadMoreItems(), which seems like a very bad thing, so add a test case that exercises it. We also had exactly no coverage for gin_fuzzy_search_limit, so add a simplistic test case that at least hits those code paths a little bit. Back-patch to all supported branches. Adé Heyward and Tom Lane Discussion: https://postgr.es/m/CAEknJCdS-dE1Heddptm7ay2xTbSeADbkaQ8bU2AXRCVC2LdtKQ@mail.gmail.com
* Fix bogus CALLED_AS_TRIGGER() defenses.Tom Lane2020-04-03
| | | | | | | | | | | | | | | | | | | | | contrib/lo's lo_manage() thought it could use trigdata->tg_trigger->tgname in its error message about not being called as a trigger. That naturally led to a core dump. unique_key_recheck() figured it could Assert that fcinfo->context is a TriggerData node in advance of having checked that it's being called as a trigger. That's harmless in production builds, and perhaps not that easy to reach in any case, but it's logically wrong. The first of these per bug #16340 from William Crowell; the second from manual inspection of other CALLED_AS_TRIGGER call sites. Back-patch the lo.c change to all supported branches, the other to v10 where the thinko crept in. Discussion: https://postgr.es/m/16340-591c7449dc7c8c47@postgresql.org
* Revert "Include information on buffer usage during planning phase, in ↵Fujii Masao2020-04-03
| | | | | | | | EXPLAIN output." This reverts commit ed7a5095716ee498ecc406e1b8d5ab92c7662d10. Per buildfarm member prion.
* Add wait events for recovery conflicts.Fujii Masao2020-04-03
| | | | | | | | | | | | | | | | | This commit introduces new wait events RecoveryConflictSnapshot and RecoveryConflictTablespace. The former is reported while waiting for recovery conflict resolution on a vacuum cleanup. The latter is reported while waiting for recovery conflict resolution on dropping tablespace. Also this commit changes the code so that the wait event Lock is reported while waiting in ResolveRecoveryConflictWithVirtualXIDs() for recovery conflict resolution on a lock. Basically the wait event Lock is reported during that wait, but previously was not reported only when that wait happened in ResolveRecoveryConflictWithVirtualXIDs(). Author: Masahiko Sawada Reviewed-by: Fujii Masao Discussion: https://postgr.es/m/CA+fd4k4mXWTwfQLS3RPwGr4xnfAEs1ysFfgYHvmmoUgv6Zxvmg@mail.gmail.com
* Include information on buffer usage during planning phase, in EXPLAIN output.Fujii Masao2020-04-03
| | | | | | | | | | | | | When BUFFERS option is enabled, EXPLAIN command includes the information on buffer usage during each plan node, in its output. In addition to that, this commit makes EXPLAIN command include also the information on buffer usage during planning phase, in its output. This feature makes it easier to discern the cases where lots of buffer access happen during planning. Author: Julien Rouhaud, slightly revised by Fujii Masao Reviewed-by: Justin Pryzby Discussion: https://postgr.es/m/16109-26a1a88651e90608@postgresql.org
* Improve user control over truncation of logged bind-parameter values.Tom Lane2020-04-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | This patch replaces the boolean GUC log_parameters_on_error introduced by commit ba79cb5dc with an integer log_parameter_max_length_on_error, adding the ability to specify how many bytes to trim each logged parameter value to. (The previous coding hard-wired that choice at 64 bytes.) In addition, add a new parameter log_parameter_max_length that provides similar control over truncation of query parameters that are logged in response to statement-logging options, as opposed to errors. Previous releases always logged such parameters in full, possibly causing log bloat. For backwards compatibility with prior releases, log_parameter_max_length defaults to -1 (log in full), while log_parameter_max_length_on_error defaults to 0 (no logging). Per discussion, log_parameter_max_length is SUSET since the DBA should control routine logging behavior, but log_parameter_max_length_on_error is USERSET because it also affects errcontext data sent back to the client. Alexey Bashtanov, editorialized a little by me Discussion: https://postgr.es/m/b10493cc-a399-a03a-67c7-068f2791ee50@imap.cc
* Add SQL functions for Unicode normalizationPeter Eisentraut2020-04-02
| | | | | | | | | | | | | | | | | This adds SQL expressions NORMALIZE() and IS NORMALIZED to convert and check Unicode normal forms, per SQL standard. To support fast IS NORMALIZED tests, we pull in a new data file DerivedNormalizationProps.txt from Unicode and build a lookup table from that, using techniques similar to ones already used for other Unicode data. make update-unicode will keep it up to date. We only build and use these tables for the NFC and NFKC forms, because they are too big for NFD and NFKD and the improvement is not significant enough there. Reviewed-by: Daniel Verite <daniel@manitou-mail.org> Reviewed-by: Andreas Karlsson <andreas@proxel.se> Discussion: https://www.postgresql.org/message-id/flat/c1909f27-c269-2ed9-12f8-3ab72c8caf7a@2ndquadrant.com
* Add some comments to some SQL featuresPeter Eisentraut2020-04-02
| | | | | | Otherwise, it could be confusing to a reader that some of these well-publicized features are simply listed as unsupported without further explanation.
* Add maintenance_io_concurrency to postgresql.conf.sample.Thomas Munro2020-04-02
| | | | New GUC from commit fc34b0d9.
* Allow parallel vacuum to accumulate buffer usage.Amit Kapila2020-04-02
| | | | | | | | | | | | Commit 40d964ec99 allowed vacuum command to process indexes in parallel but forgot to accumulate the buffer usage stats of parallel workers. This allows leader backend to accumulate buffer usage stats of all the parallel workers. Reported-by: Julien Rouhaud Author: Sawada Masahiko Reviewed-by: Dilip Kumar, Amit Kapila and Julien Rouhaud Discussion: https://postgr.es/m/20200328151721.GB12854@nol
* Collect statistics about SLRU cachesTomas Vondra2020-04-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | There's a number of SLRU caches used to access important data like clog, commit timestamps, multixact, asynchronous notifications, etc. Until now we had no easy way to monitor these shared caches, compute hit ratios, number of reads/writes etc. This commit extends the statistics collector to track this information for a predefined list of SLRUs, and also introduces a new system view pg_stat_slru displaying the data. The list of built-in SLRUs is fixed, but additional SLRUs may be defined in extensions. Unfortunately, there's no suitable registry of SLRUs, so this patch simply defines a fixed list of SLRUs with entries for the built-in ones and one entry for all additional SLRUs. Extensions adding their own SLRU are fairly rare, so this seems acceptable. This patch only allows monitoring of SLRUs, not tuning. The SLRU sizes are still fixed (hard-coded in the code) and it's not entirely clear which of the SLRUs might need a GUC to tune size. In a way, allowing us to determine that is one of the goals of this patch. Bump catversion as the patch introduces new functions and system view. Author: Tomas Vondra Reviewed-by: Alvaro Herrera Discussion: https://www.postgresql.org/message-id/flat/20200119143707.gyinppnigokesjok@development
* Check equality semantics for unique indexes on partitioned tables.Tom Lane2020-04-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We require the partition key to be a subset of the set of columns being made unique, so that physically-separate indexes on the different partitions are sufficient to enforce the uniqueness constraint. The existing code checked that the listed columns appear, but did not inquire into the index semantics, which is a serious oversight given that different index opclasses might enforce completely different notions of uniqueness. Ideally, perhaps, we'd just match the partition key opfamily to the index opfamily. But hash partitioning uses hash opfamilies which we can't directly match to btree opfamilies. Hence, look up the equality operator in each family, and accept if it's the same operator. This should be okay in a fairly general sense, since the equality operator ought to precisely represent the opfamily's notion of uniqueness. A remaining weak spot is that we don't have a cross-index-AM notion of which opfamily member is "equality". But we know which one to use for hash and btree AMs, and those are the only two that are relevant here at present. (Any non-core AMs that know how to enforce equality are out of luck, for now.) Back-patch to v11 where this feature was introduced. Guancheng Luo, revised a bit by me Discussion: https://postgr.es/m/D9C3CEF7-04E8-47A1-8300-CA1DCD5ED40D@gmail.com
* Improve selectivity estimation for assorted match-style operators.Tom Lane2020-04-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Quite a few matching operators such as JSONB's @> used "contsel" and "contjoinsel" as their selectivity estimators. That was a bad idea, because (a) contsel is only a stub, yielding a fixed default estimate, and (b) that default is 0.001, meaning we estimate these operators as five times more selective than equality, which is surely pretty silly. There's a good model for improving this in ltree's ltreeparentsel(): for any "var OP constant" query, we can try applying the operator to all of the column's MCV and histogram values, taking the latter as being a random sample of the non-MCV values. That code is actually 100% generic, except for the question of exactly what default selectivity ought to be plugged in when we don't have stats. Hence, migrate the guts of ltreeparentsel() into the core code, provide wrappers "matchingsel" and "matchingjoinsel" with a more-appropriate default estimate, and use those for the non-geometric operators that formerly used contsel (mostly JSONB containment operators and tsquery matching). Also apply this code to some match-like operators in hstore, ltree, and pg_trgm, including the former users of ltreeparentsel as well as ones that improperly used contsel. Since commit 911e70207 just created new versions of those extensions that we haven't released yet, we can sneak this change into those new versions instead of having to create an additional generation of update scripts. Patch by me, reviewed by Alexey Bashtanov Discussion: https://postgr.es/m/12237.1582833074@sss.pgh.pa.us
* Refactor code to look up local replication tuplePeter Eisentraut2020-04-01
| | | | | | | This unifies some duplicate code. Author: Amit Langote <amitlangote09@gmail.com> Discussion: https://www.postgresql.org/message-id/CA+HiwqFjYE5anArxvkjr37AQMd52L-LZtz9Ld2QrLQ3YfcYhTw@mail.gmail.com
* Fix coverity complaint about commit 40d964ec99.Amit Kapila2020-04-01
| | | | | | | | | | | The coverity complained that dividing integer expressions and then converting the integer quotient to type "double" would lose fractional part. Typecasting one of the arguments of expression with double should fix the report. Author: Mahendra Singh Thalor Reviewed-by: Amit Kapila Discussion: https://postgr.es/m/20200329224818.6phnhv7o2q2rfovf@alap3.anarazel.de
* Add CREATE INDEX deduplication assertions.Peter Geoghegan2020-03-31
| | | | | Add two assertions that verify the assumptions about posting list tuple space accounting and suffix truncation made within nbtsort.c.
* Fix race condition in statext_store().Tom Lane2020-03-31
| | | | | | | | | | | | | Must hold some lock on the pg_statistic_ext_data catalog *before* we look up the tuple we aim to replace. Otherwise a concurrent VACUUM FULL or similar operation could move it to a different TID, leaving us trying to replace the wrong tuple. Back-patch to v12 where this got broken. Credit goes to Dean Rasheed; I'm just doing the clerical work. Discussion: https://postgr.es/m/CAEZATCU0zHMDiQV0g8P2U+YSP9C1idUPrn79DajsbonwkN0xvQ@mail.gmail.com
* Improve the message logged when recovery is paused.Fujii Masao2020-04-01
| | | | | | | | | | | | | | | | | When recovery target is reached and recovery is paused because of recovery_target_action=pause, executing pg_wal_replay_resume() causes the standby to promote, i.e., the recovery to end. So, in this case, the previous message "Execute pg_wal_replay_resume() to continue" logged was confusing because pg_wal_replay_resume() doesn't cause the recovery to continue. This commit improves the message logged when recovery is paused, and the proper message is output based on what (pg_wal_replay_pause or recovery_target_action) causes recovery to be paused. Author: Sergei Kornilov, revised by Fujii Masao Reviewed-by: Robert Haas Discussion: https://postgr.es/m/19168211580382043@myt5-b646bde4b8f3.qloud-c.yandex.net
* Teach pg_ls_dir_files() to ignore ENOENT failures from stat().Tom Lane2020-03-31
| | | | | | | | | | | | | | | | Buildfarm experience shows that this function can fail with ENOENT if some other process unlinks a file between when we read the directory entry and when we try to stat() it. The problem is old but we had not noticed it until 085b6b667 added regression test coverage. To fix, just ignore ENOENT failures. There is one other case that this might hide: a symlink that points to nowhere. That seems okay though, at least better than erroring. Back-patch to v10 where this function was added, since the regression test cases were too. Discussion: https://postgr.es/m/20200308173103.GC1357@telsasoft.com
* Improve error reporting in opclasscmds.cAlexander Korotkov2020-03-31
| | | | | | This commit improves error reporting introduced by 911e702077. It puts argument of errmsg() to the single line for easier grepping source for error text. Also it improves wording of errhint().
* Fix assorted typosMagnus Hagander2020-03-31
| | | | Author: Daniel Gustafsson <daniel@yesql.se>
* Fix INSERT OVERRIDING USER VALUE behaviorPeter Eisentraut2020-03-31
| | | | | | | | | | | | | The original implementation disallowed using OVERRIDING USER VALUE on identity columns defined as GENERATED ALWAYS, which is not per standard. So allow that now. Expand documentation and tests around this. Author: Dean Rasheed <dean.a.rasheed@gmail.com> Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com> Reviewed-by: Vik Fearing <vik@postgresfriends.org> Discussion: https://www.postgresql.org/message-id/flat/CAEZATCVrh2ufCwmzzM%3Dk_OfuLhTTPBJCdFkimst2kry4oHepuQ%40mail.gmail.com
* Move routine definitions of xlogarchive.c to a new header fileMichael Paquier2020-03-31
| | | | | | | | | | | | | | The definitions of the routines defined in xlogarchive.c have been part of xlog_internal.h which is included by several frontend tools, but all those routines are only called by the backend. More cleanup could be done within xlog_internal.h, but that's already a nice cut. This will help a follow-up patch for pg_rewind where handling of restore_command is added for frontends. Author: Alexey Kondratov, Michael Paquier Reviewed-by: Álvaro Herrera, Alexander Korotkov Discussion: https://postgr.es/m/a3acff50-5a0d-9a2c-b3b2-ee36168955c1@postgrespro.ru
* Update SQL featuresPeter Eisentraut2020-03-31
| | | | Set T653 to supported. This has always been possible.
* Avoid calls to RelationGetRelationName() and RelationGetNamespace() inAmit Kapila2020-03-31
| | | | | | | | | | | | | | | | vacuum code. After commit b61d161c14, during vacuum, we cache the information of relation name and relation namespace in local structure LVRelStats so that we can use it in an error callback function. We can use the cached information to avoid the calls to RelationGetRelationName(), RelationGetNamespace() and get_namespace_name(). This is mainly for the consistent in vacuum code path but it will avoid the extra syscache lookup we do in get_namespace_name(). Author: Justin Pryzby Reviewed-by: Amit Kapila Discussion: https://www.postgresql.org/message-id/20191120210600.GC30362@telsasoft.com
* Further simplify nbtree high key truncation.Peter Geoghegan2020-03-30
| | | | | | | | | Commit 7c2dbc69 reorganized _bt_truncate() in a way that enables a further simplification that I (pgeoghegan) missed: Since we mark the tuple that is returned to the caller as a pivot tuple before the point where its heap TID is set as of 7c2dbc69, it is possible to use the high level BTreeTupleGetHeapTID() inline function to get an item pointer. Do it that way now. This approach is clearer and more maintainable.
* Revert "Skip redundant anti-wraparound vacuums"Michael Paquier2020-03-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 2aa6e33, that added a fast path to skip anti-wraparound and non-aggressive autovacuum jobs (these have no sense as anti-wraparound implies aggressive). With a cluster using a high amount of relations with a portion of them being heavily updated, this could cause autovacuum to lock down, with autovacuum workers attempting repeatedly those jobs on the same relations for the same database, that just kept being skipped. This lock down can be solved with a manual VACUUM FREEZE. Justin King has reported one environment where the issue happened, and Julien Rouhaud and I have been able to reproduce it in a second environment. With a very aggressive autovacuum_freeze_max_age, triggering those jobs with pgbench is a matter of minutes, and hitting the lock down is a lot harder (my local tests failed to do that). Note that anti-wraparound and non-aggressive jobs can only be triggered on a subset of shared catalogs: - pg_auth_members - pg_authid - pg_database - pg_replication_origin - pg_shseclabel - pg_subscription - pg_tablespace While the lock down was possible down to v12, the root cause of those jobs is a much older issue, which needs more analysis. Bonus thanks to Andres Freund for the discussion. Reported-by: Justin King Discussion: https://postgr.es/m/CAE39h22zPLrkH17GrkDgAYL3kbjvySYD1io+rtnAUFnaJJVS4g@mail.gmail.com Backpatch-through: 12
* Refactor nbtree high key truncation.Peter Geoghegan2020-03-30
| | | | | | | | | | | | | Simplify _bt_truncate(), the routine that generates truncated leaf page high keys. Remove a micro-optimization that avoided a second palloc0() call (this was used when a heap TID was needed in the final pivot tuple, though only when the index happened to not be an INCLUDE index). Removing this dubious micro-optimization allows _bt_truncate() to use the index_truncate_tuple() indextuple.c utility routine in all cases. This was already the common case. This commit is a HEAD-only follow up to bugfix commit 4b42a899.
* Deduplicate PageIsNew() check in lazy_scan_heap().Andres Freund2020-03-30
| | | | | | | | | | | The recheck isn't needed anymore, as RelationGetBufferForTuple() now extends the relation with RBM_ZERO_AND_LOCK. Previously we needed to handle the fact that relation extension extended the relation and then separately acquired a lock on the page - while expecting that the page is empty. Reported-By: Ranier Vilela Discussion: https://postgr.es/m/CAEudQArA_=J0D5T258xsCY6Xtf6wiH4b=QDPDgVS+WZUN10WDw@mail.gmail.com
* Fix missing SP-GiST support in 911e702077Alexander Korotkov2020-03-30
| | | | | 911e702077 misses setting of amoptsprocnum for SP-GiST. This commit fixes that.
* Remove rudiments of supporting procnum == 0 from 911e702077Alexander Korotkov2020-03-30
| | | | | | Early versions of opclass options patch uses zero support procedure as opclass options procedure. This commit removes rudiments of it, which were committed in 911e702077. Also, it implements correct handling of amoptsprocnum == 0.
* Consistently truncate non-key suffix columns.Peter Geoghegan2020-03-30
| | | | | | | | | | | | | | | INCLUDE indexes failed to have their non-key attributes physically truncated away in certain rare cases. This led to physically larger pivot tuples that contained useless non-key attribute values. The impact on users should be negligible, but this is still clearly a regression (Postgres 11 supports INCLUDE indexes, and yet was not affected). The bug appeared in commit dd299df8, which introduced "true" suffix truncation of key attributes. Discussion: https://postgr.es/m/CAH2-Wz=E8pkV9ivRSFHtv812H5ckf8s1-yhx61_WrJbKccGcrQ@mail.gmail.com Backpatch: 12-, where "true" suffix truncation was introduced.
* Implement operator class parametersAlexander Korotkov2020-03-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PostgreSQL provides set of template index access methods, where opclasses have much freedom in the semantics of indexing. These index AMs are GiST, GIN, SP-GiST and BRIN. There opclasses define representation of keys, operations on them and supported search strategies. So, it's natural that opclasses may be faced some tradeoffs, which require user-side decision. This commit implements opclass parameters allowing users to set some values, which tell opclass how to index the particular dataset. This commit doesn't introduce new storage in system catalog. Instead it uses pg_attribute.attoptions, which is used for table column storage options but unused for index attributes. In order to evade changing signature of each opclass support function, we implement unified way to pass options to opclass support functions. Options are set to fn_expr as the constant bytea expression. It's possible due to the fact that opclass support functions are executed outside of expressions, so fn_expr is unused for them. This commit comes with some examples of opclass options usage. We parametrize signature length in GiST. That applies to multiple opclasses: tsvector_ops, gist__intbig_ops, gist_ltree_ops, gist__ltree_ops, gist_trgm_ops and gist_hstore_ops. Also we parametrize maximum number of integer ranges for gist__int_ops. However, the main future usage of this feature is expected to be json, where users would be able to specify which way to index particular json parts. Catversion is bumped. Discussion: https://postgr.es/m/d22c3a18-31c7-1879-fc11-4c1ce2f5e5af%40postgrespro.ru Author: Nikita Glukhov, revised by me Reviwed-by: Nikolay Shaplov, Robert Haas, Tom Lane, Tomas Vondra, Alvaro Herrera
* Report waiting via PS while recovery is waiting for buffer pin in hot standby.Fujii Masao2020-03-30
| | | | | | | | | | | | | Previously while the startup process was waiting for the recovery conflict with snapshot, tablespace or lock to be resolved, waiting was reported in PS display, but not in the case of recovery conflict with buffer pin. This commit makes the startup process in hot standby report waiting via PS while waiting for the conflicts with other backends holding buffer pins to be resolved. Author: Masahiko Sawada Reviewed-by: Fujii Masao Discussion: https://postgr.es/m/CA+fd4k4mXWTwfQLS3RPwGr4xnfAEs1ysFfgYHvmmoUgv6Zxvmg@mail.gmail.com
* Improve handling of parameter differences in physical replicationPeter Eisentraut2020-03-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When certain parameters are changed on a physical replication primary, this is communicated to standbys using the XLOG_PARAMETER_CHANGE WAL record. The standby then checks whether its own settings are at least as big as the ones on the primary. If not, the standby shuts down with a fatal error. The correspondence of settings between primary and standby is required because those settings influence certain shared memory sizings that are required for processing WAL records that the primary might send. For example, if the primary sends a prepared transaction, the standby must have had max_prepared_transaction set appropriately or it won't be able to process those WAL records. However, fatally shutting down the standby immediately upon receipt of the parameter change record might be a bit of an overreaction. The resources related to those settings are not required immediately at that point, and might never be required if the activity on the primary does not exhaust all those resources. If we just let the standby roll on with recovery, it will eventually produce an appropriate error when those resources are used. So this patch relaxes this a bit. Upon receipt of XLOG_PARAMETER_CHANGE, we still check the settings but only issue a warning and set a global flag if there is a problem. Then when we actually hit the resource issue and the flag was set, we issue another warning message with relevant information. At that point we pause recovery, so a hot standby remains usable. We also repeat the last warning message once a minute so it is harder to miss or ignore. Reviewed-by: Sergei Kornilov <sk@zsrv.org> Reviewed-by: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/4ad69a4c-cc9b-0dfe-0352-8b1b0cd36c7b@2ndquadrant.com
* Add new part SQL/MDA to information_schema.sql_partsPeter Eisentraut2020-03-30
|
* Allow the planner-related functions and hook to accept the query string.Fujii Masao2020-03-30
| | | | | | | | | | | | | | | | | | This commit adds query_string argument into the planner-related functions and hook and allows us to pass the query string to them. Currently there is no user of the query string passed. But the upcoming patch for the planning counters will add the planning hook function into pg_stat_statements and the function will need the query string. So this change will be necessary for that patch. Also this change is useful for some extensions that want to use the query string in their planner hook function. Author: Pascal Legrand, Julien Rouhaud Reviewed-by: Yoshikazu Imai, Tom Lane, Fujii Masao Discussion: https://postgr.es/m/CAOBaU_bU1m3_XF5qKYtSj1ua4dxd=FWDyh2SH4rSJAUUfsGmAQ@mail.gmail.com Discussion: https://postgr.es/m/1583789487074-0.post@n3.nabble.com
* Expose BufferUsageAccumDiff().Fujii Masao2020-03-30
| | | | | | | | | | | | | | | | Previously pg_stat_statements calculated the difference of buffer counters by its own code even while BufferUsageAccumDiff() had the same code. This commit expose BufferUsageAccumDiff() and makes pg_stat_statements use it for the calculation, in order to simply the code. This change also would be useful for the upcoming patch for the planning counters in pg_stat_statements because the patch will add one more code for the calculation of difference of buffer counters and that can easily be done by using BufferUsageAccumDiff(). Author: Julien Rouhaud Reviewed-by: Fujii Masao Discussion: https://postgr.es/m/bdfee4e0-a304-2498-8da5-3cb52c0a193e@oss.nttdata.com
* Introduce vacuum errcontext to display additional information.Amit Kapila2020-03-30
| | | | | | | | | | | | | | | | | | | | | | | | | The additional information displayed will be block number for error occurring while processing heap and index name for error occurring while processing the index. This will help us in diagnosing the problems that occur during a vacuum. For ex. due to corruption (either caused by bad hardware or by some bug) if we get some error while vacuuming, it can help us identify the block in heap and or additional index information. It sets up an error context callback to display additional information with the error. During different phases of vacuum (heap scan, heap vacuum, index vacuum, index clean up, heap truncate), we update the error context callback to display appropriate information. We can extend it to a bit more granular level like adding the phases for FSM operations or for prefetching the blocks while truncating. However, I felt that it requires adding many more error callback function calls and can make the code a bit complex, so left those for now. Author: Justin Pryzby, with few changes by Amit Kapila Reviewed-by: Alvaro Herrera, Amit Kapila, Andres Freund, Michael Paquier and Sawada Masahiko Discussion: https://www.postgresql.org/message-id/20191120210600.GC30362@telsasoft.com
* Update SQL featuresPeter Eisentraut2020-03-29
| | | | | Change F181 to supported. It requires that an embedded C program can be split across multiple files, which ECPG easily supports.
* Make deduplication use number of key attributes.Peter Geoghegan2020-03-28
| | | | | | | | Use IndexRelationGetNumberOfKeyAttributes() rather than IndexRelationGetNumberOfAttributes() when determining whether or not two index tuples are suitable for merging together into a single posting list tuple. This is a little bit tidier. It brings affected code in nbtdedup.c a little closer to similar, related code in nbtsplitloc.c.
* Ensure snapshot is registered within ScanPgRelation().Andres Freund2020-03-28
| | | | | | | | | | | | | | | | | | | | | In 9.4 I added support to use a historical snapshot in ScanPgRelation(), while adding logical decoding. Unfortunately a conflict with the concurrent removal of SnapshotNow was incorrectly resolved, leading to an unregistered snapshot being used. It is not correct to use an unregistered (or non-active) snapshot for anything non-trivial, because catalog invalidations can cause the snapshot to be invalidated. Luckily it seems unlikely to actively cause problems in practice, as ScanPgRelation() requires that we already have a lock on the relation, we only look for a single row, and we don't appear to rely on the result's tid to be correct. It however is clearly wrong and potential negative consequences would likely be hard to find. So it seems worth backpatching the fix, even without a concrete hazard. Discussion: https://postgr.es/m/20200229052459.wzhqnbhrriezg4v2@alap3.anarazel.de Backpatch: 9.5-
* Fix costing for disk-based hash aggregation.Jeff Davis2020-03-28
| | | | | | Report and suggestions from Richard Guo and Tomas Vondra. Discussion: https://postgr.es/m/CAMbWs4_W8fYbAn8KxgidAaZHON_Oo08OYn9ze=7remJymLqo5g@mail.gmail.com