aboutsummaryrefslogtreecommitdiff
path: root/src/backend/access
Commit message (Collapse)AuthorAge
...
* Avoid calls to RelationGetRelationName() and RelationGetNamespace() inAmit Kapila2020-03-31
| | | | | | | | | | | | | | | | vacuum code. After commit b61d161c14, during vacuum, we cache the information of relation name and relation namespace in local structure LVRelStats so that we can use it in an error callback function. We can use the cached information to avoid the calls to RelationGetRelationName(), RelationGetNamespace() and get_namespace_name(). This is mainly for the consistent in vacuum code path but it will avoid the extra syscache lookup we do in get_namespace_name(). Author: Justin Pryzby Reviewed-by: Amit Kapila Discussion: https://www.postgresql.org/message-id/20191120210600.GC30362@telsasoft.com
* Further simplify nbtree high key truncation.Peter Geoghegan2020-03-30
| | | | | | | | | Commit 7c2dbc69 reorganized _bt_truncate() in a way that enables a further simplification that I (pgeoghegan) missed: Since we mark the tuple that is returned to the caller as a pivot tuple before the point where its heap TID is set as of 7c2dbc69, it is possible to use the high level BTreeTupleGetHeapTID() inline function to get an item pointer. Do it that way now. This approach is clearer and more maintainable.
* Revert "Skip redundant anti-wraparound vacuums"Michael Paquier2020-03-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 2aa6e33, that added a fast path to skip anti-wraparound and non-aggressive autovacuum jobs (these have no sense as anti-wraparound implies aggressive). With a cluster using a high amount of relations with a portion of them being heavily updated, this could cause autovacuum to lock down, with autovacuum workers attempting repeatedly those jobs on the same relations for the same database, that just kept being skipped. This lock down can be solved with a manual VACUUM FREEZE. Justin King has reported one environment where the issue happened, and Julien Rouhaud and I have been able to reproduce it in a second environment. With a very aggressive autovacuum_freeze_max_age, triggering those jobs with pgbench is a matter of minutes, and hitting the lock down is a lot harder (my local tests failed to do that). Note that anti-wraparound and non-aggressive jobs can only be triggered on a subset of shared catalogs: - pg_auth_members - pg_authid - pg_database - pg_replication_origin - pg_shseclabel - pg_subscription - pg_tablespace While the lock down was possible down to v12, the root cause of those jobs is a much older issue, which needs more analysis. Bonus thanks to Andres Freund for the discussion. Reported-by: Justin King Discussion: https://postgr.es/m/CAE39h22zPLrkH17GrkDgAYL3kbjvySYD1io+rtnAUFnaJJVS4g@mail.gmail.com Backpatch-through: 12
* Refactor nbtree high key truncation.Peter Geoghegan2020-03-30
| | | | | | | | | | | | | Simplify _bt_truncate(), the routine that generates truncated leaf page high keys. Remove a micro-optimization that avoided a second palloc0() call (this was used when a heap TID was needed in the final pivot tuple, though only when the index happened to not be an INCLUDE index). Removing this dubious micro-optimization allows _bt_truncate() to use the index_truncate_tuple() indextuple.c utility routine in all cases. This was already the common case. This commit is a HEAD-only follow up to bugfix commit 4b42a899.
* Deduplicate PageIsNew() check in lazy_scan_heap().Andres Freund2020-03-30
| | | | | | | | | | | The recheck isn't needed anymore, as RelationGetBufferForTuple() now extends the relation with RBM_ZERO_AND_LOCK. Previously we needed to handle the fact that relation extension extended the relation and then separately acquired a lock on the page - while expecting that the page is empty. Reported-By: Ranier Vilela Discussion: https://postgr.es/m/CAEudQArA_=J0D5T258xsCY6Xtf6wiH4b=QDPDgVS+WZUN10WDw@mail.gmail.com
* Fix missing SP-GiST support in 911e702077Alexander Korotkov2020-03-30
| | | | | 911e702077 misses setting of amoptsprocnum for SP-GiST. This commit fixes that.
* Remove rudiments of supporting procnum == 0 from 911e702077Alexander Korotkov2020-03-30
| | | | | | Early versions of opclass options patch uses zero support procedure as opclass options procedure. This commit removes rudiments of it, which were committed in 911e702077. Also, it implements correct handling of amoptsprocnum == 0.
* Consistently truncate non-key suffix columns.Peter Geoghegan2020-03-30
| | | | | | | | | | | | | | | INCLUDE indexes failed to have their non-key attributes physically truncated away in certain rare cases. This led to physically larger pivot tuples that contained useless non-key attribute values. The impact on users should be negligible, but this is still clearly a regression (Postgres 11 supports INCLUDE indexes, and yet was not affected). The bug appeared in commit dd299df8, which introduced "true" suffix truncation of key attributes. Discussion: https://postgr.es/m/CAH2-Wz=E8pkV9ivRSFHtv812H5ckf8s1-yhx61_WrJbKccGcrQ@mail.gmail.com Backpatch: 12-, where "true" suffix truncation was introduced.
* Implement operator class parametersAlexander Korotkov2020-03-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PostgreSQL provides set of template index access methods, where opclasses have much freedom in the semantics of indexing. These index AMs are GiST, GIN, SP-GiST and BRIN. There opclasses define representation of keys, operations on them and supported search strategies. So, it's natural that opclasses may be faced some tradeoffs, which require user-side decision. This commit implements opclass parameters allowing users to set some values, which tell opclass how to index the particular dataset. This commit doesn't introduce new storage in system catalog. Instead it uses pg_attribute.attoptions, which is used for table column storage options but unused for index attributes. In order to evade changing signature of each opclass support function, we implement unified way to pass options to opclass support functions. Options are set to fn_expr as the constant bytea expression. It's possible due to the fact that opclass support functions are executed outside of expressions, so fn_expr is unused for them. This commit comes with some examples of opclass options usage. We parametrize signature length in GiST. That applies to multiple opclasses: tsvector_ops, gist__intbig_ops, gist_ltree_ops, gist__ltree_ops, gist_trgm_ops and gist_hstore_ops. Also we parametrize maximum number of integer ranges for gist__int_ops. However, the main future usage of this feature is expected to be json, where users would be able to specify which way to index particular json parts. Catversion is bumped. Discussion: https://postgr.es/m/d22c3a18-31c7-1879-fc11-4c1ce2f5e5af%40postgrespro.ru Author: Nikita Glukhov, revised by me Reviwed-by: Nikolay Shaplov, Robert Haas, Tom Lane, Tomas Vondra, Alvaro Herrera
* Improve handling of parameter differences in physical replicationPeter Eisentraut2020-03-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When certain parameters are changed on a physical replication primary, this is communicated to standbys using the XLOG_PARAMETER_CHANGE WAL record. The standby then checks whether its own settings are at least as big as the ones on the primary. If not, the standby shuts down with a fatal error. The correspondence of settings between primary and standby is required because those settings influence certain shared memory sizings that are required for processing WAL records that the primary might send. For example, if the primary sends a prepared transaction, the standby must have had max_prepared_transaction set appropriately or it won't be able to process those WAL records. However, fatally shutting down the standby immediately upon receipt of the parameter change record might be a bit of an overreaction. The resources related to those settings are not required immediately at that point, and might never be required if the activity on the primary does not exhaust all those resources. If we just let the standby roll on with recovery, it will eventually produce an appropriate error when those resources are used. So this patch relaxes this a bit. Upon receipt of XLOG_PARAMETER_CHANGE, we still check the settings but only issue a warning and set a global flag if there is a problem. Then when we actually hit the resource issue and the flag was set, we issue another warning message with relevant information. At that point we pause recovery, so a hot standby remains usable. We also repeat the last warning message once a minute so it is harder to miss or ignore. Reviewed-by: Sergei Kornilov <sk@zsrv.org> Reviewed-by: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/4ad69a4c-cc9b-0dfe-0352-8b1b0cd36c7b@2ndquadrant.com
* Introduce vacuum errcontext to display additional information.Amit Kapila2020-03-30
| | | | | | | | | | | | | | | | | | | | | | | | | The additional information displayed will be block number for error occurring while processing heap and index name for error occurring while processing the index. This will help us in diagnosing the problems that occur during a vacuum. For ex. due to corruption (either caused by bad hardware or by some bug) if we get some error while vacuuming, it can help us identify the block in heap and or additional index information. It sets up an error context callback to display additional information with the error. During different phases of vacuum (heap scan, heap vacuum, index vacuum, index clean up, heap truncate), we update the error context callback to display appropriate information. We can extend it to a bit more granular level like adding the phases for FSM operations or for prefetching the blocks while truncating. However, I felt that it requires adding many more error callback function calls and can make the code a bit complex, so left those for now. Author: Justin Pryzby, with few changes by Amit Kapila Reviewed-by: Alvaro Herrera, Amit Kapila, Andres Freund, Michael Paquier and Sawada Masahiko Discussion: https://www.postgresql.org/message-id/20191120210600.GC30362@telsasoft.com
* Make deduplication use number of key attributes.Peter Geoghegan2020-03-28
| | | | | | | | Use IndexRelationGetNumberOfKeyAttributes() rather than IndexRelationGetNumberOfAttributes() when determining whether or not two index tuples are suitable for merging together into a single posting list tuple. This is a little bit tidier. It brings affected code in nbtdedup.c a little closer to similar, related code in nbtsplitloc.c.
* Trigger autovacuum based on number of INSERTsDavid Rowley2020-03-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Traditionally autovacuum has only ever invoked a worker based on the estimated number of dead tuples in a table and for anti-wraparound purposes. For the latter, with certain classes of tables such as insert-only tables, anti-wraparound vacuums could be the first vacuum that the table ever receives. This could often lead to autovacuum workers being busy for extended periods of time due to having to potentially freeze every page in the table. This could be particularly bad for very large tables. New clusters, or recently pg_restored clusters could suffer even more as many large tables may have the same relfrozenxid, which could result in large numbers of tables requiring an anti-wraparound vacuum all at once. Here we aim to reduce the work required by anti-wraparound and aggressive vacuums in general, by triggering autovacuum when the table has received enough INSERTs. This is controlled by adding two new GUCs and reloptions; autovacuum_vacuum_insert_threshold and autovacuum_vacuum_insert_scale_factor. These work exactly the same as the existing scale factor and threshold controls, only base themselves off the number of inserts since the last vacuum, rather than the number of dead tuples. New controls were added rather than reusing the existing controls, to allow these new vacuums to be tuned independently and perhaps even completely disabled altogether, which can be done by setting autovacuum_vacuum_insert_threshold to -1. We make no attempt to skip index cleanup operations on these vacuums as they may trigger for an insert-mostly table which continually doesn't have enough dead tuples to trigger an autovacuum for the purpose of removing those dead tuples. If we were to skip cleaning the indexes in this case, then it is possible for the index(es) to become bloated over time. There are additional benefits to triggering autovacuums based on inserts, as tables which never contain enough dead tuples to trigger an autovacuum are now more likely to receive a vacuum, which can mark more of the table as "allvisible" and encourage the query planner to make use of Index Only Scans. Currently, we still obey vacuum_freeze_min_age when triggering these new autovacuums based on INSERTs. For large insert-only tables, it may be beneficial to lower the table's autovacuum_freeze_min_age so that tuples are eligible to be frozen sooner. Here we've opted not to zero that for these types of vacuums, since the table may just be insert-mostly and we may otherwise freeze tuples that are still destined to be updated or removed in the near future. There was some debate to what exactly the new scale factor and threshold should default to. For now, these are set to 0.2 and 1000, respectively. There may be some motivation to adjust these before the release. Author: Laurenz Albe, Darafei Praliaskouski Reviewed-by: Alvaro Herrera, Masahiko Sawada, Chris Travers, Andres Freund, Justin Pryzby Discussion: https://postgr.es/m/CAC8Q8t%2Bj36G_bLF%3D%2B0iMo6jGNWnLnWb1tujXuJr-%2Bx8ZCCTqoQ%40mail.gmail.com
* Justify nbtree page split locking in code comment.Peter Geoghegan2020-03-27
| | | | | | | | | | | | | | | Delaying unlocking the right child page until after the point that the left child's parent page has been refound is no longer truly necessary. Commit 40dae7ec made nbtree tolerant of interrupted page splits. VACUUM was taught to avoid deleting a page that happens to be the right half of an incomplete split. As long as page splits don't unlock the left child page until the end of the second/final phase, it should be safe to unlock the right child page earlier (at the end of the first phase). It probably isn't actually useful to release the right child's lock earlier like this (it probably won't improve performance). Even still, pointing out that it ought to be safe to do so should make it easier to understand the overall design.
* Allow walreceiver configuration to change on reloadAlvaro Herrera2020-03-27
| | | | | | | | | | | | | | | The parameters primary_conninfo, primary_slot_name and wal_receiver_create_temp_slot can now be changed with a simple "reload" signal, no longer requiring a server restart. This is achieved by signalling the walreceiver process to terminate and having it start again with the new values. Thanks to Andres Freund, Kyotaro Horiguchi, Fujii Masao for discussion. Author: Sergei Kornilov <sk@zsrv.org> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/19513901543181143@sas1-19a94364928d.qloud-c.yandex.net
* Set wal_receiver_create_temp_slot PGC_POSTMASTERAlvaro Herrera2020-03-27
| | | | | | | | | | | | | | | | | | | | | | | | | Commit 329730827848 gave walreceiver the ability to create and use a temporary replication slot, and made it controllable by a GUC (enabled by default) that can be changed with SIGHUP. That's useful but has two problems: one, it's possible to cause the origin server to fill its disk if the slot doesn't advance in time; and also there's a disconnect between state passed down via the startup process and GUCs that walreceiver reads directly. We handle the first problem by setting the option to disabled by default. If the user enables it, its on their head to make sure that disk doesn't fill up. We handle the second problem by passing the flag via startup rather than having walreceiver acquire it directly, and making it PGC_POSTMASTER (which ensures a walreceiver always has the fresh value). A future commit can relax this (to PGC_SIGHUP again) by having the startup process signal walreceiver to shutdown whenever the value changes. Author: Sergei Kornilov <sk@zsrv.org> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/20200122055510.GH174860@paquier.xyz
* Fix nbtree deduplication README commentary.Peter Geoghegan2020-03-24
| | | | | Descriptions of some aspects of how deduplication works were unclear in a couple of places.
* Prefer standby promotion over recovery pause.Fujii Masao2020-03-24
| | | | | | | | | | | | | | | | | | | | | | | | | | Previously if a promotion was triggered while recovery was paused, the paused state continued. Also recovery could be paused by executing pg_wal_replay_pause() even while a promotion was ongoing. That is, recovery pause had higher priority over a standby promotion. But this behavior was not desirable because most users basically wanted the recovery to complete as soon as possible and the server to become the master when they requested a promotion. This commit changes recovery so that it prefers a promotion over recovery pause. That is, if a promotion is triggered while recovery is paused, the paused state ends and a promotion continues. Also this commit makes recovery pause functions like pg_wal_replay_pause() throw an error if they are executed while a promotion is ongoing. Internally, this commit adds new internal function PromoteIsTriggered() that returns true if a promotion is triggered. Since the name of this function and the existing function IsPromoteTriggered() are confusingly similar, the commit changes the name of IsPromoteTriggered() to IsPromoteSignaled, as more appropriate name. Author: Fujii Masao Reviewed-by: Atsushi Torikoshi, Sergei Kornilov Discussion: https://postgr.es/m/00c194b2-dbbb-2e8a-5b39-13f14048ef0a@oss.nttdata.com
* Move routine building restore_command to src/common/Michael Paquier2020-03-24
| | | | | | | | | | restore_command has only been used until now by the backend, but there is a pending patch for pg_rewind to make use of that in the frontend. Author: Alexey Kondratov Reviewed-by: Andrey Borodin, Andres Freund, Alvaro Herrera, Alexander Korotkov, Michael Paquier Discussion: https://postgr.es/m/a3acff50-5a0d-9a2c-b3b2-ee36168955c1@postgrespro.ru
* Add wait events for WAL archive and recovery pause.Fujii Masao2020-03-24
| | | | | | | | | | | This commit introduces new wait events BackupWaitWalArchive and RecoveryPause. The former is reported while waiting for the WAL files required for the backup to be successfully archived. The latter is reported while waiting for recovery in pause state to be resumed. Author: Fujii Masao Reviewed-by: Michael Paquier, Atsushi Torikoshi, Robert Haas Discussion: https://postgr.es/m/f0651f8c-9c96-9f29-0ff9-80414a15308a@oss.nttdata.com
* Revert "Skip WAL for new relfilenodes, under wal_level=minimal."Noah Misch2020-03-22
| | | | | | | | This reverts commit cb2fd7eac285b1b0a24eeb2b8ed4456b66c5a09f. Per numerous buildfarm members, it was incompatible with parallel query, and a test case assumed LP64. Back-patch to 9.5 (all supported versions). Discussion: https://postgr.es/m/20200321224920.GB1763544@rfd.leadboat.com
* Skip WAL for new relfilenodes, under wal_level=minimal.Noah Misch2020-03-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Until now, only selected bulk operations (e.g. COPY) did this. If a given relfilenode received both a WAL-skipping COPY and a WAL-logged operation (e.g. INSERT), recovery could lose tuples from the COPY. See src/backend/access/transam/README section "Skipping WAL for New RelFileNode" for the new coding rules. Maintainers of table access methods should examine that section. To maintain data durability, just before commit, we choose between an fsync of the relfilenode and copying its contents to WAL. A new GUC, wal_skip_threshold, guides that choice. If this change slows a workload that creates small, permanent relfilenodes under wal_level=minimal, try adjusting wal_skip_threshold. Users setting a timeout on COMMIT may need to adjust that timeout, and log_min_duration_statement analysis will reflect time consumption moving to COMMIT from commands like COPY. Internally, this requires a reliable determination of whether RollbackAndReleaseCurrentSubTransaction() would unlink a relation's current relfilenode. Introduce rd_firstRelfilenodeSubid. Amend the specification of rd_createSubid such that the field is zero when a new rel has an old rd_node. Make relcache.c retain entries for certain dropped relations until end of transaction. Back-patch to 9.5 (all supported versions). This introduces a new WAL record type, XLOG_GIST_ASSIGN_LSN, without bumping XLOG_PAGE_MAGIC. As always, update standby systems before master systems. This changes sizeof(RelationData) and sizeof(IndexStmt), breaking binary compatibility for affected extensions. (The most recent commit to affect the same class of extensions was 089e4d405d0f3b94c74a2c6a54357a84a681754b.) Kyotaro Horiguchi, reviewed (in earlier, similar versions) by Robert Haas. Heikki Linnakangas and Michael Paquier implemented earlier designs that materially clarified the problem. Reviewed, in earlier designs, by Andrew Dunstan, Andres Freund, Alvaro Herrera, Tom Lane, Fujii Masao, and Simon Riggs. Reported by Martijn van Oosterhout. Discussion: https://postgr.es/m/20150702220524.GA9392@svana.org
* In log_newpage_range(), heed forkNum and page_std arguments.Noah Misch2020-03-21
| | | | | | | | | The function assumed forkNum=MAIN_FORKNUM and page_std=true, ignoring the actual arguments. Existing callers passed exactly those values, so there's no live bug. Back-patch to v12, where the function first appeared, because another fix needs this. Discussion: https://postgr.es/m/20191118045434.GA1173436@rfd.leadboat.com
* During heap rebuild, lock any TOAST index until end of transaction.Noah Misch2020-03-21
| | | | | | | | | | | | swap_relation_files() calls toast_get_valid_index() to find and lock this index, just before swapping with the rebuilt TOAST index. The latter function releases the lock before returning. Potential for mischief is low; a concurrent session can issue ALTER INDEX ... SET (fillfactor = ...), which is not alarming. Nonetheless, changing pg_class.relfilenode without a lock is unconventional. Back-patch to 9.5 (all supported versions), because another fix needs this. Discussion: https://postgr.es/m/20191226001521.GA1772687@rfd.leadboat.com
* nbtree: Remove obsolete _bt_pgaddtup() comments.Peter Geoghegan2020-03-19
| | | | | | Remove comments that are a throw back to a time when nbtree cared about write-ordering dependencies. The comments are similar to those removed by commit 9ee7414e, among others.
* Rename the recovery-related wait events.Fujii Masao2020-03-19
| | | | | | | | | | | | | | | | | | | | This commit renames RecoveryWalAll and RecoveryWalStream wait events to RecoveryWalStream and RecoveryRetrieveRetryInterval, respectively, in order to make the names and what they are more consistent. For example, previously RecoveryWalAll was reported as a wait event while the recovery was waiting for WAL from a stream, and which was confusing because the name was very different from the situation where the wait actually could happen. The names of macro variables for those wait events also are renamed accordingly. This commit also changes the category of RecoveryRetrieveRetryInterval to Timeout from Activity because the wait event is reported while waiting based on wal_retrieve_retry_interval. Author: Fujii Masao Reviewed-by: Kyotaro Horiguchi, Atsushi Torikoshi Discussion: https://postgr.es/m/124997ee-096a-5d09-d8da-2c7a57d0816e@oss.nttdata.com
* nbtree: Use raw PageAddItem() for retail inserts.Peter Geoghegan2020-03-18
| | | | | | | | | | | | | | | | | | | | Only internal page splits need to call _bt_pgaddtup() instead of PageAddItem(), and only for data items, one of which will end up at the first offset (or first offset after the high key offset) on the new right page. This data item alone will need to be truncated in _bt_pgaddtup(). Since there is no reason why retail inserts ever need to truncate the incoming item, use a raw PageAddItem() call there instead. Even _bt_split() uses raw PageAddItem() calls for left page and right page high keys. Clearly the _bt_pgaddtup() shim function wasn't really encapsulating anything. _bt_pgaddtup() should now be thought of as a _bt_split() helper function. Note that the assertions from commit d1e241c2 verify that retail inserts never insert an item at an internal page's negative infinity offset. This invariant could only ever be violated as a result of a basic logic error in nbtinsert.c.
* Refactor nbtree fastpath optimization.Peter Geoghegan2020-03-18
| | | | | | | | | | | | | | | | | | Commit 2b272734, which added the fastpath rightmost leaf page cache insert optimization, added code to _bt_doinsert() to handle using and invalidating the backend local block cache. It doesn't seem like a good place to handle these low level details, though. _bt_doinsert() is supposed to be a high level function -- it is the main entry point to nbtinsert.c. Restructure the code by placing handling of the rightmost block cache at the start of a new _bt_search() shim function, _bt_search_insert(). The new function is called from _bt_doinsert(), which uses it as a _bt_search() variant that conveniently accepts its BTInsertState state as an argument. _bt_doinsert() no longer needs to directly consider the fastpath optimization. Discussion: https://postgr.es/m/CAH2-Wzk59cxKJRd=rfbyub6-V4yWRjsOYRkUNHBLT1P1GdtCQQ@mail.gmail.com
* nbtree: Remove useless local variables.Peter Geoghegan2020-03-17
| | | | | | | | | | | | | | Copying block and offset numbers to local variables in _bt_insertonpg() made the code less readable. Remove the variables. There is already code that conditionally calls BufferGetBlockNumber() in the same block, so consistently do it that way instead. Calling BufferGetBlockNumber() is very cheap, but we might as well avoid it when it isn't truly necessary. It isn't truly necessary for _bt_insertonpg() to call BufferGetBlockNumber() in almost all cases. Spotted while working on a patch that refactors the fastpath rightmost leaf page cache optimization, which was added by commit 2b272734.
* Fix comment in xlog.c.Fujii Masao2020-03-17
| | | | | | | | This commit fixes the comment about SharedHotStandbyActive variable. The comment was apparently copy-and-pasted. Author: Atsushi Torikoshi Discussion: https://postgr.es/m/CACZ0uYEjpqZB9wN2Rwc_RMvDybyYqdbkPuDr1NyxJg4f9yGfMw@mail.gmail.com
* Remove useless pfree()s at the ends of various ValuePerCall SRFs.Tom Lane2020-03-16
| | | | | | | | | | | | | | | | | | | | We don't need to manually clean up allocations in a SRF's multi_call_memory_ctx, because the SRF_RETURN_DONE infrastructure takes care of that (and also ensures that it will happen even if the function never gets a final call, which simple manual cleanup cannot do). Hence, the code removed by this patch is a waste of code and cycles. Worse, it gives the impression that cleaning up manually is a thing, which can lead to more serious errors such as those fixed in commits 085b6b667 and b4570d33a. So we should get rid of it. These are not quite actual bugs though, so I couldn't muster the enthusiasm to back-patch. Fix in HEAD only. Justin Pryzby Discussion: https://postgr.es/m/20200308173103.GC1357@telsasoft.com
* nbtree: Fix obsolete _bt_search() comment.Peter Geoghegan2020-03-16
| | | | Oversight in commit d2086b08b02.
* nbtree: Pass down MAXALIGN()'d itemsz for new item.Peter Geoghegan2020-03-16
| | | | | | | | | | | | | | Refactor nbtinsert.c so that the final itemsz of each new non-pivot tuple (the MAXALIGN()'d size) is determined once. Most of the functions used by leaf page inserts used the insertstate.itemsz value already. This commit makes everything use insertstate.itemsz as standard practice. The goal is to decouple tuple size from "effective" tuple size. Making this distinction isn't truly necessary right now, but that might change in the future. Also explain why we consistently apply MAXALIGN() to get an effective index tuple size. This was rather unclear, in part because it isn't actually strictly necessary right now.
* Introduce a maintenance_io_concurrency setting.Thomas Munro2020-03-16
| | | | | | | | | | | | | Introduce a GUC and a tablespace option to control I/O prefetching, much like effective_io_concurrency, but for work that is done on behalf of many client sessions. Use the new setting in heapam.c instead of the hard-coded formula effective_io_concurrency + 10 introduced by commit 558a9165e08. Go with a default value of 10 for now, because it's a round number pretty close to the value used for that existing case. Discussion: https://postgr.es/m/CA%2BhUKGJUw08dPs_3EUcdO6M90GnjofPYrWp4YSLaBkgYwS-AqA%40mail.gmail.com
* nbtree: Reorder nbtinsert.c prototypes.Peter Geoghegan2020-03-15
| | | | | Relocate _bt_newroot() prototype, so that the order that prototypes appear in matches the order that the functions are defined in.
* Refactor ps_status.c APIPeter Eisentraut2020-03-11
| | | | | | | | | | | | | | | | | | | | | | The init_ps_display() arguments were mostly lies by now, so to match typical usage, just use one argument and let the caller assemble it from multiple sources if necessary. The only user of the additional arguments is BackendInitialize(), which was already doing string assembly on the caller side anyway. Remove the second argument of set_ps_display() ("force") and just handle that in init_ps_display() internally. BackendInitialize() also used to set the initial status as "authentication", but that was very far from where authentication actually happened. So now it's set to "initializing" and then "authentication" just before the actual call to ClientAuthentication(). Reviewed-by: Julien Rouhaud <rjuju123@gmail.com> Reviewed-by: Kuntal Ghosh <kuntalghosh.2007@gmail.com> Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com> Discussion: https://www.postgresql.org/message-id/flat/c65e5196-4f04-4ead-9353-6088c19615a3@2ndquadrant.com
* Remove HAVE_WORKING_LINKPeter Eisentraut2020-03-11
| | | | | | | | | | | | | | Previously, hard links were not used on Windows and Cygwin, but they support them just fine in currently supported OS versions, so we can use them there as well. Since all supported platforms now support hard links, we can remove the alternative code paths. Rename durable_link_or_rename() to durable_rename_excl() to make the purpose more clear without referencing the implementation details. Discussion: https://www.postgresql.org/message-id/flat/72fff73f-dc9c-4ef4-83e8-d2e60c98df48%402ndquadrant.com
* nbtree: Move fastpath NULL descent stack assertion.Peter Geoghegan2020-03-10
| | | | | | | | | | | | | | | | Commit 074251db added an assertion that verified the fastpath/rightmost page insert optimization's assumption about free space: There should always be enough free space on the page to insert the new item without splitting the page. Otherwise, we end up using the "concurrent root page split" phony/fake stack path in _bt_insert_parent(). This does not lead to incorrect behavior, but it is likely to be far slower than simply using the regular _bt_search() path. The assertion catches serious performance bugs that would probably take a long time to detect any other way. It seems much more natural to make this assertion just before the point that we generate a fake/phony descent stack. Move the assert there. This also makes _bt_insertonpg() a bit more readable.
* nbtree: Demote minus infinity "can't happen" error.Peter Geoghegan2020-03-10
| | | | | | | | | | | | | | | Only a very basic logic bug in a _bt_insertonpg() caller could lead to a violation of this invariant. Besides, any newitemoff used for an internal page is sanitized using other "can't happen" errors in _bt_getstackbuf() or its callers, before _bt_insertonpg() even gets called. Also, move the error/assertion from the insert-without-split path of _bt_insertonpg() to the top of the same function. There is no reason why this invariant only applies to insertions that happen to not result in a page split; cover every insertion. The assertion naturally belongs next to the existing generic assertions that document relatively high-level invariants for the item being inserted.
* Remove utils/acl.h from catalog/objectaddress.hPeter Eisentraut2020-03-10
| | | | | | | | | | | | | | | | | | The need for this was removed by 8b9e9644dc6a9bd4b7a97950e6212f63880cf18b. A number of files now need to include utils/acl.h or parser/parse_node.h explicitly where they previously got it indirectly somehow. Since parser/parse_node.h already includes nodes/parsenodes.h, the latter is then removed where the former was added. Also, remove nodes/pg_list.h from objectaddress.h, since that's included via nodes/parsenodes.h. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com> Discussion: https://www.postgresql.org/message-id/flat/7601e258-26b2-8481-36d0-dc9dca6f28f1%402ndquadrant.com
* Tidy up XLogSource code in xlog.c.Fujii Masao2020-03-10
| | | | | | | | | | | This commit replaces 0 used as an initial value of XLogSource variable, with XLOG_FROM_ANY. Also this commit changes those variable so that XLogSource instead of int is used as the type for them. These changes are for code readability and debugger-friendliness. Author: Kyotaro Horiguchi Reviewed-by: Fujii Masao Discussion: https://postgr.es/m/20200227.124830.2197604521555566121.horikyota.ntt@gmail.com
* Avoid assertion failure with targeted recovery in standby mode.Fujii Masao2020-03-09
| | | | | | | | | | | | | | | | | | | | | | | | At the end of recovery, standby mode is turned off to re-fetch the last valid record from archive or pg_wal. Previously, if recovery target was reached and standby mode was turned off while the current WAL source was stream, recovery could try to retrieve WAL file containing the last valid record unexpectedly from stream even though not in standby mode. This caused an assertion failure. That is, the assertion test confirms that WAL file should not be retrieved from stream if standby mode is not true. This commit moves back the current WAL source to archive if it's stream even though not in standby mode, to avoid that assertion failure. This issue doesn't cause the server to crash when built with assertion disabled. In this case, the attempt to retrieve WAL file from stream not in standby mode just fails. And then recovery tries to retrieve WAL file from archive or pg_wal. Back-patch to all supported branches. Author: Kyotaro Horiguchi Reviewed-by: Fujii Masao Discussion: https://postgr.es/m/20200227.124830.2197604521555566121.horikyota.ntt@gmail.com
* Simplify/speed up assertion cross-check in ginCompressPostingList().Tom Lane2020-03-07
| | | | | | | | | I noticed while testing some other stuff that the CHECK_ENCODING_ROUNDTRIP logic in ginCompressPostingList could account for over 50% of the runtime of an INSERT with a GIN index. While that's not relevant to production performance, it's still kind of annoying in a debug build. Replacing the loop around short memcmp's with one long memcmp works just as well and is significantly faster, at least on my machine.
* Introduce macros for typalign and typstorage constants.Tom Lane2020-03-04
| | | | | | | | | | | | | | | | | | | | | Our usual practice for "poor man's enum" catalog columns is to define macros for the possible values and use those, not literal constants, in C code. But for some reason lost in the mists of time, this was never done for typalign/attalign or typstorage/attstorage. It's never too late to make it better though, so let's do that. The reason I got interested in this right now is the need to duplicate some uses of the TYPSTORAGE constants in an upcoming ALTER TYPE patch. But in general, this sort of change aids greppability and readability, so it's a good idea even without any specific motivation. I may have missed a few places that could be converted, and it's even more likely that pending patches will re-introduce some hard-coded references. But that's not fatal --- there's no expectation that we'd actually change any of these values. We can clean up stragglers over time. Discussion: https://postgr.es/m/16457.1583189537@sss.pgh.pa.us
* Remove overzealous _bt_split() assertions.Peter Geoghegan2020-03-02
| | | | | | | | | | | _bt_split() is passed NULL as its insertion scankey for internal page splits. Two recently added Assert() statements failed to consider this, leading to a crash with pg_upgrade'd BREE_VERSION < 4 indexes. Remove the assertions. The assertions in question were added by commit 0d861bbb, which added nbtree deduplication. It would be possible to fix the assertions directly instead, but they weren't adding much anyway.
* Report progress of streaming base backup.Fujii Masao2020-03-03
| | | | | | | | | | | | | This commit adds pg_stat_progress_basebackup view that reports the progress while an application like pg_basebackup is taking a base backup. This uses the progress reporting infrastructure added by c16dc1aca5e0, adding support for streaming base backup. Bump catversion. Author: Fujii Masao Reviewed-by: Kyotaro Horiguchi, Amit Langote, Sergei Kornilov Discussion: https://postgr.es/m/9ed8b801-8215-1f3d-62d7-65bff53f6e94@oss.nttdata.com
* Add assertions to _bt_update_posting().Peter Geoghegan2020-03-02
| | | | | | | Copy some assertions from _bt_form_posting() to its sibling function, _bt_update_posting(). Discussion: https://postgr.es/m/CAH2-WzkPR8KMwkL0ap976kmXwBCeukTeHz6fB-U__wvuP1S9Zg@mail.gmail.com
* Remove dead code from _bt_update_posting().Peter Geoghegan2020-03-01
| | | | Discussion: https://postgr.es/m/CAH2-WzmAufHiOku6AGiFD=81VQs5nYJ1L2YkhW1t+BH4CMsgRw@mail.gmail.com
* Move src/backend/utils/hash/hashfn.c to src/commonRobert Haas2020-02-27
| | | | | | | | | | | | | | This also involves renaming src/include/utils/hashutils.h, which becomes src/include/common/hashfn.h. Perhaps an argument can be made for keeping the hashutils.h name, but it seemed more consistent to make it match the name of the file, and also more descriptive of what is actually going on here. Patch by me, reviewed by Suraj Kharage and Mark Dilger. Off-list advice on how not to break the Windows build from Davinder Singh and Amit Kapila. Discussion: http://postgr.es/m/CA+TgmoaRiG4TXND8QuM6JXFRkM_1wL2ZNhzaUKsuec9-4yrkgw@mail.gmail.com
* Silence another compiler warning in nbtinsert.c.Peter Geoghegan2020-02-26
| | | | Per complaint from Álvaro Herrera.