aboutsummaryrefslogtreecommitdiff
path: root/src/backend/access
Commit message (Collapse)AuthorAge
* WAL Log invalidations at command end with wal_level=logical.Amit Kapila2020-07-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When wal_level=logical, write invalidations at command end into WAL so that decoding can use this information. This patch is required to allow the streaming of in-progress transactions in logical decoding.  The actual work to allow streaming will be committed as a separate patch. We still add the invalidations to the cache and write them to WAL at commit time in RecordTransactionCommit(). This uses the existing XLOG_INVALIDATIONS xlog record type, from the RM_STANDBY_ID resource manager (see LogStandbyInvalidations for details). So existing code relying on those invalidations (e.g. redo) does not need to be changed. The invalidations written at command end uses a new xlog record type XLOG_XACT_INVALIDATIONS, from RM_XACT_ID resource manager. See LogLogicalInvalidations for details. These new xlog records are ignored by existing redo procedures, which still rely on the invalidations written to commit records. The invalidations are decoded and accumulated in top-transaction, and then executed during replay.  This obviates the need to decode the invalidations as part of a commit record. Bump XLOG_PAGE_MAGIC, since this introduces XLOG_XACT_INVALIDATIONS. Author: Dilip Kumar, Tomas Vondra, Amit Kapila Reviewed-by: Amit Kapila Tested-by: Neha Sharma and Mahendra Singh Thalor Discussion: https://postgr.es/m/688b0b7f-2f6c-d827-c27b-216a8e3ea700@2ndquadrant.com
* Add nbtree Valgrind buffer lock checks.Peter Geoghegan2020-07-21
| | | | | | | | | | | | | | | | | | | | | | | | | | Holding just a buffer pin (with no buffer lock) on an nbtree buffer/page provides very weak guarantees, especially compared to heapam, where it's often safe to read a page while only holding a buffer pin. This commit has Valgrind enforce the following rule: it is never okay to access an nbtree buffer without holding both a pin and a lock on the buffer. A draft version of this patch detected questionable code that was cleaned up by commits fa7ff642 and 7154aa16. The code in question used to access an nbtree buffer page's special/opaque area with no buffer lock (only a buffer pin). This practice (which isn't obviously unsafe) is hereby formally disallowed in nbtree. There doesn't seem to be any reason to allow it, and banning it keeps things simple for Valgrind. The new checks are implemented by adding custom nbtree client requests (located in LockBuffer() wrapper functions); these requests are "superimposed" on top of the generic bufmgr.c Valgrind client requests added by commit 1e0dfd16. No custom resource management cleanup code is needed to undo the effects of marking buffers as non-accessible under this scheme. Author: Peter Geoghegan Reviewed-By: Anastasia Lubennikova, Georgios Kokolatos Discussion: https://postgr.es/m/CAH2-WzkLgyN3zBvRZ1pkNJThC=xi_0gpWRUb_45eexLH1+k2_Q@mail.gmail.com
* Rename wal_keep_segments to wal_keep_size.Fujii Masao2020-07-20
| | | | | | | | | | | | | | | | | | | | | | | | | max_slot_wal_keep_size that was added in v13 and wal_keep_segments are the GUC parameters to specify how much WAL files to retain for the standby servers. While max_slot_wal_keep_size accepts the number of bytes of WAL files, wal_keep_segments accepts the number of WAL files. This difference of setting units between those similar parameters could be confusing to users. To alleviate this situation, this commit renames wal_keep_segments to wal_keep_size, and make users specify the WAL size in it instead of the number of WAL files. There was also the idea to rename max_slot_wal_keep_size to max_slot_wal_keep_segments, in the discussion. But we have been moving away from measuring in segments, for example, checkpoint_segments was replaced by max_wal_size. So we concluded to rename wal_keep_segments to wal_keep_size. Back-patch to v13 where max_slot_wal_keep_size was added. Author: Fujii Masao Reviewed-by: Álvaro Herrera, Kyotaro Horiguchi, David Steele Discussion: https://postgr.es/m/574b4ea3-e0f9-b175-ead2-ebea7faea855@oss.nttdata.com
* Immediately WAL-log subtransaction and top-level XID association.Amit Kapila2020-07-20
| | | | | | | | | | | | | | | | | | | | The logical decoding infrastructure needs to know which top-level transaction the subxact belongs to, in order to decode all the changes. Until now that might be delayed until commit, due to the caching (GPROC_MAX_CACHED_SUBXIDS), preventing features requiring incremental decoding. So we also write the assignment info into WAL immediately, as part of the next WAL record (to minimize overhead) only when wal_level=logical. We can not remove the existing XLOG_XACT_ASSIGNMENT WAL as that is required for avoiding overflow in the hot standby snapshot. Bump XLOG_PAGE_MAGIC, since this introduces XLR_BLOCK_ID_TOPLEVEL_XID. Author: Tomas Vondra, Dilip Kumar, Amit Kapila Reviewed-by: Amit Kapila Tested-by: Neha Sharma and Mahendra Singh Thalor Discussion: https://postgr.es/m/688b0b7f-2f6c-d827-c27b-216a8e3ea700@2ndquadrant.com
* Avoid harmless Valgrind no-buffer-pin errors.Peter Geoghegan2020-07-19
| | | | | | | | | | | | | | | Valgrind builds with assertions enabled sometimes perform a theoretically unsafe page access inside an assertion in heapam_tuple_lock(). This happened when the eval-plan-qual isolation test ran one of the permutations added by commit a2418f9e238. Avoid complaints from Valgrind by moving the assertion ever so slightly. This is minor cleanup for commit 1e0dfd16, which added Valgrind buffer access instrumentation. No backpatch, since this only happens within an assertion, and seems very unlikely to cause any real problems even with assert-enabled builds.
* Avoid CREATE INDEX unique index deduplication.Peter Geoghegan2020-07-17
| | | | | | | | | | | | | | | | | There is no advantage to attempting deduplication for a unique index during CREATE INDEX, since there cannot possibly be any duplicates. Doing so wastes cycles due to unnecessary copying. Make sure that we avoid it consistently. We already avoided unique index deduplication in the case where there were some spool2 tuples to merge. That didn't account for the fact that spool2 is removed early/unset in the common case where it has no tuples that need to be merged (i.e. it failed to account for the "spool2 turns out to be unnecessary" optimization in _bt_spools_heapscan()). Oversight in commit 0d861bbb, which added nbtree deduplication Backpatch: 13-, where nbtree deduplication was introduced.
* Fix comments related to table AMsMichael Paquier2020-07-14
| | | | | | | | | | | Incorrect function names were referenced. As this fixes some portions of tableam.h, that is mentioned in the docs as something to look at when implementing a table AM, backpatch down to 12 where this has been introduced. Author: Hironobu Suzuki Discussion: https://postgr.es/m/8fe6d672-28dd-3f1d-7aed-ac2f6d599d3f@interdb.jp Backpatch-through: 12
* Fix uninitialized value in segno calculationAlvaro Herrera2020-07-13
| | | | | | | | | | | | Remove previous hack in KeepLogSeg that added a case to deal with a (badly represented) invalid segment number. This was added for the sake of GetWALAvailability. But it's not needed if in that function we initialize the segment number to be retreated to the currently being written segment, so do that instead. Per valgrind-running buildfarm member skink, and some sparc64 animals. Discussion: https://postgr.es/m/1724648.1594230917@sss.pgh.pa.us
* Include replication origins in SQL functions for commit timestampMichael Paquier2020-07-12
| | | | | | | | | | | | | | | | | | | | | | | | | | This includes two changes: - Addition of a new function pg_xact_commit_timestamp_origin() able, for a given transaction ID, to return the commit timestamp and replication origin of this transaction. An equivalent function existed in pglogical. - Addition of the replication origin to pg_last_committed_xact(). The commit timestamp manager includes already APIs able to return the replication origin of a transaction on top of its commit timestamp, but the code paths for replication origins were never stressed as those functions have never looked for a replication origin, and the SQL functions available have never included this information since their introduction in 73c986a. While on it, refactor a test of modules/commit_ts/ to use tstzrange() to check that a transaction timestamp is within the wanted range, making the test a bit easier to read. Bump catalog version. Author: Movead Li Reviewed-by: Madan Kumar, Michael Paquier Discussion: https://postgr.es/m/2020051116430836450630@highgo.ca
* Remove WARNING message from brin_desummarize_rangeAlvaro Herrera2020-07-09
| | | | | | | | | | This message was being emitted on the grounds that only crashed summarization could cause it, but in reality even an aborted vacuum could do it ... which makes it way too noisy, particularly since it shows up in regression tests and makes them die. Reported by Tom Lane. Discussion: https://postgr.es/m/489091.1593534251@sss.pgh.pa.us
* code: replace most remaining uses of 'master'.Andres Freund2020-07-08
| | | | | | Author: Andres Freund Reviewed-By: David Steele Discussion: https://postgr.es/m/20200615182235.x7lch5n6kcjq4aue@alap3.anarazel.de
* code: replace 'master' with 'leader' where appropriate.Andres Freund2020-07-08
| | | | | | | | | Leader already is the more widely used terminology, but a few places didn't get the message. Author: Andres Freund Reviewed-By: David Steele Discussion: https://postgr.es/m/20200615182235.x7lch5n6kcjq4aue@alap3.anarazel.de
* code: replace 'master' with 'primary' where appropriate.Andres Freund2020-07-08
| | | | | | | | | Also changed "in the primary" to "on the primary", and added a few "the" before "primary". Author: Andres Freund Reviewed-By: David Steele Discussion: https://postgr.es/m/20200615182235.x7lch5n6kcjq4aue@alap3.anarazel.de
* Fix incorrect variable datatype.Fujii Masao2020-07-08
| | | | | | | | | | | Since slot_keep_segs indicates the number of WAL segments not LSN, its datatype should not be XLogRecPtr. Back-patch to v13 where this issue was added. Reported-by: Atsushi Torikoshi Author: Atsushi Torikoshi, tweaked by Fujii Masao Discussion: https://postgr.es/m/ebd0d674f3e050222238a960cac5251a@oss.nttdata.com
* Morph pg_replication_slots.min_safe_lsn to safe_wal_sizeAlvaro Herrera2020-07-07
| | | | | | | | | | | | | | | The previous definition of the column was almost universally disliked, so provide this updated definition which is more useful for monitoring purposes: a large positive value is good, while zero or a negative value means danger. This should be operationally more convenient. Backpatch to 13, where the new column to pg_replication_slots (and the feature it represents) were added. Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Reported-by: Fujii Masao <masao.fujii@oss.nttdata.com> Discussion: https://postgr.es/m/9ddfbf8c-2f67-904d-44ed-cf8bc5916228@oss.nttdata.com
* Remove unnecessary PageIsEmpty() nbtree build check.Peter Geoghegan2020-07-06
| | | | | | | | | | | | | | | nbtree index builds cannot write out an empty page. That would mean that there was no way to create a pivot tuple pointing to the page one level up, since _bt_truncate() generates one based on page's firstright tuple. Replace the unnecessary PageIsEmpty() check with an assertion that checks that the page has space for at least two line pointers (the would-be high key line pointer, plus at least one valid "data item" tuple line pointer). The PageIsEmpty() check was added by commit 5d9f146c over 20 years ago. It looks like it has always been unnecessary.
* Remove unused function parameter in end_parallel_vacuum.Amit Kapila2020-07-06
| | | | | | | Author: Vignesh C Reviewed-by: Sawada Masahiko Backpatch-through: 13, where it was introduced Discussion: https://postgr.es/m/CALDaNm3Ppt71NafGY5mk3V2i3Q+mm93pVibDq-0NpW7WU67Jcg@mail.gmail.com
* nbtree: Rename _bt_search() variables.Peter Geoghegan2020-07-02
| | | | | | | Make some of the variable names in _bt_search() consistent with corresponding variables within _bt_getstackbuf(). This naming scheme is clearer because the variable names always express a relationship between the currently locked buffer/page and some other page.
* Improve vacuum error context handling.Amit Kapila2020-07-01
| | | | | | | | | | Use separate functions to save and restore error context information as that made code easier to understand.  Also, make it clear that the index information required for error context is sane. Author: Andres Freund, Justin Pryzby, Amit Kapila Backpatch-through: 13, where it was introduced Discussion: https://postgr.es/m/CAA4eK1LWo+v1OWu=Sky27GTGSCuOmr7iaURNbc5xz6jO+SaPeA@mail.gmail.com
* nbtree: Correct inaccurate split location comment.Peter Geoghegan2020-06-29
| | | | Minor oversight in commit fab25024338.
* Remove duplicate check added by commit b2a5545bd6.Amit Kapila2020-06-27
| | | | | | | As this doesn't cause any harm so we decided to this clean up in HEAD only. Author: Ádám Balogh Discussion: https://postgr.es/m/VI1PR0702MB36631BD67559461AFDE1FEEE81920@VI1PR0702MB3663.eurprd07.prod.outlook.com
* Fix misuse of table_index_fetch_tuple_check().Peter Geoghegan2020-06-25
| | | | | | | | | | | | | Commit 0d861bbb, which added deduplication to nbtree, had _bt_check_unique() pass a TID to table_index_fetch_tuple_check() that isn't safe to mutate. table_index_fetch_tuple_check()'s tid argument is modified when the TID in question is not the latest visible tuple in a hot chain, though this wasn't documented. To fix, go back to using a local copy of the TID in _bt_check_unique(), and update comments above table_index_fetch_tuple_check(). Backpatch: 13-, where B-Tree deduplication was introduced.
* Adjust max_slot_wal_keep_size behavior per reviewAlvaro Herrera2020-06-24
| | | | | | | | | | | | | | | | | | | | | | | | | | In pg_replication_slot, change output from normal/reserved/lost to reserved/extended/unreserved/ lost, which better expresses the possible states particularly near the time where segments are no longer safe but checkpoint has not run yet. Under the new definition, reserved means the slot is consuming WAL that's still under the normal WAL size constraints; extended means it's consuming WAL that's being protected by wal_keep_segments or the slot itself, whose size is below max_slot_wal_keep_size; unreserved means the WAL is no longer safe, but checkpoint has not yet removed those files. Such as slot is in imminent danger, but can still continue for a little while and may catch up to the reserved WAL space. Also, there were some bugs in the calculations used to report the status; fixed those. Backpatch to 13. Reported-by: Fujii Masao <masao.fujii@oss.nttdata.com> Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com> Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/20200616.120236.1809496990963386593.horikyota.ntt@gmail.com
* Add parens to ConvertToXSegs macroAlvaro Herrera2020-06-24
| | | | | | | The current definition is dangerous. No bugs exist in our code at present, but backpatch to 11 nonetheless where it was introduced. Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
* Fix masking of SP-GiST pages during xlog consistency checkAlexander Korotkov2020-06-20
| | | | | | | | | | spg_mask() didn't take into account that pd_lower equal to SizeOfPageHeaderData is still valid value. This commit fixes that. Backpatch to 11, where spg_mask() pg_lower check was introduced. Reported-by: Michael Paquier Discussion: https://postgr.es/m/20200615131405.GM52676%40paquier.xyz Backpatch-through: 11
* Remove dead forceSync parameter of XactLogCommitRecord().Noah Misch2020-06-20
| | | | | | | | | | | | The function has been reading global variable forceSyncCommit, mirroring the intent of the caller that passed forceSync=forceSyncCommit. The other caller, RecordTransactionCommitPrepared(), passed false. Since COMMIT PREPARED can't share a transaction with any command, it certainly doesn't share a transaction with a command that sets forceSyncCommit. Reviewed by Michael Paquier. Discussion: https://postgr.es/m/20200617032615.GC2916904@rfd.leadboat.com
* Fix deduplication "single value" strategy bug.Peter Geoghegan2020-06-19
| | | | | | | | | | | | | | | | It was possible for deduplication's single value strategy to mistakenly believe that a very small duplicate tuple counts as one of the six large tuples that it aims to leave behind after the page finally splits. This could cause slightly suboptimal space utilization with very low cardinality indexes, though only under fairly narrow conditions. To fix, be particular about what kind of tuple counts as a maxpostingsize-capped tuple. This avoids confusion in the event of a small tuple that gets "wedged" between two large tuples, where all tuples on the page are duplicates of the same value. Discussion: https://postgr.es/m/CAH2-Wz=Y+sgSFc-O3LpiZX-POx2bC+okec2KafERHuzdVa7-rQ@mail.gmail.com Backpatch: 13-, where deduplication was introduced (by commit 0d861bbb)
* Don't export basebackup.c's sendTablespace().Robert Haas2020-06-17
| | | | | | | | | | | | | Commit 72d422a5227ef6f76f412486a395aba9f53bf3f0 made xlog.c call sendTablespace() with the 'sizeonly' argument set to true, which required basebackup.c to export sendTablespace(). However, that's kind of ugly, so instead defer the call to sendTablespace() until basebackup.c regains control. That way, it can still be a static function. Patch by me, reviewed by Amit Kapila and Kyotaro Horiguchi. Discussion: http://postgr.es/m/CA+TgmoYq+59SJ2zBbP891ngWPA9fymOqntqYcweSDYXS2a620A@mail.gmail.com
* Remove STATUS_WAITINGPeter Eisentraut2020-06-17
| | | | | | | Add a separate enum for use in the locking APIs, which were the only user. Discussion: https://www.postgresql.org/message-id/flat/a6f91ead-0ce4-2a34-062b-7ab9813ea308%402ndquadrant.com
* Fix buffile.c error handling.Thomas Munro2020-06-16
| | | | | | | | | | | | | | | | | | Convert buffile.c error handling to use ereport. This fixes cases where I/O errors were indistinguishable from EOF or not reported. Also remove "%m" from error messages where errno would be bogus. While we're modifying those strings, add block numbers and short read byte counts where appropriate. Back-patch to all supported releases. Reported-by: Amit Khandekar <amitdkhan.pg@gmail.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Ibrar Ahmed <ibrar.ahmad@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CA%2BhUKGJE04G%3D8TLK0DLypT_27D9dR8F1RQgNp0jK6qR0tZGWOw%40mail.gmail.com
* Fix some comments referring to past featuresMichael Paquier2020-06-15
| | | | | | | | Timestamp can only be an int64 since b9d092c, and support for WITH OIDS has been removed as of 578b229. Author: Justin Pryzby Discussion: https://postgr.es/m/20200612023709.GC14879@telsasoft.com
* Silence _bt_check_unique compiler warning.Peter Geoghegan2020-06-13
| | | | | Reported-By: Tom Lane Discussion: https://postgr.es/m/841649.1592065060@sss.pgh.pa.us
* Have pg_itoa, pg_ltoa and pg_lltoa return the length of the stringDavid Rowley2020-06-13
| | | | | | | | | | | Core by no means makes excessive use of these functions, but quite a large number of those usages do require the caller to call strlen() on the returned string. This is quite wasteful since these functions do already have a good idea of the length of the string, so we might as well just have them return that. Reviewed-by: Andrew Gierth Discussion: https://postgr.es/m/CAApHDvrm2A5x2uHYxsqriO2cUaGcFvND%2BksC9e7Tjep0t2RK_A%40mail.gmail.com
* Improve comments for [Heap]CheckForSerializableConflictOut().Thomas Munro2020-06-12
| | | | | | | | | | | | Rewrite the documentation of these functions, in light of recent bug fix commit 5940ffb2. Back-patch to 13 where the check-for-conflict-out code was split up into AM-specific and generic parts, and new documentation was added that now looked wrong. Reviewed-by: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/db7b729d-0226-d162-a126-8a8ab2dc4443%40jepsen.io
* Avoid update conflict out serialization anomalies.Peter Geoghegan2020-06-11
| | | | | | | | | | | | | | | | | | | | | | | | | | SSI's HeapCheckForSerializableConflictOut() test failed to correctly handle conditions involving a concurrently inserted tuple which is later concurrently updated by a separate transaction . A SELECT statement that called HeapCheckForSerializableConflictOut() could end up using the same XID (updater's XID) for both the original tuple, and the successor tuple, missing the XID of the xact that created the original tuple entirely. This only happened when neither tuple from the chain was visible to the transaction's MVCC snapshot. The observable symptoms of this bug were subtle. A pair of transactions could commit, with the later transaction failing to observe the effects of the earlier transaction (because of the confusion created by the update to the non-visible row). This bug dates all the way back to commit dafaa3ef, which added SSI. To fix, make sure that we check the xmin of concurrently inserted tuples that happen to also have been updated concurrently. Author: Peter Geoghegan Reported-By: Kyle Kingsbury Reviewed-By: Thomas Munro Discussion: https://postgr.es/m/db7b729d-0226-d162-a126-8a8ab2dc4443@jepsen.io Backpatch: All supported versions
* Fix locking bugs that could corrupt pg_control.Thomas Munro2020-06-08
| | | | | | | | | | | | | | | | | | The redo routines for XLOG_CHECKPOINT_{ONLINE,SHUTDOWN} must acquire ControlFileLock before modifying ControlFile->checkPointCopy, or the checkpointer could write out a control file with a bad checksum. Likewise, XLogReportParameters() must acquire ControlFileLock before modifying ControlFile and calling UpdateControlFile(). Back-patch to all supported releases. Author: Nathan Bossart <bossartn@amazon.com> Author: Fujii Masao <masao.fujii@oss.nttdata.com> Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://postgr.es/m/70BF24D6-DC51-443F-B55A-95735803842A%40amazon.com
* Fix crash in WAL sender when starting physical replicationMichael Paquier2020-06-08
| | | | | | | | | | | | | | | | | | | | | Since database connections can be used with WAL senders in 9.4, it is possible to use physical replication. This commit fixes a crash when starting physical replication with a WAL sender using a database connection, caused by the refactoring done in 850196b. There have been discussions about forbidding the use of physical replication in a database connection, but this is left for later, taking care only of the crash new to 13. While on it, add a test to check for a failure when attempting logical replication if the WAL sender does not have a database connection. This part is extracted from a larger patch by Kyotaro Horiguchi. Reported-by: Vladimir Sitnikov Author: Michael Paquier, Kyotaro Horiguchi Reviewed-by: Kyotaro Horiguchi, Álvaro Herrera Discussion: https://postgr.es/m/CAB=Je-GOWMj1PTPkeUhjqQp-4W3=nW-pXe2Hjax6rJFffB5_Aw@mail.gmail.com Backpatch-through: 13
* Reconsider nbtree page deletion assertion.Peter Geoghegan2020-05-19
| | | | | | | | | | | | | | | | Commit 624686abcf8 added an assertion that verified that _bt_search successfully relocated the leaf page undergoing deletion. Page deletion cannot deal with the case where the descent stack is to the right of the page, so this seemed critical (deletion can only handle the case where the descent stack is to the left of the leaf/target page). However, the assertion went a bit too far. Since only a buffer pin is held on the leaf page throughout the call to _bt_search, nothing guarantees that it can't have split during this small window. And if does actually split, _bt_search may end up "relocating" a page to the right of the original target leaf page. This scenario seems extremely unlikely, but it must still be considered. Remove the assertion, and document how we cope in this scenario.
* Mop-up for wait event naming issues.Tom Lane2020-05-16
| | | | | | | | | | | | Synchronize the event names for parallel hash join waits with other event names, by getting rid of the slashes and dropping "-ing" suffixes. Rename ClogGroupUpdate to XactGroupUpdate, to match the new SLRU name. Move the ProcSignalBarrier event to the IPC category; it doesn't belong under IO. Also a bit more wordsmithing in the wait event documentation tables. Discussion: https://postgr.es/m/4505.1589640417@sss.pgh.pa.us
* Final pgindent run with pg_bsd_indent version 2.1.Tom Lane2020-05-16
| | | | | | | This is just to provide a clean basis for comparison of the results of the new version. I did fix a typo that crept into 242dfcbaf. Discussion: https://postgr.es/m/20200114221814.GA19630@alvherre.pgsql
* Rename assorted LWLock tranches.Tom Lane2020-05-15
| | | | | | | | | | | | | | | | Choose names that fit into the conventions for wait event names (particularly, that multi-word names are in the style MultiWordName) and hopefully convey more information to non-hacker users than the previous names did. Also rename SerializablePredicateLockListLock to SerializablePredicateListLock; the old name was long enough to cause table formatting problems, plus the double occurrence of "Lock" seems confusing/error-prone. Also change a couple of particularly opaque LWLock field names. Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
* Avoid killing btree items that are already deadAlvaro Herrera2020-05-15
| | | | | | | | | | | | | | | | | | | | | | _bt_killitems marks btree items dead when a scan leaves the page where they live, but it does so with only share lock (to improve concurrency). This was historicall okay, since killing a dead item has no consequences. However, with the advent of data checksums and wal_log_hints, this action incurs a WAL full-page-image record of the page. Multiple concurrent processes would write the same page several times, leading to WAL bloat. The probability of this happening can be reduced by only killing items if they're not already dead, so change the code to do that. The problem could eliminated completely by having _bt_killitems upgrade to exclusive lock upon seeing a killable item, but that would reduce concurrency so it's considered a cure worse than the disease. Backpatch all the way back to 9.5, since wal_log_hints was introduced in 9.4. Author: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> Discussion: https://postgr.es/m/CA+fd4k6PeRj2CkzapWNrERkja5G0-6D-YQiKfbukJV+qZGFZ_Q@mail.gmail.com
* Rename SLRU structures and associated LWLocks.Tom Lane2020-05-15
| | | | | | | | | | | | | | | | | | | | | | | | | | Originally, the names assigned to SLRUs had no purpose other than being shmem lookup keys, so not a lot of thought went into them. As of v13, though, we're exposing them in the pg_stat_slru view and the pg_stat_reset_slru function, so it seems advisable to take a bit more care. Rename them to names based on the associated on-disk storage directories (which fortunately we *did* think about, to some extent; since those are also visible to DBAs, consistency seems like a good thing). Also rename the associated LWLocks, since those names are likewise user-exposed now as wait event names. For the most part I only touched symbols used in the respective modules' SimpleLruInit() calls, not the names of other related objects. This renaming could have been taken further, and maybe someday we will do so. But for now it seems undesirable to change the names of any globally visible functions or structs, so some inconsistency is unavoidable. (But I *did* terminate "oldserxid" with prejudice, as I found that name both unreadable and not descriptive of the SLRU's contents.) Table 27.12 needs re-alphabetization now, but I'll leave that till after the other LWLock renamings I have in mind. Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
* Initial pgindent and pgperltidy run for v13.Tom Lane2020-05-14
| | | | | | | | | | | Includes some manual cleanup of places that pgindent messed up, most of which weren't per project style anyway. Notably, it seems some people didn't absorb the style rules of commit c9d297751, because there were a bunch of new occurrences of function calls with a newline just after the left paren, all with faulty expectations about how the rest of the call would get indented.
* Collect built-in LWLock tranche names statically, not dynamically.Tom Lane2020-05-14
| | | | | | | | | | | | | | | | | | There is little point in using the LWLockRegisterTranche mechanism for built-in tranche names. It wastes cycles, it creates opportunities for bugs (since failing to register a tranche name is a very hard-to-detect problem), and the lack of any centralized list of names encourages sloppy nonconformity in name choices. Moreover, since we have a centralized list of the tranches anyway in enum BuiltinTrancheIds, we're certainly not buying any flexibility in return for these disadvantages. Hence, nuke all the backend-internal LWLockRegisterTranche calls, and instead provide a const array of the builtin tranche names. (I have in mind to change a bunch of these names shortly, but this patch is just about getting them into one place.) Discussion: https://postgr.es/m/9056.1589419765@sss.pgh.pa.us
* Dial back -Wimplicit-fallthrough to level 3Alvaro Herrera2020-05-13
| | | | | | | | | The additional pain from level 4 is excessive for the gain. Also revert all the source annotation changes to their original wordings, to avoid back-patching pain. Discussion: https://postgr.es/m/31166.1589378554@sss.pgh.pa.us
* Improve management of SLRU statistics collection.Tom Lane2020-05-13
| | | | | | | | | | | | | | | | | | | | | | | Instead of re-identifying which statistics bucket to use for a given SLRU on every counter increment, do it once during shmem initialization. This saves a fair number of cycles, and there's no real cost because we could not have a bucket assignment that varies over time or across backends anyway. Also, get rid of the ill-considered decision to let pgstat.c pry directly into SLRU's shared state; it's cleaner just to have slru.c pass the stats bucket number. In consequence of these changes, there's no longer any need to store an SLRU's LWLock tranche info in shared memory, so get rid of that, making this a net reduction in shmem consumption. (That partly reverts fe702a7b3.) This is basically code review for 28cac71bd, so I also cleaned up some comments, removed a dangling extern declaration, fixed some things that should be static and/or const, etc. Discussion: https://postgr.es/m/3618.1589313035@sss.pgh.pa.us
* Adjust walsender usage of xlogreader, simplify APIsAlvaro Herrera2020-05-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Have both physical and logical walsender share a 'xlogreader' state struct for tracking state. This replaces the existing globals sendSeg and sendCxt. * Change WALRead not to receive XLogReaderState->seg and ->segcxt as separate arguments anymore; just use the ones from 'state'. This is made possible by the above change. * have the XLogReader segment_open contract require the callbacks to install the file descriptor in the state struct themselves instead of returning it. xlogreader was already ignoring any possible failed return from the callbacks, relying solely on them never returning. (This point is not altogether excellent, as it means the callbacks have to know more of XLogReaderState; but to really improve on that we would have to pass back error info from the callbacks to xlogreader. And the complexity would not be saved but instead just transferred to the callers of WALRead, which would have to learn how to throw errors from the open_segment callback in addition of, as currently, from pg_pread.) * segment_open no longer receives the 'segcxt' as a separate argument, since it's part of the XLogReaderState argument. Per comments from Kyotaro Horiguchi. Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/20200511203336.GA9913@alvherre.pgsql
* Add -Wimplicit-fallthrough to CFLAGS and CXXFLAGSAlvaro Herrera2020-05-12
| | | | | | | | | | | | | | | Use it at level 4, a bit more restrictive than the default level, and tweak our commanding comments to FALLTHROUGH. (However, leave zic.c alone, since it's external code; to avoid the warnings that would appear there, change CFLAGS for that file in the Makefile.) Author: Julien Rouhaud <rjuju123@gmail.com> Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/20200412081825.qyo5vwwco3fv4gdo@nol Discussion: https://postgr.es/m/flat/E1fDenm-0000C8-IJ@gemulon.postgresql.org
* Fix comment in xlogutils.cMichael Paquier2020-05-12
| | | | | | | | | | | The existing callers of XLogReadDetermineTimeline() performing recovery need to check a replay LSN position when determining on which timeline to read a WAL page. A portion of the comment describing this function said exactly that, while referring to a routine for fetching a write LSN, something not available in recovery. Author: Kyotaro Horiguchi Discussion: https://postgr.es/m/20200511.101619.2043820539323292957.horikyota.ntt@gmail.com