aboutsummaryrefslogtreecommitdiff
path: root/src/backend/storage/ipc
Commit message (Collapse)AuthorAge
...
* Pre-beta mechanical code beautification.Tom Lane2023-05-19
| | | | | | | | | | | | | | | Run pgindent, pgperltidy, and reformat-dat-files. This set of diffs is a bit larger than typical. We've updated to pg_bsd_indent 2.1.2, which properly indents variable declarations that have multi-line initialization expressions (the continuation lines are now indented one tab stop). We've also updated to perltidy version 20230309 and changed some of its settings, which reduces its desire to add whitespace to lines to make assignments etc. line up. Going forward, that should make for fewer random-seeming changes to existing code. Discussion: https://postgr.es/m/20230428092545.qfb3y5wcu4cm75ur@alvherre.pgsql
* Fix typos in commentsMichael Paquier2023-05-02
| | | | | | | | | The changes done in this commit impact comments with no direct user-visible changes, with fixes for incorrect function, variable or structure names. Author: Alexander Lakhin Discussion: https://postgr.es/m/e8c38840-596a-83d6-bd8d-cebc51111572@gmail.com
* Remove vacuum_defer_cleanup_ageAndres Freund2023-04-24
| | | | | | | | | | | | | | | | | | | | vacuum_defer_cleanup_age was introduced before hot_standby_feedback and replication slots existed. It is hard to use reasonably - commonly it will either be set too low (not preventing recovery conflicts, while still causing some bloat), or too high (causing a lot of bloat). The alternatives do not have that issue. That on its own might not be sufficient reason to remove vacuum_defer_cleanup_age, but it also complicates computation of xid horizons. See e.g. the bug fixed in be504a3e974. It also is untested. This commit removes TransactionIdRetreatSafely(), as there are no users anymore. There might be potential future users, hence noting that here. Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/20230317230930.nhsgk3qfk7f4axls@awork3.anarazel.de
* Fix various typos and incorrect/outdated name referencesDavid Rowley2023-04-19
| | | | | Author: Alexander Lakhin Discussion: https://postgr.es/m/699beab4-a6ca-92c9-f152-f559caf6dc25@gmail.com
* Handle logical slot conflicts on standbyAndres Freund2023-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During WAL replay on the standby, when a conflict with a logical slot is identified, invalidate such slots. There are two sources of conflicts: 1) Using the information added in 6af1793954e, logical slots are invalidated if required rows are removed 2) wal_level on the primary server is reduced to below logical Uses the infrastructure introduced in the prior commit. FIXME: add commit reference. Change InvalidatePossiblyObsoleteSlot() to use a recovery conflict to interrupt use of a slot, if called in the startup process. The new recovery conflict is added to pg_stat_database_conflicts, as confl_active_logicalslot. See 6af1793954e for an overall design of logical decoding on a standby. Bumps catversion for the addition of the pg_stat_database_conflicts column. Bumps PGSTAT_FILE_FORMAT_ID for the same reason. Author: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> Author: Andres Freund <andres@anarazel.de> Author: Amit Khandekar <amitdkhan.pg@gmail.com> (in an older version) Reviewed-by: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Fabrízio de Royes Mello <fabriziomello@gmail.com> Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/20230407075009.igg7be27ha2htkbt@awork3.anarazel.de
* Fix incorrect format placeholdersPeter Eisentraut2023-03-30
|
* Improve several permission-related error messages.Peter Eisentraut2023-03-17
| | | | | | | | | Mainly move some detail from errmsg to errdetail, remove explicit mention of superuser where appropriate, since that is implied in most permission checks, and make messages more uniform. Author: Nathan Bossart <nathandbossart@gmail.com> Discussion: https://www.postgresql.org/message-id/20230316234701.GA903298@nathanxps13
* Fix corruption due to vacuum_defer_cleanup_age underflowing 64bit xidsAndres Freund2023-03-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When vacuum_defer_cleanup_age is bigger than the current xid, including the epoch, the subtraction of vacuum_defer_cleanup_age would lead to a wrapped around xid. While that normally is not a problem, the subsequent conversion to a 64bit xid results in a 64bit-xid very far into the future. As that xid is used as a horizon to detect whether rows versions are old enough to be removed, that allows removal of rows that are still visible (i.e. corruption). If vacuum_defer_cleanup_age was never changed from the default, there is no chance of this bug occurring. This bug was introduced in dc7420c2c92. A lesser version of it exists in 12-13, introduced by fb5344c969a, affecting only GiST. The 12-13 version of the issue can, in rare cases, lead to pages in a gist index getting recycled too early, potentially causing index entries to be found multiple times. The fix is fairly simple - don't allow vacuum_defer_cleanup_age to retreat further than FirstNormalTransactionId. Patches to make similar bugs easier to find, by adding asserts to the 64bit xid infrastructure, have been proposed, but are not suitable for backpatching. Currently there are no tests for vacuum_defer_cleanup_age. A patch introducing infrastructure to make writing a test easier has been posted to the list. Reported-by: Michail Nikolaev <michail.nikolaev@gmail.com> Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com> Author: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/20230108002923.cyoser3ttmt63bfn@awork3.anarazel.de Backpatch: 12-, but impact/fix is smaller for 12-13
* Speedup and increase usability of set proc title functionsDavid Rowley2023-02-20
| | | | | | | | | | | | | | | | | | | | | | The setting of the process title could be seen on profiles of very fast-to-execute queries. In many locations where we call set_ps_display() we pass along a string constant, the length of which is known during compilation. Here we effectively rename set_ps_display() to set_ps_display_with_len() and then add a static inline function named set_ps_display() which calls strlen() on the given string. This allows the compiler to optimize away the strlen() call when dealing with call sites passing a string constant. We can then also use memcpy() instead of strlcpy() to copy the string into the destination buffer. That's significantly faster than strlcpy's byte-at-a-time way of copying. Here we also take measures to improve some code which was adjusting the process title to add a " waiting" suffix to it. Call sites which require this can now just call set_ps_display_suffix() to add or adjust the suffix and call set_ps_display_remove_suffix() to remove it again. Reviewed-by: Andres Freund Discussion: https://postgr.es/m/CAApHDvocBvvk-0gWNA2Gohe+sv9fMcv+fK_G+siBKJrgDG4O7g@mail.gmail.com
* Remove useless casts to (void *) in arguments of some system functionsPeter Eisentraut2023-02-07
| | | | | | | | The affected functions are: bsearch, memcmp, memcpy, memset, memmove, qsort, repalloc Reviewed-by: Corey Huinker <corey.huinker@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/fd9adf5d-b1aa-e82f-e4c7-263c30145807%40enterprisedb.com
* Avoid type cheats for invalid dsa_handles and dshash_table_handles.Tom Lane2023-01-25
| | | | | | | | | | | | | | Invent separate macros for "invalid" values of these types, so that we needn't embed knowledge of their representations into calling code. These are all zeroes anyway ATM, so this is not fixing any live bug, but it makes the code cleaner and more future-proof. I (tgl) also chose to move DSM_HANDLE_INVALID into dsm_impl.h, since it seems like it should live beside the typedef for dsm_handle. Hou Zhijie, Nathan Bossart, Kyotaro Horiguchi, Tom Lane Discussion: https://postgr.es/m/OS0PR01MB5716860B1454C34E5B179B6694C99@OS0PR01MB5716.jpnprd01.prod.outlook.com
* Zero initialize uses of instr_time about to trigger compiler warningsAndres Freund2023-01-20
| | | | | | | | | These are all not necessary from a correctness POV. However, in the near future instr_time will be simplified to an int64, at which point gcc would otherwise start to warn about the changed places. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/20230116023639.rn36vf6ajqmfciua@awork3.anarazel.de
* Remove SHM_QUEUEAndres Freund2023-01-19
| | | | | | | | | | Prior patches got rid of all the uses of SHM_QUEUE. ilist.h style lists are more widely used and have an easier to use interface. As there are no users left, remove SHM_QUEUE. Reviewed-by: Thomas Munro <thomas.munro@gmail.com> (in an older version) Discussion: https://postgr.es/m/20221120055930.t6kl3tyivzhlrzu2@awork3.anarazel.de Discussion: https://postgr.es/m/20200211042229.msv23badgqljrdg2@alap3.anarazel.de
* Fix WaitEventSetWait() buffer overrun.Thomas Munro2023-01-13
| | | | | | | | | | | | | | | | | | | | | | | | | The WAIT_USE_EPOLL and WAIT_USE_KQUEUE implementations of WaitEventSetWaitBlock() confused the size of their internal buffer with the size of the caller's output buffer, and could ask the kernel for too many events. In fact the set of events retrieved from the kernel needs to be able to fit in both buffers, so take the smaller of the two. The WAIT_USE_POLL and WAIT_USE WIN32 implementations didn't have this confusion. This probably didn't come up before because we always used the same number in both places, but commit 7389aad6 calculates a dynamic size at construction time, while using MAXLISTEN for its output event buffer on the stack. That seems like a reasonable thing to want to do, so consider this to be a pre-existing bug worth fixing. As discovered by valgrind on skink. Back-patch to all supported releases for epoll, and to release 13 for the kqueue part, which copied the incorrect epoll code. Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/901504.1673504836%40sss.pgh.pa.us
* Perform apply of large transactions by parallel workers.Amit Kapila2023-01-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, for large transactions, the publisher sends the data in multiple streams (changes divided into chunks depending upon logical_decoding_work_mem), and then on the subscriber-side, the apply worker writes the changes into temporary files and once it receives the commit, it reads from those files and applies the entire transaction. To improve the performance of such transactions, we can instead allow them to be applied via parallel workers. In this approach, we assign a new parallel apply worker (if available) as soon as the xact's first stream is received and the leader apply worker will send changes to this new worker via shared memory. The parallel apply worker will directly apply the change instead of writing it to temporary files. However, if the leader apply worker times out while attempting to send a message to the parallel apply worker, it will switch to "partial serialize" mode - in this mode, the leader serializes all remaining changes to a file and notifies the parallel apply workers to read and apply them at the end of the transaction. We use a non-blocking way to send the messages from the leader apply worker to the parallel apply to avoid deadlocks. We keep this parallel apply assigned till the transaction commit is received and also wait for the worker to finish at commit. This preserves commit ordering and avoid writing to and reading from files in most cases. We still need to spill if there is no worker available. This patch also extends the SUBSCRIPTION 'streaming' parameter so that the user can control whether to apply the streaming transaction in a parallel apply worker or spill the change to disk. The user can set the streaming parameter to 'on/off', or 'parallel'. The parameter value 'parallel' means the streaming will be applied via a parallel apply worker, if available. The parameter value 'on' means the streaming transaction will be spilled to disk. The default value is 'off' (same as current behaviour). In addition, the patch extends the logical replication STREAM_ABORT message so that abort_lsn and abort_time can also be sent which can be used to update the replication origin in parallel apply worker when the streaming transaction is aborted. Because this message extension is needed to support parallel streaming, parallel streaming is not supported for publications on servers < PG16. Author: Hou Zhijie, Wang wei, Amit Kapila with design inputs from Sawada Masahiko Reviewed-by: Sawada Masahiko, Peter Smith, Dilip Kumar, Shi yu, Kuroda Hayato, Shveta Mallik Discussion: https://postgr.es/m/CAA4eK1+wyN6zpaHUkCLorEWNx75MG0xhMwcFhvjqm2KURZEAGw@mail.gmail.com
* Update copyright for 2023Bruce Momjian2023-01-02
| | | | Backpatch-through: 11
* Avoid special XID snapshotConflictHorizon values.Peter Geoghegan2023-01-02
| | | | | | | | | | | | | Don't allow VACUUM to WAL-log the value FrozenTransactionId as the snapshotConflictHorizon of freezing or visibility map related WAL records. The only special XID value that's an allowable snapshotConflictHorizon is InvalidTransactionId, which is interpreted as "record definitely doesn't require a recovery conflict". Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-WznuNGSzF8v6OsgjaC5aYsb3cZ6HW6MLm30X0d65cmSH6A@mail.gmail.com
* Allow parent's WaitEventSets to be freed after fork().Thomas Munro2022-12-23
| | | | | | | | | | | | | An epoll fd belonging to the parent should be closed in the child. A kqueue fd is automatically closed by fork(), but we should still adjust our counter. For poll and Windows systems, nothing special is required. On all systems we free the memory. No caller yet, but we'll need this if we start using WaitEventSet in the postmaster as planned. Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CA%2BhUKG%2BZ-HpOj1JsO9eWUP%2Bar7npSVinsC_npxSy%2BjdOMsx%3DGg%40mail.gmail.com
* Don't leak a signalfd when using latches in the postmaster.Thomas Munro2022-12-23
| | | | | | | | | | | At the time of commit 6a2a70a02 we didn't use latch infrastructure in the postmaster. We're planning to start doing that, so we'd better make sure that the signalfd inherited from a postmaster is not duplicated and then leaked in the child. Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> Discussion: https://postgr.es/m/CA%2BhUKG%2BZ-HpOj1JsO9eWUP%2Bar7npSVinsC_npxSy%2BjdOMsx%3DGg%40mail.gmail.com
* Add WL_SOCKET_ACCEPT event to WaitEventSet API.Thomas Munro2022-12-23
| | | | | | | | | | | | | | | To be able to handle incoming connections on a server socket with the WaitEventSet API, we'll need a new kind of event to indicate that the the socket is ready to accept a connection. On Unix, it's just the same as WL_SOCKET_READABLE, but on Windows there is a different underlying kernel event that we need to map our abstraction to. No user yet, but a proposed patch would use this. Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CA%2BhUKG%2BZ-HpOj1JsO9eWUP%2Bar7npSVinsC_npxSy%2BjdOMsx%3DGg%40mail.gmail.com
* Add copyright notices to meson filesAndrew Dunstan2022-12-20
| | | | Discussion: https://postgr.es/m/222b43a5-2fb3-2c1b-9cd0-375d376c8246@dunslane.net
* Expose some information about backend subxact status.Robert Haas2022-12-19
| | | | | | | | | | | | | | | A new function pg_stat_get_backend_subxact() can be used to get information about the number of subtransactions in the cache of a particular backend and whether that cache has overflowed. This can be useful for tracking down performance problems that can result from overflowed snapshots. Dilip Kumar, reviewed by Zhihong Yu, Nikolay Samokhvalov, Justin Pryzby, Nathan Bossart, Ashutosh Sharma, Julien Rouhaud. Additional design comments from Andres Freund, Tom Lane, Bruce Momjian, and David G. Johnston. Discussion: http://postgr.es/m/CAFiTN-ut0uwkRJDQJeDPXpVyTWD46m3gt3JDToE02hTfONEN=Q@mail.gmail.com
* Improve heuristics for compressing the KnownAssignedXids array.Tom Lane2022-11-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, we'd compress only when the active range of array entries reached Max(4 * PROCARRAY_MAXPROCS, 2 * pArray->numKnownAssignedXids). If max_connections is large, the first term could result in not compressing for a long time, resulting in much wastage of cycles in hot-standby backends scanning the array to take snapshots. Get rid of that term, and just bound it to 2 * pArray->numKnownAssignedXids. That however creates the opposite risk, that we might spend too much effort compressing. Hence, consider compressing only once every 128 commit records. (This frequency was chosen by benchmarking. While we only tried one benchmark scenario, the results seem stable over a fairly wide range of frequencies.) Also, force compression when processing RecoveryInfo WAL records (which should be infrequent); the old code could perform compression then, but would do so only after the same array-range check as for the transaction-commit path. Also, opportunistically run compression if the startup process is about to wait for WAL, though not oftener than once a second. This should prevent cases where we waste lots of time by leaving the array not-compressed for long intervals due to low WAL traffic. Lastly, add a simple check to keep us from uselessly compressing when the array storage is already compact. Back-patch, as the performance problem is worse in pre-v14 branches than in HEAD. Simon Riggs and Michail Nikolaev, with help from Tom Lane and Andres Freund. Discussion: https://postgr.es/m/CALdSSPgahNUD_=pB_j=1zSnDBaiOtqVfzo8Ejt5J_k7qZiU1Tw@mail.gmail.com
* Ignore invalidated slots while computing oldest catalog XminAlvaro Herrera2022-11-22
| | | | | | | | | | | | | | | | | Once a logical slot has acquired a catalog_xmin, it doesn't let go of it, even when invalidated by exceeding the max_slot_wal_keep_size, which means that dead catalog tuples are not removed by vacuum anymore since the point is invalidated, until the slot is dropped. This could be catastrophic if catalog churn is high. Change the computation of Xmin to ignore invalidated slots, to prevent dead rows from accumulating. Backpatch to 13, where slot invalidation appeared. Author: Sirisha Chamarthi <sirichamarthi22@gmail.com> Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Discussion: https://postgr.es/m/CAKrAKeUEDeqquN9vwzNeG-CN8wuVsfRYbeOUV9qKO_RHok=j+g@mail.gmail.com
* Standardize rmgrdesc recovery conflict XID output.Peter Geoghegan2022-11-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Standardize on the name snapshotConflictHorizon for all XID fields from WAL records that generate recovery conflicts when in hot standby mode. This supersedes the previous latestRemovedXid naming convention. The new naming convention places emphasis on how the values are actually used by REDO routines. How the values are generated during original execution (details of which vary by record type) is deemphasized. Users of tools like pg_waldump can now grep for snapshotConflictHorizon to see all potential sources of recovery conflicts in a standardized way, without necessarily having to consider which specific record types might be involved. Also bring a couple of WAL record types that didn't follow any kind of naming convention into line. These are heapam's VISIBLE record type and SP-GiST's VACUUM_REDIRECT record type. Now every WAL record whose REDO routine calls ResolveRecoveryConflictWithSnapshot() passes through the snapshotConflictHorizon field from its WAL record. This is follow-up work to the refactoring from commit 9e540599 that made FREEZE_PAGE WAL records use a standard snapshotConflictHorizon style XID cutoff. No bump in XLOG_PAGE_MAGIC, since the underlying format of affected WAL records doesn't change. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CAH2-Wzm2CQUmViUq7Opgk=McVREHSOorYaAjR1ZpLYkRN7_dPw@mail.gmail.com
* Improve comments referring snapshot's subxip array.Amit Kapila2022-11-15
| | | | | | | | | | It was referred to as subxact array in a few places and subxip array in others. By changing it to subxip array, we make it consistent with similar references to xip array. Author: Japin Li Reviewd by: Julien Rouhaud, Richard Guo Discussion: https://postgr.es/m/MEYP282MB1669DCE7AC193A947CED2A95B6009@MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM
* Clean up some inconsistencies with GUC declarationsMichael Paquier2022-10-31
| | | | | | | | | | | | | | | | | | | | This is similar to 7d25958, and this commit takes care of all the remaining inconsistencies between the initial value used in the C variable associated to a GUC and its default value stored in the GUC tables (as of pg_settings.boot_val). Some of the initial values of the GUCs updated rely on a compile-time default. These are refactored so as the GUC table and its C declaration use the same values. This makes everything consistent with other places, backend_flush_after, bgwriter_flush_after, port, checkpoint_flush_after doing so already, for example. Extracted from a larger patch by Peter Smith. The spots updated in the modules are from me. Author: Peter Smith, Michael Paquier Reviewed-by: Nathan Bossart, Tom Lane, Justin Pryzby Discussion: https://postgr.es/m/CAHut+PtHE0XSfjjRQ6D4v7+dqzCw=d+1a64ujra4EX8aoc_Z+w@mail.gmail.com
* Rename SetSingleFuncCall() to InitMaterializedSRF()Michael Paquier2022-10-18
| | | | | | | | | | | | | | | | | | Per discussion, the existing routine name able to initialize a SRF function with materialize mode is unpopular, so rename it. Equally, the flags of this function are renamed, as of: - SRF_SINGLE_USE_EXPECTED -> MAT_SRF_USE_EXPECTED_DESC - SRF_SINGLE_BLESS -> MAT_SRF_BLESS The previous function and flags introduced in 9e98583 are kept around for compatibility purposes, so as any extension code already compiled with v15 continues to work as-is. The declarations introduced here for compatibility will be removed from HEAD in a follow-up commit. The new names have been suggested by Andres Freund and Melanie Plageman. Discussion: https://postgr.es/m/20221013194820.ciktb2sbbpw7cljm@awork3.anarazel.de Backpatch-through: 15
* Standardize format for printing PIDsPeter Eisentraut2022-10-14
| | | | | | | | | | | | | | Most code prints PIDs as %d, but some code tried to print them as long or unsigned long. While this is in theory allowed, the fact that PIDs fit into int is deeply baked into all PostgreSQL code, so these random deviations don't accomplish anything except confusion. Note that we still need casts from pid_t to int, because on 64-bit MinGW, pid_t is long long int. (But per above, actually supporting that range in PostgreSQL code would be major surgery and probably not useful.) Discussion: https://www.postgresql.org/message-id/289c2e45-c7d9-5ce4-7eff-a9e2a33e1580@enterprisedb.com
* Harden pmsignal.c against clobbered shared memory.Tom Lane2022-10-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | The postmaster is not supposed to do anything that depends fundamentally on shared memory contents, because that creates the risk that a backend crash that trashes shared memory will take the postmaster down with it, preventing automatic recovery. In commit 969d7cd43 I lost sight of this principle and coded AssignPostmasterChildSlot() in such a way that it could fail or even crash if the shared PMSignalState structure became corrupted. Remarkably, we've not seen field reports of such crashes; but I managed to induce one while testing the recent changes around palloc chunk headers. To fix, make a semi-duplicative state array inside the postmaster so that we need consult only local state while choosing a "child slot" for a new backend. Ensure that other postmaster-executed routines in pmsignal.c don't have critical dependencies on the shared state, either. Corruption of PMSignalState might now lead ReleasePostmasterChildSlot() to conclude that backend X failed, when actually backend Y was the one that trashed things. But that doesn't matter, because we'll force a cluster-wide reset regardless. Back-patch to all supported branches, since this is an old bug. Discussion: https://postgr.es/m/3436789.1665187055@sss.pgh.pa.us
* Take care to de-duplicate entries in standby.c's table of locks.Tom Lane2022-10-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The RecoveryLockLists data structure, which tracks all exclusive locks that the startup process is holding on behalf of transactions being replayed, did not have any provision for avoiding duplicate entries for the same lock. Maybe that was okay when the code was first written. However, modern practice is for checkpoints to write fresh lists of all active exclusive locks into the WAL. Thus, an exclusive lock that survives across multiple checkpoints causes bloat in standbys' startup processes. If there are a lot of such locks this can look like a memory leak, and it's even possible to drive the startup process into a palloc failure from an over-length List. To fix, use a hash table instead of simple lists to track the locks being held. Allowing for dynahash overhead, this requires a little more space per lock than the old way (although it's the same size as what we were allocating prior to c6e0fe1f2). It's probably a shade slower too. However, testing indicates that the penalty is negligible on ordinary workloads, so let's make this change to improve robustness in extreme cases. Patch by me, per report from Dmitriy Kuzmin. No back-patch (for now anyway), since it seems that a significant improvement would only occur in corner cases. Discussion: https://postgr.es/m/CAHLDt=_ts0A7Agn=hCpUh+RCFkxd+G6uuT=kcTfqFtGur0dp=A@mail.gmail.com
* meson: Add initial version of meson based build systemAndres Freund2022-09-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Autoconf is showing its age, fewer and fewer contributors know how to wrangle it. Recursive make has a lot of hard to resolve dependency issues and slow incremental rebuilds. Our home-grown MSVC build system is hard to maintain for developers not using Windows and runs tests serially. While these and other issues could individually be addressed with incremental improvements, together they seem best addressed by moving to a more modern build system. After evaluating different build system choices, we chose to use meson, to a good degree based on the adoption by other open source projects. We decided that it's more realistic to commit a relatively early version of the new build system and mature it in tree. This commit adds an initial version of a meson based build system. It supports building postgres on at least AIX, FreeBSD, Linux, macOS, NetBSD, OpenBSD, Solaris and Windows (however only gcc is supported on aix, solaris). For Windows/MSVC postgres can now be built with ninja (faster, particularly for incremental builds) and msbuild (supporting the visual studio GUI, but building slower). Several aspects (e.g. Windows rc file generation, PGXS compatibility, LLVM bitcode generation, documentation adjustments) are done in subsequent commits requiring further review. Other aspects (e.g. not installing test-only extensions) are not yet addressed. When building on Windows with msbuild, builds are slower when using a visual studio version older than 2019, because those versions do not support MultiToolTask, required by meson for intra-target parallelism. The plan is to remove the MSVC specific build system in src/tools/msvc soon after reaching feature parity. However, we're not planning to remove the autoconf/make build system in the near future. Likely we're going to keep at least the parts required for PGXS to keep working around until all supported versions build with meson. Some initial help for postgres developers is at https://wiki.postgresql.org/wiki/Meson With contributions from Thomas Munro, John Naylor, Stone Tickle and others. Author: Andres Freund <andres@anarazel.de> Author: Nazir Bilal Yavuz <byavuz81@gmail.com> Author: Peter Eisentraut <peter@eisentraut.org> Reviewed-By: Peter Eisentraut <peter.eisentraut@enterprisedb.com> Discussion: https://postgr.es/m/20211012083721.hvixq4pnh2pixr3j@alap3.anarazel.de
* Used optimized linear search in more code pathsMichael Paquier2022-09-22
| | | | | | | | | | | | | | | | | This commit updates two code paths to use pg_lfind32() introduced by b6ef167 with TransactionId arrays: - At the end of TransactionIdIsInProgress(), when checking for the case of still running but overflowed subxids. - XidIsConcurrent(), when checking for a serializable conflict. These cases are less impactful than 37a6e5d, but a bit of micro-benchmarking of this API shows that linear search speeds up by ~20% depending on the number of items involved (x86_64 and amd64 looked at here). Author: Nathan Bossart Reviewed-by: Richard Guo, Michael Paquier Discussion: https://postgr.es/m/20220901185153.GA783106@nathanxps13
* Harmonize parameter names in storage and AM code.Peter Geoghegan2022-09-19
| | | | | | | | | | | | | | | Make sure that function declarations use names that exactly match the corresponding names from function definitions in storage, catalog, access method, executor, and logical replication code, as well as in miscellaneous utility/library code. Like other recent commits that cleaned up function parameter names, this commit was written with help from clang-tidy. Later commits will do the same for other parts of the codebase. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/CAH2-WznJt9CMM9KJTMjJh_zbL5hD9oX44qdJ4aqZtjFi-zA3Tg@mail.gmail.com
* Use SIGNAL_ARGS consistently to declare signal handlers.Tom Lane2022-09-14
| | | | | | | | | | | | Various bits of code were declaring signal handlers manually, using "int signum" or variants of that. We evidently have no platforms where that's actually wrong, but let's use our SIGNAL_ARGS macro everywhere anyway. If nothing else, it's good for finding signal handlers easily. No need for back-patch, since this is just cosmetic AFAICS. Discussion: https://postgr.es/m/2684964.1663167995@sss.pgh.pa.us
* Split up guc.c for better build speed and ease of maintenance.Tom Lane2022-09-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | guc.c has grown to be one of our largest .c files, making it a bottleneck for compilation. It's also acquired a bunch of knowledge that'd be better kept elsewhere, because of our not very good habit of putting variable-specific check hooks here. Hence, split it up along these lines: * guc.c itself retains just the core GUC housekeeping mechanisms. * New file guc_funcs.c contains the SET/SHOW interfaces and some SQL-accessible functions for GUC manipulation. * New file guc_tables.c contains the data arrays that define the built-in GUC variables, along with some already-exported constant tables. * GUC check/assign/show hook functions are moved to the variable's home module, whenever that's clearly identifiable. A few hard- to-classify hooks ended up in commands/variable.c, which was already a home for miscellaneous GUC hook functions. To avoid cluttering a lot more header files with #include "guc.h", I also invented a new header file utils/guc_hooks.h and put all the GUC hook functions' declarations there, regardless of their originating module. That allowed removal of #include "guc.h" from some existing headers. The fallout from that (hopefully all caught here) demonstrates clearly why such inclusions are best minimized: there are a lot of files that, for example, were getting array.h at two or more levels of remove, despite not having any connection at all to GUCs in themselves. There is some very minor code beautification here, such as renaming a couple of inconsistently-named hook functions and improving some comments. But mostly this just moves code from point A to point B and deals with the ensuing needs for #include adjustments and exporting a few functions that previously weren't exported. Patch by me, per a suggestion from Andres Freund; thanks also to Michael Paquier for the idea to invent guc_funcs.c. Discussion: https://postgr.es/m/587607.1662836699@sss.pgh.pa.us
* Remove configure probes for sys/ipc.h, sys/sem.h, sys/shm.h.Thomas Munro2022-08-14
| | | | | | | | | These are in SUSv2 and every targeted Unix system has them. It's not hard to avoid including them on Windows system because they're mostly used in platform-specific translation units. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CA%2BhUKG%2BL_3brvh%3D8e0BW_VfX9h7MtwgN%3DnFHP5o7X2oZucY9dg%40mail.gmail.com
* Fix comment in procarray.c.Fujii Masao2022-07-28
| | | | | | | | | Commit fea10a6434 renamed VariableCacheData.nextFullXid to nextXid. But commit dc7420c2c9 introduced the comment mentioning nextFullXid. This commit changes"nextFullXid" to "nextXid" in the comment. Author: Zhang Mingli Discussion: https://postgr.es/m/642BA615-4B28-4B0C-BDF6-4D33E366BCDF@gmail.com
* Make dsm_impl_posix_resize more future-proof.Thomas Munro2022-07-16
| | | | | | | | | | | | | | | | | | | | Commit 4518c798 blocks signals for a short region of code, but it assumed that whatever called it had the signal mask set to UnBlockSig on entry. That may be true today (or may even not be, in extensions in the wild), but it would be better not to make that assumption. We should save-and-restore the caller's signal mask. The PG_SETMASK() portability macro couldn't be used for that, which is why it wasn't done before. But... considering that commit a65e0864 established back in 9.6 that supported POSIX systems have sigprocmask(), and that this is POSIX-only code, there is no reason not to use standard sigprocmask() directly to achieve that. Back-patch to all supported releases, like 4518c798 and 80845b7c. Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CA%2BhUKGKx6Biq7_UuV0kn9DW%2B8QWcpJC1qwhizdtD9tN-fn0H0g%40mail.gmail.com
* Don't clobber postmaster sigmask in dsm_impl_resize.Thomas Munro2022-07-15
| | | | | | | | | | | | Commit 4518c798 intended to block signals in regular backends that allocate DSM segments, but dsm_impl_resize() is also reached by dsm_postmaster_startup(). It's not OK to clobber the postmaster's signal mask, so only manipulate the signal mask when under the postmaster. Back-patch to all releases, like 4518c798. Discussion: https://postgr.es/m/CA%2BhUKGKNpK%3D2OMeea_AZwpLg7Bm4%3DgYWk7eDjZ5F6YbozfOf8w%40mail.gmail.com
* Create a distinct wait event for POSIX DSM allocation.Thomas Munro2022-07-14
| | | | | | | | | Previously we displayed "DSMFillZeroWrite" while in posix_fallocate(), because we shared the same wait event for "mmap" and "posix" DSM types. Let's introduce a new wait event "DSMAllocate", to be more accurate. Reported-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/20220711174518.yldckniicknsxgzl%40awork3.anarazel.de
* Remove redundant ftruncate() for POSIX DSM memory.Thomas Munro2022-07-14
| | | | | | | | | | | | | In early releases of the DSM infrastructure, it was possible to resize segments. That was removed in release 12 by commit 3c60d0fa. Now the ftruncate() + posix_fallocate() sequence during DSM segment creation has a redundant step: we're always extending from zero to the desired size, so we might as well just call posix_fallocate(). Let's also include the remaining ftruncate() call (non-Linux POSIX systems) in the wait event reporting, for good measure. Discussion: https://postgr.es/m/CA%2BhUKGJSm-nq8s%2B_59zb7NbFQF-OS%3DxTnTAiGLrQpuSmU2y_1A%40mail.gmail.com
* Block signals while allocating DSM memory.Thomas Munro2022-07-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On Linux, we call posix_fallocate() on shm_open()'d memory to avoid later potential SIGBUS (see commit 899bd785). Based on field reports of systems stuck in an EINTR retry loop there, there, we made it possible to break out of that loop via slightly odd coding where the CHECK_FOR_INTERRUPTS() call was somewhat removed from the loop (see commit 422952ee). On further reflection, that was not a great choice for at least two reasons: 1. If interrupts were held, the CHECK_FOR_INTERRUPTS() would do nothing and the EINTR error would be surfaced to the user. 2. If EINTR was reported but neither QueryCancelPending nor ProcDiePending was set, then we'd dutifully retry, but with a bit more understanding of how posix_fallocate() works, it's now clear that you can get into a loop that never terminates. posix_fallocate() is not a function that can do some of the job and tell you about progress if it's interrupted, it has to undo what it's done so far and report EINTR, and if signals keep arriving faster than it can complete (cf recovery conflict signals), you're stuck. Therefore, for now, we'll simply block most signals to guarantee progress. SIGQUIT is not blocked (see InitPostmasterChild()), because its expected handler doesn't return, and unblockable signals like SIGCONT are not expected to arrive at a high rate. For good measure, we'll include the ftruncate() call in the blocked region, and add a retry loop. Back-patch to all supported releases. Reported-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Reported-by: Nicola Contu <nicola.contu@gmail.com> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/20220701154105.jjfutmngoedgiad3%40alvherre.pgsql
* Change internal RelFileNode references to RelFileNumber or RelFileLocator.Robert Haas2022-07-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have been using the term RelFileNode to refer to either (1) the integer that is used to name the sequence of files for a certain relation within the directory set aside for that tablespace/database combination; or (2) that value plus the OIDs of the tablespace and database; or occasionally (3) the whole series of files created for a relation based on those values. Using the same name for more than one thing is confusing. Replace RelFileNode with RelFileNumber when we're talking about just the single number, i.e. (1) from above, and with RelFileLocator when we're talking about all the things that are needed to locate a relation's files on disk, i.e. (2) from above. In the places where we refer to (3) as a relfilenode, instead refer to "relation storage". Since there is a ton of SQL code in the world that knows about pg_class.relfilenode, don't change the name of that column, or of other SQL-facing things that derive their name from it. On the other hand, do adjust closely-related internal terminology. For example, the structure member names dbNode and spcNode appear to be derived from the fact that the structure itself was called RelFileNode, so change those to dbOid and spcOid. Likewise, various variables with names like rnode and relnode get renamed appropriately, according to how they're being used in context. Hopefully, this is clearer than before. It is also preparation for future patches that intend to widen the relfilenumber fields from its current width of 32 bits. Variables that store a relfilenumber are now declared as type RelFileNumber rather than type Oid; right now, these are the same, but that can now more easily be changed. Dilip Kumar, per an idea from me. Reviewed also by Andres Freund. I fixed some whitespace issues, changed a couple of words in a comment, and made one other minor correction. Discussion: http://postgr.es/m/CA+TgmoamOtXbVAQf9hWFzonUo6bhhjS6toZQd7HZ-pmojtAmag@mail.gmail.com Discussion: http://postgr.es/m/CA+Tgmobp7+7kmi4gkq7Y+4AM9fTvL+O1oQ4-5gFTT+6Ng-dQ=g@mail.gmail.com Discussion: http://postgr.es/m/CAFiTN-vTe79M8uDH1yprOU64MNFE+R3ODRuA+JWf27JbhY4hJw@mail.gmail.com
* Harden dsm_impl.c against unexpected EEXIST.Thomas Munro2022-07-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, we trusted the OS not to report EEXIST unless we'd passed in IPC_CREAT | IPC_EXCL or O_CREAT | O_EXCL, as appropriate. Solaris's shm_open() can in fact do that, causing us to crash because we didn't ereport and then we blithely assumed the mapping was successful. Let's treat EEXIST just like any other error, unless we're actually trying to create a new segment. This applies to shm_open(), where this behavior has been seen, and also to the equivalent operations for our sysv and mmap modes just on principle. Based on the underlying reason for the error, namely contention on a lock file managed by Solaris librt for each distinct name, this problem is only likely to happen on 15 and later, because the new shared memory stats system produces shm_open() calls for the same path from potentially large numbers of backends concurrently during authentication. Earlier releases only shared memory segments between a small number of parallel workers under one Gather node. You could probably hit it if you tried hard enough though, and we should have been more defensive in the first place. Therefore, back-patch to all supported releases. Per build farm animal margay. This isn't the end of the story, though, it just changes random crashes into random "File exists" errors; more work needed for a green build farm. Reviewed-by: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CA%2BhUKGKqKrCV5xKWfh9rnm%3Do%3DDwZLTLtnsj_XpUi9g5%3DV%2B9oyg%40mail.gmail.com
* Fix visibility check when XID is committed in CLOG but not in procarray.Heikki Linnakangas2022-06-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | TransactionIdIsInProgress had a fast path to return 'false' if the single-item CLOG cache said that the transaction was known to be committed. However, that was wrong, because a transaction is first marked as committed in the CLOG but doesn't become visible to others until it has removed its XID from the proc array. That could lead to an error: ERROR: t_xmin is uncommitted in tuple to be updated or for an UPDATE to go ahead without blocking, before the previous UPDATE on the same row was made visible. The window is usually very short, but synchronous replication makes it much wider, because the wait for synchronous replica happens in that window. Another thing that makes it hard to hit is that it's hard to get such a commit-in-progress transaction into the single item CLOG cache. Normally, if you call TransactionIdIsInProgress on such a transaction, it determines that the XID is in progress without checking the CLOG and without populating the cache. One way to prime the cache is to explicitly call pg_xact_status() on the XID. Another way is to use a lot of subtransactions, so that the subxid cache in the proc array is overflown, making TransactionIdIsInProgress rely on pg_subtrans and CLOG checks. This has been broken ever since it was introduced in 2008, but the race condition is very hard to hit, especially without synchronous replication. There were a couple of reports of the error starting from summer 2021, but no one was able to find the root cause then. TransactionIdIsKnownCompleted() is now unused. In 'master', remove it, but I left it in place in backbranches in case it's used by extensions. Also change pg_xact_status() to check TransactionIdIsInProgress(). Previously, it only checked the CLOG, and returned "committed" before the transaction was actually made visible to other queries. Note that this also means that you cannot use pg_xact_status() to reproduce the bug anymore, even if the code wasn't fixed. Report and analysis by Konstantin Knizhnik. Patch by Simon Riggs, with the pg_xact_status() change added by me. Author: Simon Riggs Reviewed-by: Andres Freund Discussion: https://www.postgresql.org/message-id/flat/4da7913d-398c-e2ad-d777-f752cf7f0bbb%40garret.ru
* Don't trust signalfd() on illumos.Thomas Munro2022-06-26
| | | | | | | | | | | | | | | | | | | | | | | | | | Since commit 6a2a70a02, we've used signalfd() to receive latch wakeups when building with WAIT_USE_EPOLL (default for Linux and illumos), and our traditional self-pipe when falling back to WAIT_USE_POLL (default for other Unixes with neither epoll() nor kqueue()). Unexplained hangs and kernel panics have been reported on illumos systems, apparently linked to this use of signalfd(), leading illumos users and build farm members to have to define WAIT_USE_POLL explicitly as a work-around. A bug report exists at https://www.illumos.org/issues/13700 but no fix is available yet. Let's provide a way for illumos users to go back to self-pipes with epoll(), like releases before 14, and choose that by default. No change for Linux users. To help with development/debugging, macros WAIT_USE_{EPOLL,POLL} and WAIT_USE_{SIGNALFD,SELF_PIPE} can be defined explicitly to override the defaults. Back-patch to 14, where we started using signalfd(). Reported-by: Japin Li <japinli@hotmail.com> Reported-by: Olaf Bohlen <olbohlen@eenfach.de> (off-list) Reviewed-by: Japin Li <japinli@hotmail.com> Discussion: https://postgr.es/m/MEYP282MB1669C8D88F0997354C2313C1B6CA9%40MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM
* Be more careful about GucSource for internally-driven GUC settings.Tom Lane2022-06-08
| | | | | | | | | | | | | | | | | | | | | | | The original advice for hard-wired SetConfigOption calls was to use PGC_S_OVERRIDE, particularly for PGC_INTERNAL GUCs. However, that's really overkill for PGC_INTERNAL GUCs, since there is no possibility that we need to override a user-provided setting. Instead use PGC_S_DYNAMIC_DEFAULT in most places, so that the value will appear with source = 'default' in pg_settings and thereby not be shown by psql's new \dconfig command. The one exception is that when changing in_hot_standby in a hot-standby session, we still use PGC_S_OVERRIDE, because people felt that seeing that in \dconfig would be a good thing. Similarly use PGC_S_DYNAMIC_DEFAULT for the auto-tune value of wal_buffers (if possible, that is if wal_buffers wasn't explicitly set to -1), and for the typical 2MB value of max_stack_depth. In combination these changes remove four not-very-interesting entries from the typical output of \dconfig, all of which people fingered as "why is that showing up?" in the discussion thread. Discussion: https://postgr.es/m/3118455.1649267333@sss.pgh.pa.us
* Revert changes to CONCURRENTLY that "sped up" Xmin advanceAlvaro Herrera2022-05-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit d9d076222f5b "VACUUM: ignore indexing operations with CONCURRENTLY". These changes caused indexes created with the CONCURRENTLY option to miss heap tuples that were HOT-updated and HOT-pruned during the index creation. Before these changes, HOT pruning would have been prevented by the Xmin of the transaction creating the index, but because this change was precisely to allow the Xmin to move forward ignoring that backend, now other backends scanning the table can prune them. This is not a problem for VACUUM (which requires a lock that conflicts with a CREATE INDEX CONCURRENTLY operation), but HOT-prune can definitely occur. In other words, Xmin advancement was sped up, but at the cost of corrupting the resulting index. Regrettably, this means that the new feature in PG14 that RIC/CIC on very large tables no longer force VACUUM to retain very old tuples goes away. We might try to implement it again in a later release, but for now the risk of indexes missing tuples is too high and there's no easy fix. Backpatch to 14, where this change appeared. Reported-by: Peter Slavov <pet.slavov@gmail.com> Diagnosys-by: Andrey Borodin <x4mmm@yandex-team.ru> Diagnosys-by: Michael Paquier <michael@paquier.xyz> Diagnosys-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/17485-396609c6925b982d%40postgresql.org
* shm_mq_sendv: Fix flushing bug when receiver not yet attached.Robert Haas2022-05-31
| | | | | | | | | | | | | | | | | With the old logic, when the reciever had not yet attached, we would never call shm_mq_inc_bytes_written(), even if force_flush = true was specified. That could result in a situation where data that the sender believes it has sent is never received. Along the way, remove a useless function prototype for a nonexistent function from shm_mq.h. Commit 46846433a03dff4f2e08c8a161e54a842da360d6 introduced these problems. Pavan Deolasee, with a few changes by me. Discussion: https://postgr.es/m/CABOikdPkwtLLCTnzzmpSMXo3QZa2yXq0J7Q61ssdLFAJYrOVvQ@mail.gmail.com