aboutsummaryrefslogtreecommitdiff
path: root/src/include/storage/proc.h
Commit message (Collapse)AuthorAge
* Preserve conflict-relevant data during logical replication.Amit Kapila2025-07-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Logical replication requires reliable conflict detection to maintain data consistency across nodes. To achieve this, we must prevent premature removal of tuples deleted by other origins and their associated commit_ts data by VACUUM, which could otherwise lead to incorrect conflict reporting and resolution. This patch introduces a mechanism to retain deleted tuples on the subscriber during the application of concurrent transactions from remote nodes. Retaining these tuples allows us to correctly ignore concurrent updates to the same tuple. Without this, an UPDATE might be misinterpreted as an INSERT during resolutions due to the absence of the original tuple. Additionally, we ensure that origin metadata is not prematurely removed by vacuum freeze, which is essential for detecting update_origin_differs and delete_origin_differs conflicts. To support this, a new replication slot named pg_conflict_detection is created and maintained by the launcher on the subscriber. Each apply worker tracks its own non-removable transaction ID, which the launcher aggregates to determine the appropriate xmin for the slot, thereby retaining necessary tuples. Conflict information retention (deleted tuples and commit_ts) can be enabled per subscription via the retain_conflict_info option. This is disabled by default to avoid unnecessary overhead for configurations that do not require conflict resolution or logging. During upgrades, if any subscription on the old cluster has retain_conflict_info enabled, a conflict detection slot will be created to protect relevant tuples from deletion when the new cluster starts. This is a foundational work to correctly detect update_deleted conflict which will be done in a follow-up patch. Author: Zhijie Hou <houzj.fnst@fujitsu.com> Reviewed-by: shveta malik <shveta.malik@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com> Reviewed-by: Nisha Moond <nisha.moond412@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/OS0PR01MB5716BE80DAEB0EE2A6A5D1F5949D2@OS0PR01MB5716.jpnprd01.prod.outlook.com
* Eliminate divide in new fast-path locking codeDavid Rowley2025-04-27
| | | | | | | | | | | | | | | | | | | | | | | c4d5cb71d2 adjusted the fast-path locking code to allow some configuration of the number of fast-path locking slots via the max_locks_per_transaction GUC. In that commit the FAST_PATH_REL_GROUP() macro used integer division to determine the fast-path locking group slot to use for the lock. The divisor in this case is always a power-of-two value. Here we swap out the divide by a bitwise-AND, which is a significantly faster operation to perform. In passing, adjust the code that's setting FastPathLockGroupsPerBackend so that it's more clear that the value being set is a power-of-two. Also, adjust some comments in the area which contained some magic numbers. It seems better to justify the 1024 upper limit in the location where the #define is made instead of where it is used. Author: David Rowley <drowleyml@gmail.com> Reviewed-by: Tomas Vondra <tomas@vondra.me> Discussion: https://postgr.es/m/CAApHDvodr3bcnpxcs7+k-3cFwYR0tP-BYhyd2PpDhe-bCx9i=g@mail.gmail.com
* Comment on need to MarkBufferDirty() if omitting DELAY_CHKPT_START.Noah Misch2025-04-20
| | | | | | | | | | | | | | | | Blocking checkpoint phase 2 requires MarkBufferDirty() and BUFFER_LOCK_EXCLUSIVE; neither suffices by itself. transam/README documents this, citing SyncOneBuffer(). Update the DELAY_CHKPT_START documentation to say this. Expand the heap_inplace_update_and_unlock() comment that cites XLogSaveBufferForHint() as precedent, since heap_inplace_update_and_unlock() could have opted not to use DELAY_CHKPT_START. Commit 8e7e672cdaa6bfec85d4d5dd9be84159df23bb41 added DELAY_CHKPT_START to heap_inplace_update_and_unlock(). Since commit bc6bad88572501aecaa2ac5d4bc900ac0fd457d5 reverted it in non-master branches, no back-patch. Discussion: https://postgr.es/m/20250406180054.26.nmisch@google.com
* aio: Infrastructure for io_method=workerAndres Freund2025-03-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit contains the basic, system-wide, infrastructure for io_method=worker. It does not yet actually execute IO, this commit just provides the infrastructure for running IO workers, kept separate for easier review. The number of IO workers can be adjusted with a PGC_SIGHUP GUC. Eventually we'd like to make the number of workers dynamically scale up/down based on the current "IO load". To allow the number of IO workers to be increased without a restart, we need to reserve PGPROC entries for the workers unconditionally. This has been judged to be worth the cost. If it turns out to be problematic, we can introduce a PGC_POSTMASTER GUC to control the maximum number. As io workers might be needed during shutdown, e.g. for AIO during the shutdown checkpoint, a new PMState phase is added. IO workers are shut down after the shutdown checkpoint has been performed and walsender/archiver have shut down, but before the checkpointer itself shuts down. See also 87a6690cc69. Updates PGSTAT_FILE_FORMAT_ID due to the addition of a new BackendType. Reviewed-by: Noah Misch <noah@leadboat.com> Co-authored-by: Thomas Munro <thomas.munro@gmail.com> Co-authored-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt Discussion: https://postgr.es/m/20210223100344.llw5an2aklengrmn@alap3.anarazel.de Discussion: https://postgr.es/m/stj36ea6yyhoxtqkhpieia2z4krnam7qyetc57rfezgk4zgapf@gcnactj4z56m
* Add GUC option to log lock acquisition failures.Fujii Masao2025-03-14
| | | | | | | | | | | | | | | | | | | | | | This commit introduces a new GUC, log_lock_failure, which controls whether a detailed log message is produced when a lock acquisition fails. Currently, it only supports logging lock failures caused by SELECT ... NOWAIT. The log message includes information about all processes holding or waiting for the lock that couldn't be acquired, helping users analyze and diagnose the causes of lock failures. Currently, this option does not log failures from SELECT ... SKIP LOCKED, as that could generate excessive log messages if many locks are skipped, causing unnecessary noise. This mechanism can be extended in the future to support for logging lock failures from other commands, such as LOCK TABLE ... NOWAIT. Author: Yuki Seino <seinoyu@oss.nttdata.com> Co-authored-by: Fujii Masao <masao.fujii@gmail.com> Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl> Discussion: https://postgr.es/m/411280a186cc26ef7034e0f2dfe54131@oss.nttdata.com
* Make FP_LOCK_SLOTS_PER_BACKEND look like a functionTomas Vondra2025-03-04
| | | | | | | | | | | | | The FP_LOCK_SLOTS_PER_BACKEND macro looks like a constant, but it depends on the max_locks_per_transaction GUC, and thus can change. This is non-obvious and confusing, so make it look more like a function by renaming it to FastPathLockSlotsPerBackend(). While at it, use the macro when initializing fast-path shared memory, instead of using the formula. Reported-by: Andres Freund Discussion: https://postgr.es/m/ffiwtzc6vedo6wb4gbwelon5nefqg675t5c7an2ta7pcz646cg%40qwmkdb3l4ett
* Remove duplicate definitions in proc.hHeikki Linnakangas2025-01-06
| | | | | | | These are also present in procnumber.h Reported-by: Peter Eisentraut Discussion: https://www.postgresql.org/message-id/bd04d675-4672-4f87-800a-eb5d470c15fc@eisentraut.org
* Update copyright for 2025Bruce Momjian2025-01-01
| | | | Backpatch-through: 13
* Replace PGPROC.isBackgroundWorker with isRegularBackend.Tom Lane2024-12-28
| | | | | | | | | Commit 34486b609 effectively redefined isBackgroundWorker as meaning "not a regular backend", whereas before it had the narrower meaning of AmBackgroundWorkerProcess(). For clarity, rename the field to isRegularBackend and invert its sense. Discussion: https://postgr.es/m/1808397.1735156190@sss.pgh.pa.us
* Exclude parallel workers from connection privilege/limit checks.Tom Lane2024-12-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Cause parallel workers to not check datallowconn, rolcanlogin, and ACL_CONNECT privileges. The leader already checked these things (except for rolcanlogin which might have been checked for a different role). Re-checking can accomplish little except to induce unexpected failures in applications that might not even be aware that their query has been parallelized. We already had the principle that parallel workers rely on their leader to pass a valid set of authorization information, so this change just extends that a bit further. Also, modify the ReservedConnections, datconnlimit and rolconnlimit logic so that these limits are only enforced against regular backends, and only regular backends are counted while checking if the limits were already reached. Previously, background processes that had an assigned database or role were subject to these limits (with rather random exclusions for autovac workers and walsenders), and the set of existing processes that counted against each limit was quite haphazard as well. The point of these limits, AFAICS, is to ensure the availability of PGPROC slots for regular backends. Since all other types of processes have their own separate pools of PGPROC slots, it makes no sense either to enforce these limits against them or to count them while enforcing the limit. While edge-case failures of these sorts have been possible for a long time, the problem got a good deal worse with commit 5a2fed911 (CVE-2024-10978), which caused parallel workers to make some of these checks using the leader's current role where before we had used its AuthenticatedUserId, thus allowing parallel queries to fail after SET ROLE. The previous behavior was fairly accidental and I have no desire to return to it. This patch includes reverting 73c9f91a1, which was an emergency hack to suppress these same checks in some cases. It wasn't complete, as shown by a recent bug report from Laurenz Albe. We can also revert fd4d93d26 and 492217301, which hacked around the same problems in one regression test. In passing, remove the special case for autovac workers in CheckMyDatabase; it seems cleaner to have AutoVacWorkerMain pass the INIT_PG_OVERRIDE_ALLOW_CONNS flag, now that that does what's needed. Like 5a2fed911, back-patch to supported branches (which sadly no longer includes v12). Discussion: https://postgr.es/m/1808397.1735156190@sss.pgh.pa.us
* Reserve a PGPROC slot and semaphore for the slotsync worker process.Tom Lane2024-12-28
| | | | | | | | | | | | | | | | | | | | | | | | | The need for this was missed in commit 93db6cbda, with the result being that if we launch a slotsync worker it would consume one of the PGPROCs in the max_connections pool. That could lead to inability to launch the worker, or to subsequent failures of connection requests that should have succeeded according to the configured settings. Rather than create some one-off infrastructure to support this, let's group the slotsync worker with the existing autovac launcher in a new category of "special worker" processes. These are kind of like auxiliary processes, but they cannot use that infrastructure because they need to be able to run transactions. For the moment, make these processes share the PGPROC freelist used for autovac workers (which previously supplied the autovac launcher too). This is partly to avoid an ABI change in v17, and partly because it seems silly to have a freelist with at most two members. This might be worth revisiting if we grow enough workers in this category. Tom Lane and Hou Zhijie. Back-patch to v17. Discussion: https://postgr.es/m/1808397.1735156190@sss.pgh.pa.us
* Split ProcSleep function into JoinWaitQueue and ProcSleepHeikki Linnakangas2024-11-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Split ProcSleep into two functions: JoinWaitQueue and ProcSleep. JoinWaitQueue is called while holding the partition lock, and inserts the current process to the wait queue, while ProcSleep() does the actual sleeping. ProcSleep() is now called without holding the partition lock, and it no longer re-acquires the partition lock before returning. That makes the wakeup a little cheaper. Once upon a time, re-acquiring the partition lock was needed to prevent a signal handler from longjmping out at a bad time, but these days our signal handlers just set flags, and longjmping can only happen at points where we explicitly run CHECK_FOR_INTERRUPTS(). If JoinWaitQueue detects an "early deadlock" before even joining the wait queue, it returns without changing the shared lock entry, leaving the cleanup of the shared lock entry to the caller. This makes the handling of an early deadlock the same as the dontWait=true case. One small user-visible side-effect of this refactoring is that we now only set the 'ps' title to say "waiting" when we actually enter the sleep, not when the lock is skipped because dontWait=true, or when a deadlock is detected early before entering the sleep. This eliminates the 'lockAwaited' global variable in proc.c, which was largely redundant with 'awaitedLock' in lock.c Note: Updating the local lock table is now the caller's responsibility. JoinWaitQueue and ProcSleep are now only responsible for modifying the shared state. Seems a little nicer that way. Based on Thomas Munro's earlier patch and observation that ProcSleep doesn't really need to re-acquire the partition lock. Reviewed-by: Maxim Orlov Discussion: https://www.postgresql.org/message-id/7c2090cd-a72a-4e34-afaa-6dd2ef31440e@iki.fi
* Use ProcNumbers instead of direct Latch pointers to address other procsHeikki Linnakangas2024-11-01
| | | | | | | | This is in preparation for replacing Latches with a new abstraction. That's still work in progress, but this seems a little tidier anyway, so let's get this refactoring out of the way already. Discussion: https://www.postgresql.org/message-id/391abe21-413e-4d91-a650-b663af49500c%40iki.fi
* Increase the number of fast-path lock slotsTomas Vondra2024-09-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace the fixed-size array of fast-path locks with arrays, sized on startup based on max_locks_per_transaction. This allows using fast-path locking for workloads that need more locks. The fast-path locking introduced in 9.2 allowed each backend to acquire a small number (16) of weak relation locks cheaply. If a backend needs to hold more locks, it has to insert them into the shared lock table. This is considerably more expensive, and may be subject to contention (especially on many-core systems). The limit of 16 fast-path locks was always rather low, because we have to lock all relations - not just tables, but also indexes, views, etc. For planning we need to lock all relations that might be used in the plan, not just those that actually get used in the final plan. So even with rather simple queries and schemas, we often need significantly more than 16 locks. As partitioning gets used more widely, and the number of partitions increases, this limit is trivial to hit. Complex queries may easily use hundreds or even thousands of locks. For workloads doing a lot of I/O this is not noticeable, but for workloads accessing only data in RAM, the access to the shared lock table may be a serious issue. This commit removes the hard-coded limit of the number of fast-path locks. Instead, the size of the fast-path arrays is calculated at startup, and can be set much higher than the original 16-lock limit. The overall fast-path locking protocol remains unchanged. The variable-sized fast-path arrays can no longer be part of PGPROC, but are allocated as a separate chunk of shared memory and then references from the PGPROC entries. The fast-path slots are organized as a 16-way set associative cache. You can imagine it as a hash table of 16-slot "groups". Each relation is mapped to exactly one group using hash(relid), and the group is then processed using linear search, just like the original fast-path cache. With only 16 entries this is cheap, with good locality. Treating this as a simple hash table with open addressing would not be efficient, especially once the hash table gets almost full. The usual remedy is to grow the table, but we can't do that here easily. The access would also be more random, with worse locality. The fast-path arrays are sized using the max_locks_per_transaction GUC. We try to have enough capacity for the number of locks specified in the GUC, using the traditional 2^n formula, with an upper limit of 1024 lock groups (i.e. 16k locks). The default value of max_locks_per_transaction is 64, which means those instances will have 64 fast-path slots. The main purpose of the max_locks_per_transaction GUC is to size the shared lock table. It is often set to the "average" number of locks needed by backends, with some backends using significantly more locks. This should not be a major issue, however. Some backens may have to insert locks into the shared lock table, but there can't be too many of them, limiting the contention. The only solution is to increase the GUC, even if the shared lock table already has sufficient capacity. That is not free, especially in terms of memory usage (the shared lock table entries are fairly large). It should only happen on machines with plenty of memory, though. In the future we may consider a separate GUC for the number of fast-path slots, but let's try without one first. Reviewed-by: Robert Haas, Jakub Wartak Discussion: https://postgr.es/m/510b887e-c0ce-4a0c-a17a-2c6abb8d9a5c@enterprisedb.com
* Apply PGDLLIMPORT markings to some GUC variablesPeter Eisentraut2024-08-14
| | | | | | | | | | | | According to the commit message in 8ec569479, we must have all variables in header files marked with PGDLLIMPORT. In commit d3cc5ffe81f6 some variables were moved from launch_backend.c file to several header files. This adds PGDLLIMPORT to moved variables. Author: Sofia Kopikova <s.kopikova@postgrespro.ru> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/e0b17014-5319-4dd6-91cd-93d9c8fc9539%40postgrespro.ru
* Move extern declarations for EXEC_BACKEND to header filesPeter Eisentraut2024-07-23
| | | | | | | | | | | | | This fixes warnings from -Wmissing-variable-declarations (not yet part of the standard warning options) under EXEC_BACKEND. The NON_EXEC_STATIC variables need a suitable declaration in a header file under EXEC_BACKEND. Also fix the inconsistent application of the volatile qualifier for PMSignalState, which was revealed by this change. Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://www.postgresql.org/message-id/flat/e0a62134-83da-4ba4-8cdb-ceb0111c95ce@eisentraut.org
* Lift limitation that PGPROC->links must be the first fieldHeikki Linnakangas2024-07-05
| | | | | | | | | | | | | | | Since commit 5764f611e1, we've been using the ilist.h functions for handling the linked list. There's no need for 'links' to be the first element of the struct anymore, except for one call in InitProcess where we used a straight cast from the 'dlist_node *' to PGPROC *, without the dlist_container() macro. That was just an oversight in commit 5764f611e1, fix it. There no imminent need to move 'links' from being the first field, but let's be tidy. Reviewed-by: Aleksander Alekseev, Andres Freund Discussion: https://www.postgresql.org/message-id/22aa749e-cc1a-424a-b455-21325473a794@iki.fi
* Fix typos and duplicate wordsDaniel Gustafsson2024-04-18
| | | | | | | | | | | | This fixes various typos, duplicated words, and tiny bits of whitespace mainly in code comments but also in docs. Author: Daniel Gustafsson <daniel@yesql.se> Author: Heikki Linnakangas <hlinnaka@iki.fi> Author: Alexander Lakhin <exclusion@gmail.com> Author: David Rowley <dgrowleyml@gmail.com> Author: Nazir Bilal Yavuz <byavuz81@gmail.com> Discussion: https://postgr.es/m/3F577953-A29E-4722-98AD-2DA9EFF2CBB8@yesql.se
* Allow a no-wait lock acquisition to succeed in more cases.Robert Haas2024-03-14
| | | | | | | | | | | | | | | | | | | | | We don't determine the position at which a process waiting for a lock should insert itself into the wait queue until we reach ProcSleep(), and we may at that point discover that we must insert ourselves ahead of everyone who wants a conflicting lock, in which case we obtain the lock immediately. Up until now, a no-wait lock acquisition would fail in such cases, erroneously claiming that the lock couldn't be obtained immediately. Fix that by trying ProcSleep even in the no-wait case. No back-patch for now, because I'm treating this as an improvement to the existing no-wait feature. It could instead be argued that it's a bug fix, on the theory that there should never be any case whatsoever where no-wait fails to obtain a lock that would have been obtained immediately without no-wait, but I'm reluctant to interpret the semantics of no-wait that strictly. Robert Haas and Jingxian Li Discussion: http://postgr.es/m/CA+TgmobCH-kMXGVpb0BB-iNMdtcNkTvcZ4JBxDJows3kYM+GDg@mail.gmail.com
* Replace BackendIds with 0-based ProcNumbersHeikki Linnakangas2024-03-03
| | | | | | | | | | | | | | | | | | Now that BackendId was just another index into the proc array, it was redundant with the 0-based proc numbers used in other places. Replace all usage of backend IDs with proc numbers. The only place where the term "backend id" remains is in a few pgstat functions that expose backend IDs at the SQL level. Those IDs are now in fact 0-based ProcNumbers too, but the documentation still calls them "backend ids". That term still seems appropriate to describe what the numbers are, so I let it be. One user-visible effect is that pg_temp_0 is now a valid temp schema name, for backend with ProcNumber 0. Reviewed-by: Andres Freund Discussion: https://www.postgresql.org/message-id/8171f1aa-496f-46a6-afc3-c46fe7a9b407@iki.fi
* Redefine backend ID to be an index into the proc arrayHeikki Linnakangas2024-03-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, backend ID was an index into the ProcState array, in the shared cache invalidation manager (sinvaladt.c). The entry in the ProcState array was reserved at backend startup by scanning the array for a free entry, and that was also when the backend got its backend ID. Things become slightly simpler if we redefine backend ID to be the index into the PGPROC array, and directly use it also as an index to the ProcState array. This uses a little more memory, as we reserve a few extra slots in the ProcState array for aux processes that don't need them, but the simplicity is worth it. Aux processes now also have a backend ID. This simplifies the reservation of BackendStatusArray and ProcSignal slots. You can now convert a backend ID into an index into the PGPROC array simply by subtracting 1. We still use 0-based "pgprocnos" in various places, for indexes into the PGPROC array, but the only difference now is that backend IDs start at 1 while pgprocnos start at 0. (The next commmit will get rid of the term "backend ID" altogether and make everything 0-based.) There is still a 'backendId' field in PGPROC, now part of 'vxid' which encapsulates the backend ID and local transaction ID together. It's needed for prepared xacts. For regular backends, the backendId is always equal to pgprocno + 1, but for prepared xact PGPROC entries, it's the ID of the original backend that processed the transaction. Reviewed-by: Andres Freund, Reid Thompson Discussion: https://www.postgresql.org/message-id/8171f1aa-496f-46a6-afc3-c46fe7a9b407@iki.fi
* Remove superfluous 'pgprocno' field from PGPROCHeikki Linnakangas2024-02-22
| | | | | | | | | It was always just the index of the PGPROC entry from the beginning of the proc array. Introduce a macro to compute it from the pointer instead. Reviewed-by: Andres Freund Discussion: https://www.postgresql.org/message-id/8171f1aa-496f-46a6-afc3-c46fe7a9b407@iki.fi
* Introduce transaction_timeoutAlexander Korotkov2024-02-15
| | | | | | | | | | | | | | | | | | | This commit adds timeout that is expected to be used as a prevention of long-running queries. Any session within the transaction will be terminated after spanning longer than this timeout. However, this timeout is not applied to prepared transactions. Only transactions with user connections are affected. Discussion: https://postgr.es/m/CAAhFRxiQsRs2Eq5kCo9nXE3HTugsAAJdSQSmxncivebAxdmBjQ%40mail.gmail.com Author: Andrey Borodin <amborodin@acm.org> Author: Japin Li <japinli@hotmail.com> Author: Junwang Zhao <zhjwpku@gmail.com> Reviewed-by: Nikolay Samokhvalov <samokhvalov@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com> Reviewed-by: bt23nguyent <bt23nguyent@oss.nttdata.com> Reviewed-by: Yuhang Qiu <iamqyh@gmail.com>
* Update copyright for 2024Bruce Momjian2024-01-03
| | | | | | | | Reported-by: Michael Paquier Discussion: https://postgr.es/m/ZZKTDPxBBMt3C0J9@paquier.xyz Backpatch-through: 12
* Add a new WAL summarizer process.Robert Haas2023-12-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | When active, this process writes WAL summary files to $PGDATA/pg_wal/summaries. Each summary file contains information for a certain range of LSNs on a certain TLI. For each relation, it stores a "limit block" which is 0 if a relation is created or destroyed within a certain range of WAL records, or otherwise the shortest length to which the relation was truncated during that range of WAL records, or otherwise InvalidBlockNumber. In addition, it stores a list of blocks which have been modified during that range of WAL records, but excluding blocks which were removed by truncation after they were modified and never subsequently modified again. In other words, it tells us which blocks need to copied in case of an incremental backup covering that range of WAL records. But this doesn't yet add the capability to actually perform an incremental backup; the next patch will do that. A new parameter summarize_wal enables or disables this new background process. The background process also automatically deletes summary files that are older than wal_summarize_keep_time, if that parameter has a non-zero value and the summarizer is configured to run. Patch by me, with some design help from Dilip Kumar and Andres Freund. Reviewed by Matthias van de Meent, Dilip Kumar, Jakub Wartak, Peter Eisentraut, and Álvaro Herrera. Discussion: http://postgr.es/m/CA+TgmoYOYZfMCyOXFyC-P+-mdrZqm5pP2N7S-r0z3_402h9rsA@mail.gmail.com
* Index SLRUs by 64-bit integers rather than by 32-bit integersAlexander Korotkov2023-11-29
| | | | | | | | | | | | | | | | | | We've had repeated bugs in the area of handling SLRU wraparound in the past, some of which have caused data loss. Switching to an indexing system for SLRUs that does not wrap around should allow us to get rid of a whole bunch of problems and improve the overall reliability of the system. This particular patch however only changes the indexing and doesn't address the wraparound per se. This is going to be done in the following patches. Author: Maxim Orlov, Aleksander Alekseev, Alexander Korotkov, Teodor Sigaev Author: Nikita Glukhov, Pavel Borisov, Yura Sokolov Reviewed-by: Jacob Champion, Heikki Linnakangas, Alexander Korotkov Reviewed-by: Japin Li, Pavel Borisov, Tom Lane, Peter Eisentraut, Andres Freund Reviewed-by: Andrey Borodin, Dilip Kumar, Aleksander Alekseev Discussion: https://postgr.es/m/CACG%3DezZe1NQSCnfHOr78AtAZxJZeCvxrts0ygrxYwe%3DpyyjVWA%40mail.gmail.com Discussion: https://postgr.es/m/CAJ7c6TPDOYBYrnCAeyndkBktO0WG2xSdYduTF0nxq%2BvfkmTF5Q%40mail.gmail.com
* Pre-beta mechanical code beautification.Tom Lane2023-05-19
| | | | | | | | | | | | | | | Run pgindent, pgperltidy, and reformat-dat-files. This set of diffs is a bit larger than typical. We've updated to pg_bsd_indent 2.1.2, which properly indents variable declarations that have multi-line initialization expressions (the continuation lines are now indented one tab stop). We've also updated to perltidy version 20230309 and changed some of its settings, which reduces its desire to add whitespace to lines to make assignments etc. line up. Going forward, that should make for fewer random-seeming changes to existing code. Discussion: https://postgr.es/m/20230428092545.qfb3y5wcu4cm75ur@alvherre.pgsql
* Add new GUC reserved_connections.Robert Haas2023-01-20
| | | | | | | | | | | | This provides a way to reserve connection slots for non-superusers. The slots reserved via the new GUC are available only to users who have the new predefined role pg_use_reserved_connections. superuser_reserved_connections remains as a final reserve in case reserved_connections has been exhausted. Patch by Nathan Bossart. Reviewed by Tushar Ahuja and by me. Discussion: http://postgr.es/m/20230119194601.GA4105788@nathanxps13
* Use dlists instead of SHM_QUEUE for syncrep queueAndres Freund2023-01-18
| | | | | | | | | Part of a series to remove SHM_QUEUE. ilist.h style lists are more widely used and have an easier to use interface. Reviewed-by: Thomas Munro <thomas.munro@gmail.com> (in an older version) Discussion: https://postgr.es/m/20221120055930.t6kl3tyivzhlrzu2@awork3.anarazel.de Discussion: https://postgr.es/m/20200211042229.msv23badgqljrdg2@alap3.anarazel.de
* Use dlist/dclist instead of PROC_QUEUE / SHM_QUEUE for heavyweight locksAndres Freund2023-01-18
| | | | | | | | | | | Part of a series to remove SHM_QUEUE. ilist.h style lists are more widely used and have an easier to use interface. As PROC_QUEUE is now unused, remove it. Reviewed-by: Thomas Munro <thomas.munro@gmail.com> (in an older version) Discussion: https://postgr.es/m/20221120055930.t6kl3tyivzhlrzu2@awork3.anarazel.de Discussion: https://postgr.es/m/20200211042229.msv23badgqljrdg2@alap3.anarazel.de
* Update copyright for 2023Bruce Momjian2023-01-02
| | | | Backpatch-through: 11
* lwlock: Fix quadratic behavior with very long wait listsAndres Freund2022-11-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Until now LWLockDequeueSelf() sequentially searched the list of waiters to see if the current proc is still is on the list of waiters, or has already been removed. In extreme workloads, where the wait lists are very long, this leads to a quadratic behavior. #backends iterating over a list #backends long. Additionally, the likelihood of needing to call LWLockDequeueSelf() in the first place also increases with the increased length of the wait queue, as it becomes more likely that a lock is released while waiting for the wait list lock, which is held for longer during lock release. Due to the exponential back-off in perform_spin_delay() this is surprisingly hard to detect. We should make that easier, e.g. by adding a wait event around the pg_usleep() - but that's a separate patch. The fix is simple - track whether a proc is currently waiting in the wait list or already removed but waiting to be woken up in PGPROC->lwWaiting. In some workloads with a lot of clients contending for a small number of lwlocks (e.g. WALWriteLock), the fix can substantially increase throughput. As the quadratic behavior arguably is a bug, we might want to decide to backpatch this fix in the future. Author: Andres Freund <andres@anarazel.de> Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Discussion: https://postgr.es/m/20221027165914.2hofzp4cvutj6gin@awork3.anarazel.de Discussion: https://postgr.es/m/CALj2ACXktNbG=K8Xi7PSqbofTZozavhaxjatVc14iYaLu4Maag@mail.gmail.com
* Fix some comments in proc.hMichael Paquier2022-10-15
| | | | | | | | There was a typo and two places where delayChkpt was still mentioned, but it is called delayChkptFlags these days. Author: David Christensen Discussion: https://postgr.es/m/CAOxo6XLB=ab_Y9jRw4iKyMZDns0wo=EGSRvijhhaL67RzqbtMg@mail.gmail.com
* C comment: explain procArray->pgprocnos[]Bruce Momjian2022-10-11
| | | | | | | | | | Reported-by: Aleksander Alekseev Discussion: https://postgr.es/m/CAJ7c6TOs9Dh3KNR2kiQJ3Ow0=TBucL_57DAbm--2p8w5x_8YXQ@mail.gmail.com Author: Aleksander Alekseev Backpatch-through: master
* Add subxid-overflow "isolation" testAlvaro Herrera2022-09-14
| | | | | | | | | This test covers a few lines of subxid-overflow-handling code in various part of the backend, which are otherwise uncovered. Author: Simon Riggs <simon.riggs@enterprisedb.com> Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com> Discussion: https://postgr.es/m/CANbhV-H8ov5+nCMBYQFKhO+UZJjrFgY_ORiMWr3RhS4+x44PzA@mail.gmail.com
* Repurpose PROC_COPYABLE_FLAGS as PROC_XMIN_FLAGSAlvaro Herrera2022-05-19
| | | | | | | | | | | | | | | This is a slight, convenient semantics change from what commit 0f0cfb494004 ("Fix parallel operations that prevent oldest xmin from advancing") introduced that lets us simplify the coding in the one place where it is used. Backpatch to 13. This is related to commit 6fea65508a1a ("Tighten ComputeXidHorizons' handling of walsenders") rewriting the code site where this is used, which has not yet been backpatched, but it may well be in the future. Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/202204191637.eldwa2exvguw@alvherre.pgsql
* Tighten ComputeXidHorizons' handling of walsenders.Tom Lane2022-04-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ComputeXidHorizons (nee GetOldestXmin) thought that it could identify walsenders by checking for proc->databaseId == 0. Perhaps that was safe when the code was written, but it's been wrong at least since autovacuum was invented. Background processes that aren't connected to any particular database, such as the autovacuum launcher and logical replication launcher, look like that too. This imprecision is harmful because when such a process advertises an xmin, the result is to hold back dead-tuple cleanup in all databases, though it'd be sufficient to hold it back in shared catalogs (which are the only relations such a process can access). Aside from being generally inefficient, this has recently been seen to cause regression test failures in the buildfarm, as a consequence of the logical replication launcher's startup transaction preventing VACUUM from marking pages of a user table as all-visible. We only want that global hold-back effect for the case where a walsender is advertising a hot standby feedback xmin. Therefore, invent a new PGPROC flag that says that a process' xmin should be considered globally, and check that instead of using the incorrect databaseId == 0 test. Currently only a walsender sets that flag, and only if it is not connected to any particular database. (This is for bug-compatibility with the undocumented behavior of the existing code, namely that feedback sent by a client who has connected to a particular database would not be applied globally. I'm not sure this is a great definition; however, such a client is capable of issuing plain SQL commands, and I don't think we want xmins advertised for such commands to be applied globally. Perhaps this could do with refinement later.) While at it, I rewrote the comment in ComputeXidHorizons, and re-ordered the commented-upon if-tests, to make them match up for intelligibility's sake. This is arguably a back-patchable bug fix, but given the lack of complaints I think it prudent to let it age awhile in HEAD first. Discussion: https://postgr.es/m/1346227.1649887693@sss.pgh.pa.us
* Rename delayChkpt to delayChkptFlags.Robert Haas2022-04-08
| | | | | | | | | | | | | | | | | | Before commit 412ad7a55639516f284cd0ef9757d6ae5c7abd43, delayChkpt was a Boolean. Now it's an integer. Extensions using it need to be appropriately updated, so let's rename the field to make sure that a hard compilation failure occurs. Replacing delayChkpt with delayChkptFlags made a few comments extend past 80 characters, so I reflowed them and changed some wording very slightly. The back-branches will need a different change to restore compatibility with existing minor releases; this is just for master. Per suggestion from Tom Lane. Discussion: http://postgr.es/m/a7880f4d-1d74-582a-ada7-dad168d046d1@enterprisedb.com
* Apply PGDLLIMPORT markings broadly.Robert Haas2022-04-08
| | | | | | | | | | | Up until now, we've had a policy of only marking certain variables in the PostgreSQL header files with PGDLLIMPORT, but now we've decided to mark them all. This means that extensions running on Windows should no longer operate at a disadvantage as compared to extensions running on Linux: if the variable is present in a header file, it should be accessible. Discussion: http://postgr.es/m/CA+TgmoYanc1_FSfimhgiWSqVyP5KKmh5NP2BWNwDhO8Pg2vGYQ@mail.gmail.com
* Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.Robert Haas2022-03-24
| | | | | | | | | | | | | | | | If TRUNCATE causes some buffers to be invalidated and thus the checkpoint does not flush them, TRUNCATE must also ensure that the corresponding files are truncated on disk. Otherwise, a replay from the checkpoint might find that the buffers exist but have the wrong contents, which may cause replay to fail. Report by Teja Mupparti. Patch by Kyotaro Horiguchi, per a design suggestion from Heikki Linnakangas, with some changes to the comments by me. Review of this and a prior patch that approached the issue differently by Heikki Linnakangas, Andres Freund, Álvaro Herrera, Masahiko Sawada, and Tom Lane. Discussion: http://postgr.es/m/BYAPR06MB6373BF50B469CA393C614257ABF00@BYAPR06MB6373.namprd06.prod.outlook.com
* Update copyright for 2022Bruce Momjian2022-01-07
| | | | Backpatch-through: 10
* Change ProcSendSignal() to take pgprocno.Thomas Munro2021-12-16
| | | | | | | | | | | Instead of referring to target backends by pid, use pgprocno. This means that we don't have to scan the ProcArray and we can drop some special case code for dealing with the startup process. Discussion: https://postgr.es/m/CA%2BhUKGLYRyDaneEwz5Uya_OgFLMx5BgJfkQSD%3Dq9HmwsfRRb-w%40mail.gmail.com Reviewed-by: Soumyadeep Chakraborty <soumyadeep2007@gmail.com> Reviewed-by: Ashwin Agrawal <ashwinstar@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de>
* Fix parallel operations that prevent oldest xmin from advancing.Amit Kapila2021-11-19
| | | | | | | | | | | | | | | | | While determining xid horizons, we skip over backends that are running Vacuum. We also ignore Create Index Concurrently, or Reindex Concurrently for the purposes of computing Xmin for Vacuum. But we were not setting the flags corresponding to these operations when they are performed in parallel which was preventing Xid horizon from advancing. The optimization related to skipping Create Index Concurrently, or Reindex Concurrently operations was implemented in PG-14 but the fix is the same for the Parallel Vacuum as well so back-patched till PG-13. Author: Masahiko Sawada Reviewed-by: Amit Kapila Backpatch-through: 13 Discussion: https://postgr.es/m/CAD21AoCLQqgM1sXh9BrDFq0uzd3RBFKi=Vfo6cjjKODm0Onr5w@mail.gmail.com
* Initial pgindent and pgperltidy run for v14.Tom Lane2021-05-12
| | | | | | | | Also "make reformat-dat-files". The only change worthy of note is that pgindent messed up the formatting of launcher.c's struct LogicalRepWorkerId, which led me to notice that that struct wasn't used at all anymore, so I just took it out.
* Make archiver process an auxiliary process.Fujii Masao2021-03-15
| | | | | | | | | | | | | | | | | | | | | | | This commit changes WAL archiver process so that it's treated as an auxiliary process and can use shared memory. This is an infrastructure patch required for upcoming shared-memory based stats collector patch series. These patch series basically need any processes including archiver that can report the statistics to access to shared memory. Since this patch itself is useful to simplify the code and when users monitor the status of archiver, it's committed separately in advance. This commit simplifies the code for WAL archiving. For example, previously backends need to signal to archiver via postmaster when they notify archiver that there are some WAL files to archive. On the other hand, this commit removes that signal to postmaster and enables backends to notify archier directly using shared latch. Also, as the side of this change, the information about archiver process becomes viewable at pg_stat_activity view. Author: Kyotaro Horiguchi Reviewed-by: Andres Freund, Álvaro Herrera, Julien Rouhaud, Tomas Vondra, Arthur Zakirov, Fujii Masao Discussion: https://postgr.es/m/20180629.173418.190173462.horiguchi.kyotaro@lab.ntt.co.jp
* Display the time when the process started waiting for the lock, in pg_locks, ↵Fujii Masao2021-02-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | take 2 This commit adds new column "waitstart" into pg_locks view. This column reports the time when the server process started waiting for the lock if the lock is not held. This information is useful, for example, when examining the amount of time to wait on a lock by subtracting "waitstart" in pg_locks from the current time, and identify the lock that the processes are waiting for very long. This feature uses the current time obtained for the deadlock timeout timer as "waitstart" (i.e., the time when this process started waiting for the lock). Since getting the current time newly can cause overhead, we reuse the already-obtained time to avoid that overhead. Note that "waitstart" is updated without holding the lock table's partition lock, to avoid the overhead by additional lock acquisition. This can cause "waitstart" in pg_locks to become NULL for a very short period of time after the wait started even though "granted" is false. This is OK in practice because we can assume that users are likely to look at "waitstart" when waiting for the lock for a long time. The first attempt of this patch (commit 3b733fcd04) caused the buildfarm member "rorqual" (built with --disable-atomics --disable-spinlocks) to report the failure of the regression test. It was reverted by commit 890d2182a2. The cause of this failure was that the atomic variable for "waitstart" in the dummy process entry created at the end of prepare transaction was not initialized. This second attempt fixes that issue. Bump catalog version. Author: Atsushi Torikoshi Reviewed-by: Ian Lawrence Barwick, Robert Haas, Justin Pryzby, Fujii Masao Discussion: https://postgr.es/m/a96013dc51cdc56b2a2b84fa8a16a993@oss.nttdata.com
* Revert "Display the time when the process started waiting for the lock, in ↵Fujii Masao2021-02-09
| | | | | | | | pg_locks." This reverts commit 3b733fcd04195399db56f73f0616b4f5c6828e18. Per buildfarm members prion and rorqual.
* Display the time when the process started waiting for the lock, in pg_locks.Fujii Masao2021-02-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | This commit adds new column "waitstart" into pg_locks view. This column reports the time when the server process started waiting for the lock if the lock is not held. This information is useful, for example, when examining the amount of time to wait on a lock by subtracting "waitstart" in pg_locks from the current time, and identify the lock that the processes are waiting for very long. This feature uses the current time obtained for the deadlock timeout timer as "waitstart" (i.e., the time when this process started waiting for the lock). Since getting the current time newly can cause overhead, we reuse the already-obtained time to avoid that overhead. Note that "waitstart" is updated without holding the lock table's partition lock, to avoid the overhead by additional lock acquisition. This can cause "waitstart" in pg_locks to become NULL for a very short period of time after the wait started even though "granted" is false. This is OK in practice because we can assume that users are likely to look at "waitstart" when waiting for the lock for a long time. Bump catalog version. Author: Atsushi Torikoshi Reviewed-by: Ian Lawrence Barwick, Robert Haas, Justin Pryzby, Fujii Masao Discussion: https://postgr.es/m/a96013dc51cdc56b2a2b84fa8a16a993@oss.nttdata.com
* Avoid spurious wait in concurrent reindexAlvaro Herrera2021-01-15
| | | | | | | | | | | | | | This is like commit c98763bf51bf, but for REINDEX CONCURRENTLY. To wit: this flags indicates that the current process is safe to ignore for the purposes of waiting for other snapshots, when doing CREATE INDEX CONCURRENTLY and REINDEX CONCURRENTLY. This helps two processes doing either of those things not deadlock, and also avoids spurious waits. Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com> Reviewed-by: Hamid Akhtar <hamid.akhtar@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/20201130195439.GA24598@alvherre.pgsql
* Add idle_session_timeout.Tom Lane2021-01-06
| | | | | | | | | | | | This GUC variable works much like idle_in_transaction_session_timeout, in that it kills sessions that have waited too long for a new client query. But it applies when we're not in a transaction, rather than when we are. Li Japin, reviewed by David Johnston and Hayato Kuroda, some fixes by me Discussion: https://postgr.es/m/763A0689-F189-459E-946F-F0EC4458980B@hotmail.com