aboutsummaryrefslogtreecommitdiff
path: root/src/backend/postmaster
Commit message (Collapse)AuthorAge
* Fix typo in AutoVacLauncherMain().Masahiko Sawada28 hours
| | | | | Author: Yugo Nagata <nagata@sraoss.co.jp> Discussion: https://postgr.es/m/20250802002027.cd35c481f6c6bae7ca2a3e26@sraoss.co.jp
* Process sync requests incrementally in AbsorbSyncRequestsAlexander Korotkov6 days
| | | | | | | | | | | | | | | | | | | | | | | | | | | If the number of sync requests is big enough, the palloc() call in AbsorbSyncRequests() will attempt to allocate more than 1 GB of memory, resulting in failure. This can lead to an infinite loop in the checkpointer process, as it repeatedly fails to absorb the pending requests. This commit introduces the following changes to cope with this problem: 1. Turn pending checkpointer requests array in shared memory into a bounded ring buffer. 2. Limit maximum ring buffer size to 10M items. 3. Make AbsorbSyncRequests() process requests incrementally in 10K batches. Even #2 makes the whole queue size fit the maximum palloc() size of 1 GB. of continuous lock holding. This commit is for master only. Simpler fix, which just limits a request queue size to 10M, will be backpatched. Reported-by: Ekaterina Sokolova <e.sokolova@postgrespro.ru> Discussion: https://postgr.es/m/db4534f83a22a29ab5ee2566ad86ca92%40postgrespro.ru Author: Maxim Orlov <orlovmg@gmail.com> Co-authored-by: Xuneng Zhou <xunengzhou@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
* Fix background worker not restarting after crash-and-restart cycle.Fujii Masao9 days
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, if a background worker crashed (e.g., due to a SIGKILL) and the server restarted due to restart_after_crash being enabled, the worker was not restarted as expected. Background workers without the never-restart flag should automatically restart in this case. This issue was introduced in commit 28a520c0b77, which failed to reset the rw_pid field in the RegisteredBgWorker struct for the crashed worker. This commit fixes the problem by resetting rw_pid for all eligible background workers during the crash-and-restart cycle. Back-patched to v18, where the bug was introduced. Bug fix patches were proposed by Andrey Rudometov and ChangAo Chen, but this commit uses a different approach. Reported-by: Andrey Rudometov <unlimitedhikari@gmail.com> Reported-by: ChangAo Chen <cca5507@qq.com> Author: Andrey Rudometov <unlimitedhikari@gmail.com> Author: ChangAo Chen <cca5507@qq.com> Co-authored-by: Fujii Masao <masao.fujii@gmail.com> Reviewed-by: ChangAo Chen <cca5507@qq.com> Reviewed-by: Shveta Malik <shveta.malik@gmail.com> Discussion: https://postgr.es/m/CAF6JsWiO=i24qYitWe6ns1sXqcL86rYxdyU+pNYk-WueKPSySg@mail.gmail.com Discussion: https://postgr.es/m/tencent_E00A056B3953EE6440F0F40F80EC30427D09@qq.com Backpatch-through: 18
* aio: Remove obsolete IO worker ID references.Thomas Munro2025-07-12
| | | | | | | | | | | | | In an ancient ancestor of this code, the postmaster assigned IDs to IO workers. Now it tracks them in an unordered array and doesn't know their IDs, so it might be confusing to readers that it still referred to their indexes as IDs. No change in behavior, just variable name and error message cleanup. Back-patch to 18. Discussion: https://postgr.es/m/CA%2BhUKG%2BwbaZZ9Nwc_bTopm4f-7vDmCwLk80uKDHj9mq%2BUp0E%2Bg%40mail.gmail.com
* Add FLUSH_UNLOGGED option to CHECKPOINT command.Nathan Bossart2025-07-11
| | | | | | | | | | | | | | | This option, which is disabled by default, can be used to request the checkpoint also flush dirty buffers of unlogged relations. As with the MODE option, the server may consolidate the options for concurrently requested checkpoints. For example, if one session uses (FLUSH_UNLOGGED FALSE) and another uses (FLUSH_UNLOGGED TRUE), the server may perform one checkpoint with FLUSH_UNLOGGED enabled. Author: Christoph Berg <myon@debian.org> Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at> Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com> Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com> Discussion: https://postgr.es/m/aDnaKTEf-0dLiEfz%40msg.df7cb.de
* Add MODE option to CHECKPOINT command.Nathan Bossart2025-07-11
| | | | | | | | | | | | | | | | | | This option may be set to FAST (the default) to request the checkpoint be completed as fast as possible, or SPREAD to request the checkpoint be spread over a longer interval (based on the checkpoint-related configuration parameters). Note that the server may consolidate the options for concurrently requested checkpoints. For example, if one session requests a "fast" checkpoint and another requests a "spread" checkpoint, the server may perform one "fast" checkpoint. Author: Christoph Berg <myon@debian.org> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com> Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at> Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com> Discussion: https://postgr.es/m/aDnaKTEf-0dLiEfz%40msg.df7cb.de
* Add option list to CHECKPOINT command.Nathan Bossart2025-07-11
| | | | | | | | | | | | | This commit adds the boilerplate code for supporting a list of options in CHECKPOINT commands. No actual options are supported yet, but follow-up commits will add support for MODE and FLUSH_UNLOGGED. While at it, this commit refactors the code for executing CHECKPOINT commands to its own function since it's about to become significantly larger. Author: Christoph Berg <myon@debian.org> Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com> Discussion: https://postgr.es/m/aDnaKTEf-0dLiEfz%40msg.df7cb.de
* Rename CHECKPOINT_IMMEDIATE to CHECKPOINT_FAST.Nathan Bossart2025-07-11
| | | | | | | | | | | | | The new name more accurately reflects the effects of this flag on a requested checkpoint. Checkpoint-related log messages (i.e., those controlled by the log_checkpoints configuration parameter) will now say "fast" instead of "immediate", too. Likewise, references to "immediate" checkpoints in the documentation have been updated to say "fast". This is preparatory work for a follow-up commit that will add a MODE option to the CHECKPOINT command. Author: Christoph Berg <myon@debian.org> Discussion: https://postgr.es/m/aDnaKTEf-0dLiEfz%40msg.df7cb.de
* Standardize LSN formatting by zero paddingÁlvaro Herrera2025-07-07
| | | | | | | | | | | | | | | | This commit standardizes the output format for LSNs to ensure consistent representation across various tools and messages. Previously, LSNs were inconsistently printed as `%X/%X` in some contexts, while others used zero-padding. This often led to confusion when comparing. To address this, the LSN format is now uniformly set to `%X/%08X`, ensuring the lower 32-bit part is always zero-padded to eight hexadecimal digits. Author: Japin Li <japinli@hotmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Discussion: https://postgr.es/m/ME0P300MB0445CA53CA0E4B8C1879AF84B641A@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM
* Make more use of binaryheap_empty() and binaryheap_size().Nathan Bossart2025-07-01
| | | | | | | | A few places were accessing bh_size directly instead of via these handy macros. Author: Aleksander Alekseev <aleksander@timescale.com> Discussion: https://postgr.es/m/CAJ7c6TPQMVL%2B028T4zuw9ZqL5Du9JavOLhBQLkJeK0RznYx_6w%40mail.gmail.com
* Rationalize handling of VacuumParamsMichael Paquier2025-06-30
| | | | | | | | | | | | | | | | | | | | This commit refactors the vacuum routines that rely on VacuumParams, adding const markers where necessary to force a new policy in the code. This structure should not use a pointer as it may be used across multiple relations, and its contents should never be updated. vacuum_rel() stands as an exception as it touches the "index_cleanup" and "truncate" options. VacuumParams has been introduced in 0d831389749a, and 661643dedad9 has fixed a bug impacting VACUUM operating on multiple relations. The changes done in tableam.h break ABI compatibility, so this commit can only happen on HEAD. Author: Shihao Zhong <zhong950419@gmail.com> Co-authored-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: Junwang Zhao <zhjwpku@gmail.com> Discussion: https://postgr.es/m/CAGRkXqTo+aK=GTy5pSc-9cy8H2F2TJvcrZ-zXEiNJj93np1UUw@mail.gmail.com
* Ensure we have a snapshot when updating various system catalogs.Nathan Bossart2025-05-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A few places that access system catalogs don't set up an active snapshot before potentially accessing their TOAST tables. To fix, push an active snapshot just before each section of code that might require accessing one of these TOAST tables, and pop it shortly afterwards. While at it, this commit adds some rather strict assertions in an attempt to prevent such issues in the future. Commit 16bf24e0e4 recently removed pg_replication_origin's TOAST table in order to fix the same problem for that catalog. On the back-branches, those bugs are left in place. We cannot easily remove a catalog's TOAST table on released major versions, and only replication origins with extremely long names are affected. Given the low severity of the issue, fixing older versions doesn't seem worth the trouble of significantly modifying the patch. Also, on v13 and v14, the aforementioned strict assertions have been omitted because commit 2776922201, which added HaveRegisteredOrActiveSnapshot(), was not back-patched. While we could probably back-patch it now, I've opted against it because it seems unlikely that new TOAST snapshot issues will be introduced in the oldest supported versions. Reported-by: Alexander Lakhin <exclusion@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/18127-fe54b6a667f29658%40postgresql.org Discussion: https://postgr.es/m/18309-c0bf914950c46692%40postgresql.org Discussion: https://postgr.es/m/ZvMSUPOqUU-VNADN%40nathan Backpatch-through: 13
* Fix per-relation memory leakage in autovacuum.Tom Lane2025-05-23
| | | | | | | | | | | | | | | | | | | | | | | | PgStat_StatTabEntry and AutoVacOpts structs were leaked until the end of the autovacuum worker's run, which is bad news if there are a lot of relations in the database. Note: pfree'ing the PgStat_StatTabEntry structs here seems a bit risky, because pgstat_fetch_stat_tabentry_ext does not guarantee anything about whether its result is long-lived. It appears okay so long as autovacuum forces PGSTAT_FETCH_CONSISTENCY_NONE, but I think that API could use a re-think. Also ensure that the VacuumRelation structure passed to vacuum() is in recoverable storage. Back-patch to v15 where we started to manage table statistics this way. (The AutoVacOpts leakage is probably older, but I'm not excited enough to worry about just that part.) Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us Backpatch-through: 15
* Revert function to get memory context stats for processesDaniel Gustafsson2025-05-23
| | | | | | | | | Due to concerns raised about the approach, and memory leaks found in sensitive contexts the functionality is reverted. This reverts commits 45e7e8ca9, f8c115a6c, d2a1ed172, 55ef7abf8 and 042a66291 for v18 with an intent to revisit this patch for v19. Discussion: https://postgr.es/m/594293.1747708165@sss.pgh.pa.us
* Add support for runtime arguments in injection pointsMichael Paquier2025-05-10
| | | | | | | | | | | | | | | | | | | The macros INJECTION_POINT() and INJECTION_POINT_CACHED() are extended with an optional argument that can be passed down to the callback attached when an injection point is run, giving to callbacks the possibility to manipulate a stack state given by the caller. The existing callbacks in modules injection_points and test_aio have their declarations adjusted based on that. da7226993fd4 (core AIO infrastructure) and 93bc3d75d8e1 (test_aio) and been relying on a set of workarounds where a static variable called pgaio_inj_cur_handle is used as runtime argument in the injection point callbacks used by the AIO tests, in combination with a TRY/CATCH block to reset the argument value. The infrastructure introduced in this commit will be reused for the AIO tests, simplifying them. Reviewed-by: Greg Burd <greg@burd.me> Discussion: https://postgr.es/m/Z_y9TtnXubvYAApS@paquier.xyz
* Fix typos and grammar in the codeMichael Paquier2025-04-19
| | | | | | | | The large majority of these have been introduced by recent commits done in the v18 development cycle. Author: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/9a7763ab-5252-429d-a943-b28941e0e28b@gmail.com
* Fix a few oversights in the longer cancel keys patchHeikki Linnakangas2025-04-09
| | | | | | | | | | | | Change MyCancelKeyLength's type from uint8 to int. While it always fits in a uint8, plain int is less surprising, as there's no particular reason for it to be uint8. Fix one ProcSignalInit caller that passed 'false' instead of NULL for the pointer argument. Author: Peter Eisentraut <peter@eisentraut.org> Discussion: https://www.postgresql.org/message-id/61be9e31-7b7d-49d5-bc11-721800d89d64@eisentraut.org
* Add function to get memory context stats for processesDaniel Gustafsson2025-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds a function for retrieving memory context statistics and information from backends as well as auxiliary processes. The intended usecase is cluster debugging when under memory pressure or unanticipated memory usage characteristics. When calling the function it sends a signal to the specified process to submit statistics regarding its memory contexts into dynamic shared memory. Each memory context is returned in detail, followed by a cumulative total in case the number of contexts exceed the max allocated amount of shared memory. Each process is limited to use at most 1Mb memory for this. A summary can also be explicitly requested by the user, this will return the TopMemoryContext and a cumulative total of all lower contexts. In order to not block on busy processes the caller specifies the number of seconds during which to retry before timing out. In the case where no statistics are published within the set timeout, the last known statistics are returned, or NULL if no previously published statistics exist. This allows dash- board type queries to continually publish even if the target process is temporarily congested. Context records contain a timestamp to indicate when they were submitted. Author: Rahila Syed <rahilasyed90@gmail.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Tomas Vondra <tomas@vondra.me> Reviewed-by: Atsushi Torikoshi <torikoshia@oss.nttdata.com> Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com> Discussion: https://postgr.es/m/CAH2L28v8mc9HDt8QoSJ8TRmKau_8FM_HKS41NeO9-6ZAkuZKXw@mail.gmail.com
* Use XLOG_CONTROL_FILE macro consistently for control file name.Fujii Masao2025-04-07
| | | | | | | | | | | | | | | | | The XLOG_CONTROL_FILE macro (defined in access/xlog_internal.h) represents the control file name. While some parts of the codebase already use this macro, others previously hardcoded the file name as a string. This commit replaces those hardcoded strings with the macro, ensuring consistent usage throughout the code. This makes future maintenance easier and improves searchability, for example when grepping for control file usage. Author: Anton A. Melnikov <a.melnikov@postgrespro.ru> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Masao Fujii <masao.fujii@gmail.com> Discussion: https://postgr.es/m/0841ec77-47e5-452a-adb4-c6fa55d605fc@postgrespro.ru
* Improve error message when standby does accept connections.Fujii Masao2025-04-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Even after reaching the minimum recovery point, if there are long-lived write transactions with 64 subtransactions on the primary, the recovery snapshot may not yet be ready for hot standby, delaying read-only connections on the standby. Previously, when read-only connections were not accepted due to this condition, the following error message was logged: FATAL: the database system is not yet accepting connections DETAIL: Consistent recovery state has not been yet reached. This DETAIL message was misleading because the following message was already logged in this case: LOG: consistent recovery state reached This contradiction, i.e., indicating that the recovery state was consistent while also stating it wasn’t, caused confusion. This commit improves the error message to better reflect the actual state: FATAL: the database system is not yet accepting connections DETAIL: Recovery snapshot is not yet ready for hot standby. HINT: To enable hot standby, close write transactions with more than 64 subtransactions on the primary server. To implement this, the commit introduces a new postmaster signal, PMSIGNAL_RECOVERY_CONSISTENT. When the startup process reaches a consistent recovery state, it sends this signal to the postmaster, allowing it to correctly recognize that state. Since this is not a clear bug, the change is applied only to the master branch and is not back-patched. Author: Atsushi Torikoshi <torikoshia@oss.nttdata.com> Co-authored-by: Fujii Masao <masao.fujii@gmail.com> Reviewed-by: Yugo Nagata <nagata@sraoss.co.jp> Discussion: https://postgr.es/m/02db8cd8e1f527a8b999b94a4bee3165@oss.nttdata.com
* aio, bufmgr: Comment fixes/improvementsAndres Freund2025-03-29
| | | | | | | | | | | Some of these comments have been wrong for a while (12f3867f5534), some I recently introduced (da7226993fd, 55b454d0e14). This includes an update to a comment in FlushBuffer(), which will be copied in a future commit. These changes seem big enough to be worth doing in separate commits. Suggested-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/20250319212530.80.nmisch@google.com
* Remove the query_id_squash_values GUCÁlvaro Herrera2025-03-27
| | | | | | | | | | | | | Commit 62d712ecfd94 introduced the capability to calculate the same queryId for queries with different lengths of constants in a list for an IN clause. This behavior was originally enabled with a GUC query_id_squash_values. After a discussion about the value of such a GUC, it was decided to back out of the use of a GUC and make the squashing behavior the only available option. Author: Sami Imseih <samimseih@gmail.com> Discussion: https://postgr.es/m/Z-LZyygkkNyA8-kR@msg.df7cb.de Discussion: https://postgr.es/m/CA+q6zcVTK-3C-8NWV1oY2NZrvtnMCDqnyYYyk1T7WMUG65MeOQ@mail.gmail.com
* Introduce squashing of constant lists in query jumblingÁlvaro Herrera2025-03-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | pg_stat_statements produces multiple entries for queries like SELECT something FROM table WHERE col IN (1, 2, 3, ...) depending on the number of parameters, because every element of ArrayExpr is individually jumbled. Most of the time that's undesirable, especially if the list becomes too large. Fix this by introducing a new GUC query_id_squash_values which modifies the node jumbling code to only consider the first and last element of a list of constants, rather than each list element individually. This affects both the query_id generated by query jumbling, as well as pg_stat_statements query normalization so that it suppresses printing of the individual elements of such a list. The default value is off, meaning the previous behavior is maintained. Author: Dmitry Dolgov <9erthalion6@gmail.com> Reviewed-by: Sergey Dudoladov (mysterious, off-list) Reviewed-by: David Geier <geidav.pg@gmail.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Sami Imseih <samimseih@gmail.com> Reviewed-by: Sutou Kouhei <kou@clear-code.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Marcos Pegoraro <marcos@f10.com.br> Reviewed-by: Julien Rouhaud <rjuju123@gmail.com> Reviewed-by: Zhihong Yu <zyu@yugabyte.com> Tested-by: Yasuo Honda <yasuo.honda@gmail.com> Tested-by: Sergei Kornilov <sk@zsrv.org> Tested-by: Maciek Sakrejda <m.sakrejda@gmail.com> Tested-by: Chengxi Sun <sunchengxi@highgo.com> Tested-by: Jakub Wartak <jakub.wartak@enterprisedb.com> Discussion: https://postgr.es/m/CA+q6zcWtUbT_Sxj0V6HY6EZ89uv5wuG5aefpe_9n0Jr3VwntFg@mail.gmail.com
* aio: Infrastructure for io_method=workerAndres Freund2025-03-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit contains the basic, system-wide, infrastructure for io_method=worker. It does not yet actually execute IO, this commit just provides the infrastructure for running IO workers, kept separate for easier review. The number of IO workers can be adjusted with a PGC_SIGHUP GUC. Eventually we'd like to make the number of workers dynamically scale up/down based on the current "IO load". To allow the number of IO workers to be increased without a restart, we need to reserve PGPROC entries for the workers unconditionally. This has been judged to be worth the cost. If it turns out to be problematic, we can introduce a PGC_POSTMASTER GUC to control the maximum number. As io workers might be needed during shutdown, e.g. for AIO during the shutdown checkpoint, a new PMState phase is added. IO workers are shut down after the shutdown checkpoint has been performed and walsender/archiver have shut down, but before the checkpointer itself shuts down. See also 87a6690cc69. Updates PGSTAT_FILE_FORMAT_ID due to the addition of a new BackendType. Reviewed-by: Noah Misch <noah@leadboat.com> Co-authored-by: Thomas Munro <thomas.munro@gmail.com> Co-authored-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt Discussion: https://postgr.es/m/20210223100344.llw5an2aklengrmn@alap3.anarazel.de Discussion: https://postgr.es/m/stj36ea6yyhoxtqkhpieia2z4krnam7qyetc57rfezgk4zgapf@gcnactj4z56m
* aio: Basic subsystem initializationAndres Freund2025-03-17
| | | | | | | | | | | | | | | | This commit just does the minimal wiring up of the AIO subsystem, added in the next commit, to the rest of the system. The next commit contains more details about motivation and architecture. This commit is kept separate to make it easier to review, separating the changes across the tree, from the implementation of the new subsystem. We discussed squashing this commit with the main commit before merging AIO, but there has been a mild preference for keeping it separate. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt
* pg_noreturn to replace pg_attribute_noreturn()Peter Eisentraut2025-03-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We want to support a "noreturn" decoration on more compilers besides just GCC-compatible ones, but for that we need to move the decoration in front of the function declaration instead of either behind it or wherever, which is the current style afforded by GCC-style attributes. Also rename the macro to "pg_noreturn" to be similar to the C11 standard "noreturn". pg_noreturn is now supported on all compilers that support C11 (using _Noreturn), as well as GCC-compatible ones (using __attribute__, as before), as well as MSVC (using __declspec). (When PostgreSQL requires C11, the latter two variants can be dropped.) Now, all supported compilers effectively support pg_noreturn, so the extra code for !HAVE_PG_ATTRIBUTE_NORETURN can be dropped. This also fixes a possible problem if third-party code includes stdnoreturn.h, because then the current definition of #define pg_attribute_noreturn() __attribute__((noreturn)) would cause an error. Note that the C standard does not support a noreturn attribute on function pointer types. So we have to drop these here. There are only two instances at this time, so it's not a big loss. In one case, we can make up for it by adding the pg_noreturn to a wrapper function and adding a pg_unreachable(), in the other case, the latter was already done before. Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://www.postgresql.org/message-id/flat/pxr5b3z7jmkpenssra5zroxi7qzzp6eswuggokw64axmdixpnk@zbwxuq7gbbcw
* Add connection establishment duration loggingMelanie Plageman2025-03-12
| | | | | | | | | | | | | | | | | | | | | | | | | Add log_connections option 'setup_durations' which logs durations of several key parts of connection establishment and backend setup. For an incoming connection, starting from when the postmaster gets a socket from accept() and ending when the forked child backend is first ready for query, there are multiple steps that could each take longer than expected due to external factors. This logging provides visibility into authentication and fork duration as well as the end-to-end connection establishment and backend initialization time. To make this portable, the timings captured in the postmaster (socket creation time, fork initiation time) are passed through the BackendStartupData. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com> Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl> Reviewed-by: Guillaume Lelarge <guillaume.lelarge@dalibo.com> Discussion: https://postgr.es/m/flat/CAAKRu_b_smAHK0ZjrnL5GRxnAVWujEXQWpLXYzGbmpcZd3nLYw%40mail.gmail.com
* Modularize log_connections outputMelanie Plageman2025-03-12
| | | | | | | | | | | | | | | | | | | | | | | Convert the boolean log_connections GUC into a list GUC comprised of the connection aspects to log. This gives users more control over the volume and kind of connection logging. The current log_connections options are 'receipt', 'authentication', and 'authorization'. The empty string disables all connection logging. 'all' enables all available connection logging. For backwards compatibility, the most common values for the log_connections boolean are still supported (on, off, 1, 0, true, false, yes, no). Note that previously supported substrings of on, off, true, false, yes, and no are no longer supported. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://postgr.es/m/flat/CAAKRu_b_smAHK0ZjrnL5GRxnAVWujEXQWpLXYzGbmpcZd3nLYw%40mail.gmail.com
* Split WaitEventSet functions to separate source fileHeikki Linnakangas2025-03-06
| | | | | | | | | latch.c now only contains the Latch related functions, which build on the WaitEventSet abstraction. Most of the platform-dependent stuff is now in waiteventset.c. Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://www.postgresql.org/message-id/8a507fb6-df28-49d3-81a5-ede180d7f0fb@iki.fi
* Rename some signal and interrupt handling functions for consistencyHeikki Linnakangas2025-03-05
| | | | | | | | | | | | | | | | | | | | | | | | The usual pattern for handling a signal is that the signal handler sets a flag and calls SetLatch(MyLatch), and CHECK_FOR_INTERRUPTS() or other code that is part of a wait loop calls another function to deal with it. The naming of the functions involved was a bit inconsistent, however. CHECK_FOR_INTERRUPTS() calls ProcessInterrupts() to do the heavy-lifting, but the analogous functions in aux processes were called HandleMainLoopInterrupts(), HandleStartupProcInterrupts(), etc. Similarly, most subroutines of ProcessInterrupts() were called Process*(), but some were called Handle*(). To make things less confusing, rename all the functions that are part of the overall signal/interrupt handling system but are not executed in a signal handler to e.g. ProcessSomething(), rather than HandleSomething(). The "Process" prefix is now consistently used in the non-signal-handler functions, and the "Handle" prefix in functions that are part of signal handlers, except for some completely unrelated functions that clearly have nothing to do with signal or interrupt handling. Reviewed-by: Nathan Bossart <nathandbossart@gmail.com> Discussion: https://www.postgresql.org/message-id/8a384b26-1499-41f6-be33-64b801fb98b8@iki.fi
* Fix some gaps in pg_stat_io with WAL receiver and WAL summarizerMichael Paquier2025-03-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The WAL receiver and WAL summarizer processes gain each one a call to pgstat_report_wal(), to make sure that they report their WAL statistics to pgstats, gathering data for pg_stat_io. In the WAL receiver, the stats reports are timed with status updates sent to the primary, that depend on wal_receiver_status_interval and wal_receiver_timeout. This is a conservative choice, but perhaps we could be more aggressive with the frequency of the stats reports. An interesting historical fact is that the WAL receiver does writes and syncs of WAL, but it has never reported its statistics to pgstats in pg_stat_wal. In the WAL summarizer, the stats reports are done each time the process waits for WAL. While on it, pg_stat_io is adjusted so as these two processes do not report any rows when IOObject is not WAL, making the view easier to use with less rows. Two tests are added in TAP, checking statistics for the WAL summarizer and the WAL receiver. Status updates in the WAL receiver are currently possible in the recovery test 001_stream_rep.pl. Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/Z8UKZyVSHUUQJHNb@paquier.xyz
* Split pgstat_bestart() into three different routinesMichael Paquier2025-03-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | pgstat_bestart(), used post-authentication to set up a backend entry in the PgBackendStatus array, so as its data becomes visible in pg_stat_activity and related catalogs, has its logic divided into three routines with this commit, called in order at different steps of the backend initialization: * pgstat_bestart_initial() sets up the backend entry with a minimal amount of information, reporting it with a new BackendState called STATE_STARTING while waiting for backend initialization and client authentication to complete. The main benefit that this offers is observability, so as it is possible to monitor the backend activity during authentication. This step happens earlier than in the logic prior to this commit. pgstat_beinit() happens earlier as well, before authentication. * pgstat_bestart_security() reports the SSL/GSS status of the connection, once authentication completes. Auxiliary processes, for example, do not need to call this step, hence it is optional. This step is called after performing authentication, same as previously. * pgstat_bestart_final() reports the user and database IDs, takes the entry out of STATE_STARTING, and reports its application_name. This is called as the last step of the three, once authentication completes. An injection point is added, with a test checking that the "starting" phase of a backend entry is visible in pg_stat_activity. Some follow-up patches are planned to take advantage of this refactoring with more information provided in backend entries during authentication (LDAP hanging was a problem for the author, initially). Author: Jacob Champion <jacob.champion@enterprisedb.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CAOYmi+=60deN20WDyCoHCiecgivJxr=98s7s7-C8SkXwrCfHXg@mail.gmail.com
* Trigger more frequent autovacuums with relallfrozenMelanie Plageman2025-03-03
| | | | | | | | | | | | | | | | | | | | | | | | | Calculate the insert threshold for triggering an autovacuum of a relation based on the number of unfrozen pages. By only considering the unfrozen portion of the table when calculating how many tuples to add to the insert threshold, we can trigger more frequent vacuums of insert-heavy tables. This increases the chances of vacuuming those pages when they still reside in shared buffers This also increases the number of autovacuums triggered by tuples inserted and not by wraparound risk. We prefer to freeze these pages during insert-triggered autovacuums, as anti-wraparound vacuums are not automatically canceled by conflicting lock requests. We calculate the unfrozen percentage of the table using the recently added (99f8f3fbbc8f) relallfrozen column of pg_class. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: Greg Sabino Mullane <htamfids@gmail.com> Reviewed-by: Robert Treat <rob@xzilla.net> Reviewed-by: wenhui qiu <qiuwenhuifx@gmail.com> Discussion: https://postgr.es/m/flat/CAAKRu_aj-P7YyBz_cPNwztz6ohP%2BvWis%3Diz3YcomkB3NpYA--w%40mail.gmail.com
* backend launchers void * arguments for binary dataPeter Eisentraut2025-02-21
| | | | | | | | Change backend launcher functions to take void * for binary data instead of char *. This removes the need for numerous casts. Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org
* Eagerly scan all-visible pages to amortize aggressive vacuumMelanie Plageman2025-02-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Aggressive vacuums must scan every unfrozen tuple in order to advance the relfrozenxid/relminmxid. Because data is often vacuumed before it is old enough to require freezing, relations may build up a large backlog of pages that are set all-visible but not all-frozen in the visibility map. When an aggressive vacuum is triggered, all of these pages must be scanned. These pages have often been evicted from shared buffers and even from the kernel buffer cache. Thus, aggressive vacuums often incur large amounts of extra I/O at the expense of foreground workloads. To amortize the cost of aggressive vacuums, eagerly scan some all-visible but not all-frozen pages during normal vacuums. All-visible pages that are eagerly scanned and set all-frozen in the visibility map are counted as successful eager freezes and those not frozen are counted as failed eager freezes. If too many eager scans fail in a row, eager scanning is temporarily suspended until a later portion of the relation. The number of failures tolerated is configurable globally and per table. To effectively amortize aggressive vacuums, we cap the number of successes as well. Capping eager freeze successes also limits the amount of potentially wasted work if these pages are modified again before the next aggressive vacuum. Once we reach the maximum number of blocks successfully eager frozen, eager scanning is disabled for the remainder of the vacuum of the relation. Original design idea from Robert Haas, with enhancements from Andres Freund, Tomas Vondra, and me Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Robert Treat <rob@xzilla.net> Reviewed-by: Bilal Yavuz <byavuz81@gmail.com> Discussion: https://postgr.es/m/flat/CAAKRu_ZF_KCzZuOrPrOqjGVe8iRVWEAJSpzMgRQs%3D5-v84cXUg%40mail.gmail.com
* Introduce autovacuum_vacuum_max_threshold.Nathan Bossart2025-02-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | One way autovacuum chooses tables to vacuum is by comparing the number of updated or deleted tuples with a value calculated using autovacuum_vacuum_threshold and autovacuum_vacuum_scale_factor. The threshold specifies the base value for comparison, and the scale factor specifies the fraction of the table size to add to it. This strategy ensures that smaller tables are vacuumed after fewer updates/deletes than larger tables, which is reasonable in many cases but can result in infrequent vacuums on very large tables. This is undesirable for a couple of reasons, such as very large tables incurring a huge amount of bloat between vacuums. This new parameter provides a way to set a limit on the value calculated with autovacuum_vacuum_threshold and autovacuum_vacuum_scale_factor so that very large tables are vacuumed more frequently. By default, it is set to 100,000,000 tuples, but it can be disabled by setting it to -1. It can also be adjusted for individual tables by changing storage parameters. Author: Nathan Bossart <nathandbossart@gmail.com> Co-authored-by: Frédéric Yhuel <frederic.yhuel@dalibo.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at> Reviewed-by: Michael Banck <mbanck@gmx.net> Reviewed-by: Joe Conway <mail@joeconway.com> Reviewed-by: Sami Imseih <samimseih@gmail.com> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Reviewed-by: wenhui qiu <qiuwenhuifx@gmail.com> Reviewed-by: Vinícius Abrahão <vinnix.bsd@gmail.com> Reviewed-by: Robert Treat <rob@xzilla.net> Reviewed-by: Alena Rybakina <a.rybakina@postgrespro.ru> Discussion: https://postgr.es/m/956435f8-3b2f-47a6-8756-8c54ded61802%40dalibo.com
* Remove obsolete restriction on the range of log_rotation_size.Tom Lane2025-01-31
| | | | | | | | | | | | | | | | | When syslogger.c was first written, we didn't want to assume that all platforms have 64-bit ftello. But we've been assuming that since v13 (cf commit 799d22461), so let's use that in syslogger.c and allow log_rotation_size to range up to INT_MAX kilobytes. The old code effectively limited log_rotation_size to 2GB regardless of platform. While nobody's complained, that doesn't seem too far away from what might be thought reasonable these days. I noticed this while searching for instances of "1024L" in connection with commit 041e8b95b. These were the last such instances. (We still have instances of L-suffixed literals, but most of them are associated with wait intervals for pg_usleep or similar functions. I don't see any urgent reason to change that.)
* Change shutdown sequence to terminate checkpointer lastAndres Freund2025-01-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | The main motivation for this change is to have a process that can serialize stats after all other processes have terminated. Serializing stats already happens in checkpointer, even though walsenders can be active longer. The only reason the current shutdown sequence does not actively cause problems is that walsender currently does not generate any stats. However, there is an upcoming patch changing that. Another need for this change originates in the AIO patchset, where IO workers (which, in some edge cases, can emit stats of their own) need to run while the shutdown checkpoint is being written. This commit changes the shutdown sequence so checkpointer is signalled (via SIGINT) to trigger writing the shutdown checkpoint without also causing checkpointer to exit. Once checkpointer wrote the shutdown checkpoint it notifies postmaster via PMSIGNAL_XLOG_IS_SHUTDOWN and waits for the termination signal (SIGUSR2, as before). Checkpointer now is terminated after all children, other than dead-end children and logger, have been terminated, tracked using the new PM_WAIT_CHECKPOINTER PMState. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com> Discussion: https://postgr.es/m/kgng5nrvnlv335evmsuvpnh354rw7qyazl73kdysev2cr2v5zu@m3cfzxicm5kp
* postmaster: Adjust which processes we expect to have exitedAndres Freund2025-01-24
| | | | | | | | | | | | | | Comments and code stated that we expect checkpointer to have been signalled in case of immediate shutdown / fatal errors, but didn't treat archiver and walsenders the same. That doesn't seem right. I had started digging through the history to see where this oddity was introduced, but it's not the fault of a single commit. Instead treat archiver, checkpointer, and walsenders the same. Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/kgng5nrvnlv335evmsuvpnh354rw7qyazl73kdysev2cr2v5zu@m3cfzxicm5kp
* postmaster: Commonalize FatalError pathsAndres Freund2025-01-24
| | | | | | | | | | | | | | | | | | | | | This includes some behavioral changes: - Previously PM_WAIT_XLOG_ARCHIVAL wasn't handled in HandleFatalError(), that doesn't seem quite right. - Previously a fatal error in PM_WAIT_XLOG_SHUTDOWN lead to jumping back to PM_WAIT_BACKENDS, no we go to PM_WAIT_DEAD_END. Jumping backwards doesn't seem quite right and we didn't do so when checkpointer failed to fork during a shutdown. - Previously a checkpointer fork failure didn't call SetQuitSignalReason(), which would lead to quickdie() reporting "terminating connection because of unexpected SIGQUIT signal" which seems even worse than the PMQUIT_FOR_CRASH message. If I saw that in the log I'd suspect somebody outside of postgres sent SIGQUITs Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/kgng5nrvnlv335evmsuvpnh354rw7qyazl73kdysev2cr2v5zu@m3cfzxicm5kp
* postmaster: Move code to switch into FatalError state into functionAndres Freund2025-01-24
| | | | | | | | | | | | | There are two places switching to FatalError mode, behaving somewhat differently. An upcoming commit will introduce a third. That doesn't seem seem like a good idea. This commit just moves the FatalError related code from HandleChildCrash() into its own function, a subsequent commit will evolve the state machine change to be suitable for other callers. Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/kgng5nrvnlv335evmsuvpnh354rw7qyazl73kdysev2cr2v5zu@m3cfzxicm5kp
* postmaster: Don't repeatedly transition to crashing stateAndres Freund2025-01-24
| | | | | | | | | | | | | | | Previously HandleChildCrash() skipped logging and signalling child exits if already in an immediate shutdown or in FatalError state, but still transitioned server state in response to a crash. That's redundant. In the other place we transition to FatalError, we do take care to not do so when already in FatalError state. To make it easier to combine different paths for entering FatalError state, only do so once in HandleChildCrash(). Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/kgng5nrvnlv335evmsuvpnh354rw7qyazl73kdysev2cr2v5zu@m3cfzxicm5kp
* postmaster: Don't open-code TerminateChildren() in HandleChildCrash()Andres Freund2025-01-24
| | | | | | | | After removing the duplication no user of sigquit_child() remains, therefore remove it. Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/kgng5nrvnlv335evmsuvpnh354rw7qyazl73kdysev2cr2v5zu@m3cfzxicm5kp
* checkpointer: Request checkpoint via latch instead of signalAndres Freund2025-01-24
| | | | | | | | | | | | | The motivation for this change is that a future commit will use SIGINT for another purpose (postmaster requesting WAL access to be shut down) and that there no other signals that we could readily use (see code comment for the reason why SIGTERM shouldn't be used). But it's also a tad nicer / more efficient to use SetLatch(), as it avoids sending signals when checkpointer already is busy. Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com> Discussion: https://postgr.es/m/kgng5nrvnlv335evmsuvpnh354rw7qyazl73kdysev2cr2v5zu@m3cfzxicm5kp
* Don't ask for bug reports about pthread_is_threaded_np() != 0.Tom Lane2025-01-23
| | | | | | | | | | | | | | | | | | We thought that this condition was unreachable in ExitPostmaster, but actually it's possible if you have both a misconfigured locale setting and some other mistake that causes PostmasterMain to bail out before reaching its own check of pthread_is_threaded_np(). Given the lack of other reports, let's not ask for bug reports if this occurs; instead just give the same hint as in PostmasterMain. Bug: #18783 Reported-by: anani191181515@gmail.com Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/18783-d1873b95a59b9103@postgresql.org Discussion: https://postgr.es/m/206317.1737656533@sss.pgh.pa.us Backpatch-through: 13
* postmaster: Rename some shutdown related PMState phase namesAndres Freund2025-01-10
| | | | | | | | | | The previous names weren't particularly clear. Future patches will add more shutdown phases, making it even more important to have understandable shutdown phases. Suggested-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com> Discussion: https://postgr.es/m/d2cd8fd3-396a-4390-8f0b-74be65e72899@iki.fi
* postmaster: Make btmask_add() variadicAndres Freund2025-01-10
| | | | | Suggested-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://postgr.es/m/d2cd8fd3-396a-4390-8f0b-74be65e72899@iki.fi
* postmaster: Introduce variadic btmask_all_except()Andres Freund2025-01-10
| | | | | | | Upcoming patches would otherwise need btmask_all_except3(). Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://postgr.es/m/w3z6w3g4aovivs735nk4pzjhmegntecesm3kktpebchegm5o53@aonnq2kn27xi
* postmaster: Improve logging of signals sent by postmasterAndres Freund2025-01-10
| | | | | | | | | | | | | | | | Previously many, in some cases important, signals we never logged. In other cases the signal name was only included numerically. As part of this, change the debug log level the signal is logged at to DEBUG3, previously some where DEBUG2, some DEBUG4. Also move from direct use of kill() to signal the av launcher to signal_child(). There doesn't seem to be a reason for directly using kill(). Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com> Discussion: https://postgr.es/m/kgng5nrvnlv335evmsuvpnh354rw7qyazl73kdysev2cr2v5zu@m3cfzxicm5kp
* postmaster: Update pmState via a wrapper functionAndres Freund2025-01-10
| | | | | | | | | | | This makes logging of state changes easier - state transitions are now logged at DEBUG1. Without that logging it was surprisingly hard to understand the current state of the system while debugging. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com> Discussion: https://postgr.es/m/kgng5nrvnlv335evmsuvpnh354rw7qyazl73kdysev2cr2v5zu@m3cfzxicm5kp