aboutsummaryrefslogtreecommitdiff
path: root/src/backend
Commit message (Collapse)AuthorAge
* Proper object locking for GRANT/REVOKEPeter Eisentraut2024-11-15
| | | | | | | | | | | | | | Refactor objectNamesToOids() to use get_object_address() internally if possible. Not only does this save a lot of code, it also allows us to use the object locking provided by get_object_address() for GRANT/REVOKE. There was previously a code comment that complained about the lack of locking in objectNamesToOids(), which is now fixed. The check in ExecGrant_Type_check() is obsolete because get_object_address_type() already does the same check. Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/bf72b82c-124d-4efa-a484-bb928e9494e4@eisentraut.org
* jit: Stop emitting some unnecessary instructionsHeikki Linnakangas2024-11-15
| | | | | | | | | | | | | | | | | | | In EEOP_BOOL_AND_STEP* and EEOP_BOOL_OR_STEP*, we emitted pointlesss store instructions to store to resnull/resvalue values that were just loaded from the same fields in the previous instructions. They will surely get optimized away by LLVM if any optimizations are enabled, but it's better to not emit them in the first place. In EEOP_BOOL_NOT_STEP, similar story with resnull. In EEOP_NULLIF, when it returns NULL, there was also a redundant store to resvalue just after storing a 0 to it. The value of resvalue doesn't matter when resnull is set, so in fact even storing the 0 is unnecessary, but I kept that because we tend to do that for general tidiness. Author: Xing Guo <higuoxing@gmail.com> Reviewed-by: Andreas Karlsson <andreas@proxel.se> Discussion: https://www.postgresql.org/message-id/CACpMh%2BC%3Dg13WdvzLRSponsVWGgxwDSMzQWM4Gz0heOyaA0-N6g@mail.gmail.com
* Add an assertion in get_object_address()Peter Eisentraut2024-11-15
| | | | | | | | | | | Some places declared a Relation before calling get_object_address() only to assert that the relation is NULL after the call. The new assertion allows passing NULL as the relation argument at those places making the code cleaner and easier to understand. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://www.postgresql.org/message-id/ZzG34eNrT83W/Orz@ip-10-97-1-34.eu-west-3.compute.internal
* Fix race conditions with drop of reused pgstats entriesMichael Paquier2024-11-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes a set of race conditions with cumulative statistics where a shared stats entry could be dropped while it should still be valid in the event when it is reused: an entry may refer to a different object but requires the same hash key. This can happen with various stats kinds, like: - Replication slots that compute internally an index number, for different slot names. - Stats kinds that use an OID in the object key, where a wraparound causes the same key to be used if an OID is used for the same object. - As of PostgreSQL 18, custom pgstats kinds could also be an issue, depending on their implementation. This issue is fixed by introducing a counter called "generation" in the shared entries via PgStatShared_HashEntry, initialized at 0 when an entry is created and incremented when the same entry is reused, to avoid concurrent issues on drop because of other backends still holding a reference to it. This "generation" is copied to the local copy that a backend holds when looking at an object, then cross-checked with the shared entry to make sure that the entry is not dropped even if its "refcount" justifies that if it has been reused. This problem could show up when a backend shuts down and needs to discard any entries it still holds, causing statistics to be removed when they should not, or even an assertion failure. Another report involved a failure in a standby after an OID wraparound, where the startup process would FATAL on a "can only drop stats once", stopping recovery abruptly. The buildfarm has been sporadically complaining about the problem, as well, but the window is hard to reach with the in-core tests. Note that the issue can be reproduced easily by adding a sleep before dshash_find() in pgstat_release_entry_ref() to enlarge the problematic window while repeating test_decoding's isolation test oldest_xmin a couple of times, for example, as pointed out by Alexander Lakhin. Reported-by: Alexander Lakhin, Peter Smith Author: Kyotaro Horiguchi, Michael Paquier Reviewed-by: Bertrand Drouvot Discussion: https://postgr.es/m/CAA4eK1KxuMVyAryz_Vk5yq3ejgKYcL6F45Hj9ZnMNBS-g+PuZg@mail.gmail.com Discussion: https://postgr.es/m/17947-b9554521ad963c9c@postgresql.org Backpatch-through: 15
* Pass MyPMChildSlot as an explicit argument to child processHeikki Linnakangas2024-11-14
| | | | | | | | | All the other global variables passed from postmaster to child have the same value in all the processes, while MyPMChildSlot is more like a parameter to each child process. Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://www.postgresql.org/message-id/a102f15f-eac4-4ff2-af02-f9ff209ec66f@iki.fi
* Assign a child slot to every postmaster child processHeikki Linnakangas2024-11-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, only backends, autovacuum workers, and background workers had an entry in the PMChildFlags array. With this commit, all postmaster child processes, including all the aux processes, have an entry. Dead-end backends still don't get an entry, though, and other processes that don't touch shared memory will never mark their PMChildFlags entry as active. We now maintain separate freelists for different kinds of child processes. That ensures that there are always slots available for autovacuum and background workers. Previously, pre-authentication backends could prevent autovacuum or background workers from starting up, by using up all the slots. The code to manage the slots in the postmaster process is in a new pmchild.c source file. Because postmaster.c is just so large. Assigning pmsignal slot numbers is now pmchild.c's responsibility. This replaces the PMChildInUse array in pmsignal.c. Some of the comments in postmaster.c still talked about the "stats process", but that was removed in commit 5891c7a8ed. Fix those while we're at it. Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://www.postgresql.org/message-id/a102f15f-eac4-4ff2-af02-f9ff209ec66f@iki.fi
* Kill dead-end children when there's nothing else leftHeikki Linnakangas2024-11-14
| | | | | | | | | | | | | | | | Previously, the postmaster would never try to kill dead-end child processes, even if there were no other processes left. A dead-end backend will eventually exit, when authentication_timeout expires, but if a dead-end backend is the only thing that's preventing the server from shutting down, it seems better to kill it immediately. It's particularly important, if there was a bug in the early startup code that prevented a dead-end child from timing out and exiting normally. Includes a test for that case where a dead-end backend previously prevented the server from shutting down. Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://www.postgresql.org/message-id/a102f15f-eac4-4ff2-af02-f9ff209ec66f@iki.fi
* Replace postmaster.c's own backend type codes with BackendTypeHeikki Linnakangas2024-11-14
| | | | | | | | Introduce a separate BackendType for dead-end children, so that we don't need a separate dead_end flag. Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://www.postgresql.org/message-id/a102f15f-eac4-4ff2-af02-f9ff209ec66f@iki.fi
* Remove a useless cast to (void *) in hash_search() callPeter Eisentraut2024-11-14
| | | | | This pattern was previously cleaned up in 54a177a948b, but a new instance snuck in around the same time in 31966b151e6.
* Add nbtree amgettuple return item function.Peter Geoghegan2024-11-13
| | | | | | | | | | This makes it easier to add precondition assertions. We now assert that the last call to _bt_readpage succeeded, and that the current item index is within the bounds of the currPos items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiro Ikeda <ikedamsh@oss.nttdata.com> Discussion: https://postgr.es/m/CAH2-WznFkEs9K1PtNruti5JjawY-dwj+gkaEh_k1ZE+1xLLGkA@mail.gmail.com
* Fix comment in injection_point.cMichael Paquier2024-11-13
| | | | | | | | | | InjectionPointEntry->name was described as a hash key, which was fine when introduced in d86d20f0ba79, but it is not now. Oversight in 86db52a5062a, that has changed the way injection points are stored in shared memory from a hash table to an array. Backpatch-through: 17
* Fix obsolete nbtree page reuse FSM comment.Peter Geoghegan2024-11-12
| | | | Oversight in commit d088ba5a.
* Silence compilers about extractNotNullColumn()Álvaro Herrera2024-11-12
| | | | | | | | | | | | | | | Multiple buildfarm animals warn that a newly added Assert() is impossible to fail; remove it to avoid the noise. While at it, use direct assignment to obtain the value we need, avoiding an unnecessary memcpy(). (I decided to remove the "pfree" call for the detoasted short-datum; because this is only used for DDL, it's not problematic to leak such a small allocation.) Noted by Tom Lane about 14e87ffa5c54. Discussion: https://postgr.es/m/3649828.1731083171@sss.pgh.pa.us
* Fix arrays comparison in CompareOpclassOptions()Alexander Korotkov2024-11-12
| | | | | | | | | | | | | The current code calls array_eq() and does not provide FmgrInfo. This commit provides initialization of FmgrInfo and uses C collation as the safe option for text comparison because we don't know anything about the semantics of opclass options. Backpatch to 13, where opclass options were introduced. Reported-by: Nicolas Maus Discussion: https://postgr.es/m/18692-72ea398df3ec6712%40postgresql.org Backpatch-through: 13
* Parallel workers use AuthenticatedUserId for connection privilege checks.Tom Lane2024-11-11
| | | | | | | | | | | | | | | | | | | | | Commit 5a2fed911 had an unexpected side-effect: the parallel worker launched for the new test case would fail if it couldn't use a superuser-reserved connection slot. The reason that test failed while all our pre-existing ones worked is that the connection privilege tests in InitPostgres had been based on the superuserness of the leader's AuthenticatedUserId, but after the rearrangements of 5a2fed911 we were testing the superuserness of CurrentUserId, which the new test case deliberately made to be a non-superuser. This all seems very accidental and probably not the behavior we really want, but a security patch is no time to be redesigning things. Pending some discussion about desirable semantics, hack it so that InitPostgres continues to pay attention to the superuserness of AuthenticatedUserId when starting a parallel worker. Nathan Bossart and Tom Lane, per buildfarm member sawshark. Security: CVE-2024-10978
* Fix improper interactions between session_authorization and role.Tom Lane2024-11-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The SQL spec mandates that SET SESSION AUTHORIZATION implies SET ROLE NONE. We tried to implement that within the lowest-level functions that manipulate these settings, but that was a bad idea. In particular, guc.c assumes that it doesn't matter in what order it applies GUC variable updates, but that was not the case for these two variables. This problem, compounded by some hackish attempts to work around it, led to some security-grade issues: * Rolling back a transaction that had done SET SESSION AUTHORIZATION would revert to SET ROLE NONE, even if that had not been the previous state, so that the effective user ID might now be different from what it had been. * The same for SET SESSION AUTHORIZATION in a function SET clause. * If a parallel worker inspected current_setting('role'), it saw "none" even when it should see something else. Also, although the parallel worker startup code intended to cope with the current role's pg_authid row having disappeared, its implementation of that was incomplete so it would still fail. Fix by fully separating the miscinit.c functions that assign session_authorization from those that assign role. To implement the spec's requirement, teach set_config_option itself to perform "SET ROLE NONE" when it sets session_authorization. (This is undoubtedly ugly, but the alternatives seem worse. In particular, there's no way to do it within assign_session_authorization without incompatible changes in the API for GUC assign hooks.) Also, improve ParallelWorkerMain to directly set all the relevant user-ID variables instead of relying on some of them to get set indirectly. That allows us to survive not finding the pg_authid row during worker startup. In v16 and earlier, this includes back-patching 9987a7bf3 which fixed a violation of GUC coding rules: SetSessionAuthorization is not an appropriate place to be throwing errors from. Security: CVE-2024-10978
* Ensure cached plans are correctly marked as dependent on role.Nathan Bossart2024-11-11
| | | | | | | | | | | | | | If a CTE, subquery, sublink, security invoker view, or coercion projection references a table with row-level security policies, we neglected to mark the plan as potentially dependent on which role is executing it. This could lead to later executions in the same session returning or hiding rows that should have been hidden or returned instead. Reported-by: Wolfgang Walther Reviewed-by: Noah Misch Security: CVE-2024-10976 Backpatch-through: 12
* Add two attributes to pg_stat_database for parallel workers activityMichael Paquier2024-11-11
| | | | | | | | | | | | | | | | | | | | | | Two attributes are added to pg_stat_database: * parallel_workers_to_launch, counting the total number of parallel workers that were planned to be launched. * parallel_workers_launched, counting the total number of parallel workers actually launched. The ratio of both fields can provide hints that there are not enough slots available when launching parallel workers, also useful when pg_stat_statements is not deployed on an instance (i.e. cf54a2c00254). This commit relies on de3a2ea3b264, that has added two fields to EState, that get incremented when executing Gather or GatherMerge nodes. A test is added in select_parallel, where parallel workers are spawned. Bump catalog version. Author: Benoit Lobréau Discussion: https://postgr.es/m/783bc7f7-659a-42fa-99dd-ee0565644e25@dalibo.com
* Assert consistency of currPage that ended scan.Peter Geoghegan2024-11-08
| | | | | | | | | | | | | | | | | When _bt_readnextpage is called with our nbtree parallel scan already seized (i.e. when it is directly called by _bt_first), we never expect a prior call to _bt_readpage for lastcurrblkno to already indicate that the scan should end -- the _bt_first caller's blkno must always be read. After all, the "prior" _bt_readpage call (the call for lastcurrblkno) probably took place in some other backend (and it might not even have finished by the time our backend reaches _bt_first/_bt_readnextpage). Add a documenting assertion to the path where _bt_readnextpage ends the parallel scan based on information about lastcurrblkno from so->currPos. Assert that the most recent _bt_readpage call that set so->currPos is in fact lastcurrblkno's _bt_readpage call. Follow-up to bugfix commit b5ee4e52.
* Improve fix for not entering parallel mode when holding interrupts.Tom Lane2024-11-08
| | | | | | | | | | | | | | | | | | | | | | | Commit ac04aa84a put the shutoff for this into the planner, which is not ideal because it doesn't prevent us from re-using a previously made parallel plan. Revert the planner change and instead put the shutoff into InitializeParallelDSM, modeling it on the existing code there for recovering from failure to allocate a DSM segment. However, that code path is mostly untested, and testing a bit harder showed there's at least one bug: ExecHashJoinReInitializeDSM is not prepared for us to have skipped doing parallel DSM setup. I also thought the Assert in ReinitializeParallelWorkers is pretty ill-advised, and replaced it with a silent Min() operation. The existing test case added by ac04aa84a serves fine to test this version of the fix, so no change needed there. Patch by me, but thanks to Noah Misch for the core idea that we could shut off worker creation when !INTERRUPTS_CAN_BE_PROCESSED. Back-patch to v12, as ac04aa84a was. Discussion: https://postgr.es/m/CAC-SaSzHUKT=vZJ8MPxYdC_URPfax+yoA1hKTcF4ROz_Q6z0_Q@mail.gmail.com
* Avoid nbtree parallel scan currPos confusion.Peter Geoghegan2024-11-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 1bd4bc85, which refactored nbtree sibling link traversal, made _bt_parallel_seize reset the scan's currPos so that things were consistent with the state of a serial backend moving between pages. This overlooked the fact that _bt_readnextpage relied on the existing currPos state to decide when to end the scan -- even though it came from before the scan was seized. As a result of all this, parallel nbtree scans could needlessly behave like full index scans. To fix, teach _bt_readnextpage to explicitly allow the use of an already read page's so->currPos when deciding whether to end the scan -- even during parallel index scans (allow it consistently now). This requires moving _bt_readnextpage's seizure of the scan to earlier in its loop. That way _bt_readnextpage either deals with the true so->currPos state, or an initialized-by-_bt_parallel_seize currPos state set from when the scan was seized. Now _bt_steppage (the most important _bt_readnextpage caller) takes the same uniform approach to setting up its call using details taken from so->currPos -- regardless of whether the scan happens to be parallel or serial. The new loop structure in _bt_readnextpage is prone to getting confused by P_NONE blknos set when the rightmost or leftmost page was reached. We could avoid that by adding an explicit check, but that would be ugly. Avoid this problem by teaching _bt_parallel_seize to end the parallel scan instead of returning a P_NONE next block/blkno. Doing things this way was arguably a missed opportunity for commit 1bd4bc85. It allows us to remove a similar "blkno == P_NONE" check from _bt_first. Oversight in commit 1bd4bc85, which refactored sibling link traversal (as part of optimizing nbtree backward scan locking). Author: Peter Geoghegan <pg@bowt.ie> Reported-By: Masahiro Ikeda <ikedamsh@oss.nttdata.com> Diagnosed-By: Masahiro Ikeda <ikedamsh@oss.nttdata.com> Reviewed-By: Masahiro Ikeda <ikedamsh@oss.nttdata.com> Discussion: https://postgr.es/m/f8efb9c0f8d1a71b44fd7f8e42e49c25@oss.nttdata.com
* Add pg_constraint rows for not-null constraintsÁlvaro Herrera2024-11-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We now create contype='n' pg_constraint rows for not-null constraints on user tables. Only one such constraint is allowed for a column. We propagate these constraints to other tables during operations such as adding inheritance relationships, creating and attaching partitions and creating tables LIKE other tables. These related constraints mostly follow the well-known rules of conislocal and coninhcount that we have for CHECK constraints, with some adaptations: for example, as opposed to CHECK constraints, we don't match not-null ones by name when descending a hierarchy to alter or remove it, instead matching by the name of the column that they apply to. This means we don't require the constraint names to be identical across a hierarchy. The inheritance status of these constraints can be controlled: now we can be sure that if a parent table has one, then all children will have it as well. They can optionally be marked NO INHERIT, and then children are free not to have one. (There's currently no support for altering a NO INHERIT constraint into inheriting down the hierarchy, but that's a desirable future feature.) This also opens the door for having these constraints be marked NOT VALID, as well as allowing UNIQUE+NOT NULL to be used for functional dependency determination, as envisioned by commit e49ae8d3bc58. It's likely possible to allow DEFERRABLE constraints as followup work, as well. psql shows these constraints in \d+, though we may want to reconsider if this turns out to be too noisy. Earlier versions of this patch hid constraints that were on the same columns of the primary key, but I'm not sure that that's very useful. If clutter is a problem, we might be better off inventing a new \d++ command and not showing the constraints in \d+. For now, we omit these constraints on system catalog columns, because they're unlikely to achieve anything. The main difference to the previous attempt at this (b0e96f311985) is that we now require that such a constraint always exists when a primary key is in the column; we didn't require this previously which had a number of unpalatable consequences. With this requirement, the code is easier to reason about. For example: - We no longer have "throwaway constraints" during pg_dump. We needed those for the case where a table had a PK without a not-null underneath, to prevent a slow scan of the data during restore of the PK creation, which was particularly problematic for pg_upgrade. - We no longer have to cope with attnotnull being set spuriously in case a primary key is dropped indirectly (e.g., via DROP COLUMN). Some bits of code in this patch were authored by Jian He. Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Author: Bernd Helmle <mailings@oopsware.de> Reviewed-by: 何建 (jian he) <jian.universality@gmail.com> Reviewed-by: 王刚 (Tender Wang) <tndrwang@gmail.com> Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com> Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com> Discussion: https://postgr.es/m/202408310358.sdhumtyuy2ht@alvherre.pgsql
* Disallow partitionwise join when collations don't matchAmit Langote2024-11-08
| | | | | | | | | | | | | | | | If the collation of any join key column doesn’t match the collation of the corresponding partition key, partitionwise joins can yield incorrect results. For example, rows that would match under the join key collation might be located in different partitions due to the partitioning collation. In such cases, a partitionwise join would yield different results from a non-partitionwise join, so disallow it in such cases. Reported-by: Tender Wang <tndrwang@gmail.com> Author: Jian He <jian.universality@gmail.com> Reviewed-by: Tender Wang <tndrwang@gmail.com> Reviewed-by: Junwang Zhao <zhjwpku@gmail.com> Discussion: https://postgr.es/m/CAHewXNno_HKiQ6PqyLYfuqDtwp7KKHZiH1J7Pqyz0nr+PS2Dwg@mail.gmail.com Backpatch-through: 12
* Disallow partitionwise grouping when collations don't matchAmit Langote2024-11-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the collation of any grouping column doesn’t match the collation of the corresponding partition key, partitionwise grouping can yield incorrect results. For example, rows that would be grouped under the grouping collation may end up in different partitions under the partitioning collation. In such cases, full partitionwise grouping would produce results that differ from those without partitionwise grouping, so disallowed that. Partial partitionwise aggregation is still allowed, as the Finalize step reconciles partition-level aggregates with grouping requirements across all partitions, ensuring that the final output remains consistent. This commit also fixes group_by_has_partkey() by ensuring the RelabelType node is stripped from grouping expressions when matching them to partition key expressions to avoid false mismatches. Bug: #18568 Reported-by: Webbo Han <1105066510@qq.com> Author: Webbo Han <1105066510@qq.com> Reviewed-by: Tender Wang <tndrwang@gmail.com> Reviewed-by: Aleksander Alekseev <aleksander@timescale.com> Reviewed-by: Jian He <jian.universality@gmail.com> Discussion: https://postgr.es/m/18568-2a9afb6b9f7e6ed3@postgresql.org Discussion: https://postgr.es/m/tencent_9D9103CDA420C07768349CC1DFF88465F90A@qq.com Discussion: https://postgr.es/m/CAHewXNno_HKiQ6PqyLYfuqDtwp7KKHZiH1J7Pqyz0nr+PS2Dwg@mail.gmail.com Backpatch-through: 12
* Fix inconsistent RestrictInfo serial numbersRichard Guo2024-11-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we generate multiple clones of the same qual condition to cope with outer join identity 3, we need to ensure that all the clones get the same serial number. To achieve this, we reset the root->last_rinfo_serial counter each time we produce RestrictInfo(s) from the qual list (see deconstruct_distribute_oj_quals). This approach works only if we ensure that we are not changing the qual list in any way that'd affect the number of RestrictInfos built from it. However, with b262ad440, an IS NULL qual on a NOT NULL column might result in an additional constant-FALSE RestrictInfo. And different versions of the same qual clause can lead to different conclusions about whether it can be reduced to constant-FALSE. This would affect the number of RestrictInfos built from the qual list for different versions, causing inconsistent RestrictInfo serial numbers across multiple clones of the same qual. This inconsistency can confuse users of these serial numbers, such as rebuild_joinclause_attr_needed, and lead to planner errors such as "ERROR: variable not found in subplan target lists". To fix, reset the root->last_rinfo_serial counter after generating the additional constant-FALSE RestrictInfo. Back-patch to v17 where the issue crept in. In v17, I failed to make a test case that would expose this bug, so no test case for v17. Author: Richard Guo Discussion: https://postgr.es/m/CAMbWs4-B6kafn+LmPuh-TYFwFyEm-vVj3Qqv7Yo-69CEv14rRg@mail.gmail.com
* Clarify a foreign key error messagePeter Eisentraut2024-11-07
| | | | | | | | Clarify the message about type mismatch in foreign key definition to indicate which column the referencing and which is the referenced one. Reported-by: Jian He <jian.universality@gmail.com> Discussion: https://www.postgresql.org/message-id/CACJufxEL82ao-aXOa=d_-Xip0bix-qdSyNc9fcWxOdkEZFko8w@mail.gmail.com
* Remove an obsolete comment in gistinsert()Michael Paquier2024-11-07
| | | | | | | | | This is inconsistent since 1f7ef548ec2e where the definition of gistFormTuple() has changed. Author: Tender Wang Reviewed-by: Aleksander Alekseev Discussion: https://postgr.es/m/CAHewXNkjU95_HdioDVU=5yBq_Xt=GfBv=Od-0oKtiA006pWW7Q@mail.gmail.com
* Replicate generated columns when 'publish_generated_columns' is set.Amit Kapila2024-11-07
| | | | | | | | | | | | | | | | | | | | This patch builds on the work done in commit 745217a051 by enabling the replication of generated columns alongside regular column changes through a new publication parameter: publish_generated_columns. Example usage: CREATE PUBLICATION pub1 FOR TABLE tab_gencol WITH (publish_generated_columns = true); The column list takes precedence. If the generated columns are specified in the column list, they will be replicated even if 'publish_generated_columns' is set to false. Conversely, if generated columns are not included in the column list (assuming the user specifies a column list), they will not be replicated even if 'publish_generated_columns' is true. Author: Vignesh C, Shubham Khanna Reviewed-by: Peter Smith, Amit Kapila, Hayato Kuroda, Shlok Kyal, Ajin Cherian, Hou Zhijie, Masahiko Sawada Discussion: https://postgr.es/m/B80D17B2-2C8E-4C7D-87F2-E5B4BE3C069E@gmail.com
* Remove unused variableDaniel Gustafsson2024-11-06
| | | | | | | | | | | The low variable has not been used since it was added in d168b666823 and can be safely removed. The variable is present in the Sedgewick paper "Analysis of Shellsort and Related Algorithms" as a parameter to the shellsort function, but our implementation does not use it. Remove to improve readability of the code. Author: Koki Nakamura <btnakamurakoukil@oss.nttdata.com> Discussion: https://postgr.es/m/8aeb7b3eda53ca4c65fbacf8f43628fb@oss.nttdata.com
* doc: Remove event trigger firing matrixPeter Eisentraut2024-11-06
| | | | | | | | | | | | | | | | This is difficult to maintain accurately, and it was probably already somewhat incorrect, especially in the sql_drop and table_rewrite categories. The prior section already documented which DDL commands are *not* supported (which was also slightly outdated), so let's expand that a bit and just rely on that instead of listing out each command in full detail. Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Jian He <jian.universality@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CACJufxE_UAuxcM08BW5oVsg34v0cFWoEt8yBa5xSAoKLmL6LTQ%40mail.gmail.com
* Monkey-patch LLVM code to fix ARM relocation bug.Thomas Munro2024-11-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Supply a new memory manager for RuntimeDyld, to avoid crashes in generated code caused by memory placement that can overflow a 32 bit data type. This is a drop-in replacement for the llvm::SectionMemoryManager class in the LLVM library, with Michael Smith's proposed fix from https://www.github.com/llvm/llvm-project/pull/71968. We hereby slurp it into our own source tree, after moving into a new namespace llvm::backport and making some minor adjustments so that it can be compiled with older LLVM versions as far back as 12. It's harder to make it work on even older LLVM versions, but it doesn't seem likely that people are really using them so that is not investigated for now. The problem could also be addressed by switching to JITLink instead of RuntimeDyld, and that is the LLVM project's recommended solution as the latter is about to be deprecated. We'll have to do that soon enough anyway, and then when the LLVM version support window advances far enough in a few years we'll be able to delete this code. Unfortunately that wouldn't be enough for PostgreSQL today: in most relevant versions of LLVM, JITLink is missing or incomplete. Several other projects have already back-ported this fix into their fork of LLVM, which is a vote of confidence despite the lack of commit into LLVM as of today. We don't have our own copy of LLVM so we can't do exactly what they've done; instead we have a copy of the whole patched class so we can pass an instance of it to RuntimeDyld. The LLVM project hasn't chosen to commit the fix yet, and even if it did, it wouldn't be back-ported into the releases of LLVM that most of our users care about, so there is not much point in waiting any longer for that. If they make further changes and commit it to LLVM 19 or 20, we'll still need this for older versions, but we may want to resynchronize our copy and update some comments. The changes that we've had to make to our copy can be seen by diffing our SectionMemoryManager.{h,cpp} files against the ones in the tree of the pull request. Per the LLVM project's license requirements, a copy is in SectionMemoryManager.LICENSE. This should fix the spate of crash reports we've been receiving lately from users on large memory ARM systems. Back-patch to all supported releases. Co-authored-by: Thomas Munro <thomas.munro@gmail.com> Co-authored-by: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com> Reviewed-by: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> (license aspects) Reported-by: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com> Discussion: https://postgr.es/m/CAO6_Xqr63qj%3DSx7HY6ZiiQ6R_JbX%2B-p6sTPwDYwTWZjUmjsYBg%40mail.gmail.com
* Fix hypothetical bug in ExprState building for hashingDavid Rowley2024-11-06
| | | | | | | | | | | | | | | | | | | adf97c156 gave ExprStates the ability to hash expressions and return a single hash value. That commit supports seeding the hash value with an initial value to have that blended into the final hash value. Here we fix a hypothetical bug where if there are zero expressions to hash, the initial value is stored in the wrong location. The existing code stored the initial value in an intermediate location expecting that when the expressions were hashed that those steps would store the final hash value in the ExprState.resvalue field. However, that wouldn't happen when there are zero expressions to hash. The correct thing to do instead is to have a special case for zero expressions and when we hit that case, store the initial value directly in the ExprState.resvalue. The reason that this is a hypothetical bug is that no code currently calls ExecBuildHash32Expr passing a non-zero initial value. Discussion: https://postgr.es/m/CAApHDvpMAL_zxbMRr1LOex3O7Y7R7ZN2i8iUFLQhqQiJMAg3qw@mail.gmail.com
* Clear padding of PgStat_HashKey when handling pgstats entriesMichael Paquier2024-11-05
| | | | | | | | | | | | | | | | | | PgStat_HashKey is currently initialized in a way that could result in random data if the structure has any padding bytes. The structure has no padding bytes currently, fortunately, but it could become a problem should the structure change at some point in the future. The code is changed to use some memset(0) so as any padding would be handled properly, as it would be surprising to see random failures in the pgstats entry lookups. PgStat_HashKey is a structure internal to pgstats, and an ABI change could be possible in the scope of a bug fix, so backpatch down to 15 where this has been introduced. Author: Bertrand Drouvot Reviewed-by: Jelte Fennema-Nio, Michael Paquier Discussion: https://postgr.es/m/Zyb7RW1y9dVfO0UH@ip-10-97-1-34.eu-west-3.compute.internal Backpatch-through: 15
* Revert pg_wal_replay_wait() stored procedureAlexander Korotkov2024-11-04
| | | | | | | | | | | | | | | | This commit reverts 3c5db1d6b0, and subsequent improvements and fixes including 8036d73ae3, 867d396ccd, 3ac3ec580c, 0868d7ae70, 85b98b8d5a, 2520226c95, 014f9f34d2, e658038772, e1555645d7, 5035172e4a, 6cfebfe88b, 73da6b8d1b, and e546989a26. The reason for reverting is a set of remaining issues. Most notably, the stored procedure appears to need more effort than the utility statement to turn the backend into a "snapshot-less" state. This makes an approach to use stored procedures questionable. Catversion is bumped. Discussion: https://postgr.es/m/Zyhj2anOPRKtb0xW%40paquier.xyz
* Fix typo in comment of gistdoinsert().Masahiko Sawada2024-11-04
| | | | | | Author: Tender Wang Reviewed-by: Masahiko Sawada Discussion: https://postgr.es/m/CAHewXN%3D3sH2sNw4nC3QGCEVw1Lftmw9m5y1Xje0bXK6ApDrsPQ%40mail.gmail.com
* Fix obsolete _bt_first comments.Peter Geoghegan2024-11-04
| | | | | | | _bt_first doesn't necessarily hold onto a buffer pin on success exit. Fix header comments that claimed that we'll always hold onto a pin. Oversight in commit 2ed5b87f96.
* nbtree: Remove useless 'strat' local variable.Peter Geoghegan2024-11-04
| | | | | | | | | | | | | | | | | | | | | Remove a local variable that was used to avoid overwriting strat_total with the = operator strategy when a >= operator strategy key was already included in the initial positioning/insertion scan keys by _bt_first (for backwards scans it would have to be a <= key that was included). _bt_first's strat_total local variable now simply tracks the operator strategy of the final scan key that was included in the scan's insertion scan key (barring the case where the !used_all_subkeys row compare path adjusts strat_total in its own way). _bt_first already treated >= keys (or <= keys) as = keys for initial positioning purposes. There is no good reason to remember that that was what happened; no later _bt_first step cares about the distinction. Note, in particular, that the insertion scan key's 'nextkey' and 'backward' fields will be initialized the same way regardless. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Tomas Vondra <tomas@vondra.me> Discussion: https://postgr.es/m/CAH2-Wz=PKR6rB7qbx+Vnd7eqeB5VTcrW=iJvAsTsKbdG+kW_UA@mail.gmail.com
* Split ProcSleep function into JoinWaitQueue and ProcSleepHeikki Linnakangas2024-11-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Split ProcSleep into two functions: JoinWaitQueue and ProcSleep. JoinWaitQueue is called while holding the partition lock, and inserts the current process to the wait queue, while ProcSleep() does the actual sleeping. ProcSleep() is now called without holding the partition lock, and it no longer re-acquires the partition lock before returning. That makes the wakeup a little cheaper. Once upon a time, re-acquiring the partition lock was needed to prevent a signal handler from longjmping out at a bad time, but these days our signal handlers just set flags, and longjmping can only happen at points where we explicitly run CHECK_FOR_INTERRUPTS(). If JoinWaitQueue detects an "early deadlock" before even joining the wait queue, it returns without changing the shared lock entry, leaving the cleanup of the shared lock entry to the caller. This makes the handling of an early deadlock the same as the dontWait=true case. One small user-visible side-effect of this refactoring is that we now only set the 'ps' title to say "waiting" when we actually enter the sleep, not when the lock is skipped because dontWait=true, or when a deadlock is detected early before entering the sleep. This eliminates the 'lockAwaited' global variable in proc.c, which was largely redundant with 'awaitedLock' in lock.c Note: Updating the local lock table is now the caller's responsibility. JoinWaitQueue and ProcSleep are now only responsible for modifying the shared state. Seems a little nicer that way. Based on Thomas Munro's earlier patch and observation that ProcSleep doesn't really need to re-acquire the partition lock. Reviewed-by: Maxim Orlov Discussion: https://www.postgresql.org/message-id/7c2090cd-a72a-4e34-afaa-6dd2ef31440e@iki.fi
* Move TRACE calls into WaitOnLock()Heikki Linnakangas2024-11-04
| | | | | | | | LockAcquire is a long and complex function. Pushing more stuff to its subroutines makes it a little more manageable. Reviewed-by: Maxim Orlov Discussion: https://www.postgresql.org/message-id/7c2090cd-a72a-4e34-afaa-6dd2ef31440e@iki.fi
* Set MyProc->heldLocks in ProcSleepHeikki Linnakangas2024-11-04
| | | | | | | | | | | | Previously, ProcSleep()'s caller was responsible for setting MyProc->heldLocks, and we had comments to remind about that. But it seems simpler to make ProcSleep() itself responsible for it. ProcSleep() already set the other info about the lock its waiting for (waitLock, waitProcLock and waitLockMode), so it is natural for it to set heldLocks too. Reviewed-by: Maxim Orlov Discussion: https://www.postgresql.org/message-id/7c2090cd-a72a-4e34-afaa-6dd2ef31440e@iki.fi
* Clarify nbtree parallel scan _bt_endpoint contract.Peter Geoghegan2024-11-04
| | | | | | | | | | | | | | | _bt_endpoint is a helper function for _bt_first that's called whenever no useful insertion scan key can be used, and we need to lock and read either the leftmost or rightmost leaf page in the index. Simplify and document its preconditions, relieving its _bt_first caller from having to end the parallel scan when it returns false. Also stop unnecessarily invalidating the current scan position in nearby code in both _bt_first and _bt_endpoint. This seems to have been copy-pasted from _bt_readnextpage, where invalidating the scan's current position really is necessary. Follow-up to the refactoring work in commit 1bd4bc85.
* Fix comment in LockReleaseAll() on when locallock->nLock can be zeroHeikki Linnakangas2024-11-04
| | | | | | | | We reach this case also e.g. when a deadlock is detected, not only when we run out of memory. Reviewed-by: Maxim Orlov Discussion: https://www.postgresql.org/message-id/7c2090cd-a72a-4e34-afaa-6dd2ef31440e@iki.fi
* Suppress new "may be used uninitialized" warning.Noah Misch2024-11-02
| | | | | | Buildfarm member mamba fails to deduce that the function never uses this variable without initializing it. Back-patch to v12, like commit b412f402d1e020c5dac94f3bf4a005db69519b99.
* Fix inplace update buffer self-deadlock.Noah Misch2024-11-02
| | | | | | | | | | | | | | | A CacheInvalidateHeapTuple* callee might call CatalogCacheInitializeCache(), which needs a relcache entry. Acquiring a valid relcache entry might scan pg_class. Hence, to prevent undetected LWLock self-deadlock, CacheInvalidateHeapTuple* callers must not hold BUFFER_LOCK_EXCLUSIVE on buffers of pg_class. Move the CacheInvalidateHeapTupleInplace() before the BUFFER_LOCK_EXCLUSIVE. No back-patch, since I've reverted commit 243e9b40f1b2dd09d6e5bf91ebf6e822a2cd3704 from non-master branches. Reported by Alexander Lakhin. Reviewed by Alexander Lakhin. Discussion: https://postgr.es/m/10ec0bc3-5933-1189-6bb8-5dec4114558e@gmail.com
* Move I/O before the index_update_stats() buffer lock region.Noah Misch2024-11-02
| | | | | | | | | | | Commit a07e03fd8fa7daf4d1356f7cb501ffe784ea6257 enlarged the work done here under the pg_class heap buffer lock. Two preexisting actions are best done before holding that lock. Both RelationGetNumberOfBlocks() and visibilitymap_count() do I/O, and the latter might exclusive-lock a visibility map buffer. Moving these reduces contention and risk of undetected LWLock deadlock. Back-patch to v12, like that commit. Discussion: https://postgr.es/m/20241031200139.b4@rfd.leadboat.com
* Clarify nbtree array preprocessing comment.Peter Geoghegan2024-11-01
| | | | Oversight in commit 5bf748b8.
* Rename two functions that wake up other processesHeikki Linnakangas2024-11-01
| | | | | | | | | | | Instead of talking about setting latches, which is a pretty low-level mechanism, emphasize that they wake up other processes. This is in preparation for replacing Latches with a new abstraction. That's still work in progress, but this seems a little tidier anyway, so let's get this refactoring out of the way already. Discussion: https://www.postgresql.org/message-id/391abe21-413e-4d91-a650-b663af49500c%40iki.fi
* Use ProcNumbers instead of direct Latch pointers to address other procsHeikki Linnakangas2024-11-01
| | | | | | | | This is in preparation for replacing Latches with a new abstraction. That's still work in progress, but this seems a little tidier anyway, so let's get this refactoring out of the way already. Discussion: https://www.postgresql.org/message-id/391abe21-413e-4d91-a650-b663af49500c%40iki.fi
* Remove use of pg_memory_is_all_zeros() in bufpage.cMichael Paquier2024-11-01
| | | | | | | | | | | | | After a closer lookup, this makes the all-zero check of the page more expensive, so let's remove the new function call in bufpage.c. The maths of the check were also incorrect, checking that the page was full of zeros only for the first 1kB. This brings back the code to the state it was at 49d6c7d8daba. Per discussion with David Rowley and Bertrand Drouvot. Discussion: https://postgr.es/m/CAApHDvrXzPAr3FxoBuB7b3D-okNoNA2jxLun1rW8Yw5wkbqusw@mail.gmail.com
* Add pg_memory_is_all_zeros() in memutils.hMichael Paquier2024-11-01
| | | | | | | | | | | | | | | | | This new function tests if a memory region starting at a given location for a defined length is made only of zeroes. This unifies in a single path the all-zero checks that were happening in a couple of places of the backend code: - For pgstats entries of relation, checkpointer and bgwriter, where some "all_zeroes" variables were previously used with memcpy(). - For all-zero buffer pages in PageIsVerifiedExtended(). This new function uses the same forward scan as the check for all-zero buffer pages, applying it to the three pgstats paths mentioned above. Author: Bertrand Drouvot Reviewed-by: Peter Eisentraut, Heikki Linnakangas, Peter Smith Discussion: https://postgr.es/m/ZupUDDyf1hHI4ibn@ip-10-97-1-34.eu-west-3.compute.internal