| Commit message (Collapse) | Author | Age |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
fc069a3a6319 implements Self-Join Elimination (SJE), which can remove base
relations when appropriate. However, regressions tests for SJE only cover
the case when placeholder variables (PHVs) are evaluated and needed only
in a single base rel. If this baserel is removed due to SJE, its clauses,
including PHVs, will be transferred to the keeping relation. Removing these
PHVs may trigger an error on plan creation -- thanks to the b3ff6c742f6c for
detecting that.
This commit skips removal of PHVs during SJE. This might also happen that
we skip the removal of some PHVs that could be removed. However, the overhead
of extra PHVs is small compared to the complexity of analysis needed to remove
them.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Alena Rybakina <a.rybakina@postgrespro.ru>
Author: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This cleans up some loose ends left by commit e8ca9ed1d. I hadn't
looked closely enough at these places before, but now I have.
The use of double-quoted #includes for Perl headers in plperl_system.h
seems to be simply a mistake introduced in 6c944bf3c and faithfully
copied forward since then. (I had thought possibly it was required
by some weird Windows build setup, but there's no evidence of that in
our history.)
The occurrences in SectionMemoryManager.h and SectionMemoryManager.cpp
evidently stem from those files' origin as LLVM code. It's
understandable that LLVM would treat their own files as needing
double-quoted #includes; but they're still system headers to us.
I also applied the same check to *.c files, and found a few other
random incorrect usages in both directions.
Our ECPG headers and test files routinely use angle brackets to refer
to ECPG headers. I left those usages alone, since it seems reasonable
for an ECPG user to regard those headers as system headers.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
c4d5cb71d2 adjusted the fast-path locking code to allow some
configuration of the number of fast-path locking slots via the
max_locks_per_transaction GUC. In that commit the FAST_PATH_REL_GROUP()
macro used integer division to determine the fast-path locking group slot
to use for the lock.
The divisor in this case is always a power-of-two value. Here we swap
out the divide by a bitwise-AND, which is a significantly faster
operation to perform.
In passing, adjust the code that's setting FastPathLockGroupsPerBackend
so that it's more clear that the value being set is a power-of-two.
Also, adjust some comments in the area which contained some magic
numbers. It seems better to justify the 1024 upper limit in the
location where the #define is made instead of where it is used.
Author: David Rowley <drowleyml@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/CAApHDvodr3bcnpxcs7+k-3cFwYR0tP-BYhyd2PpDhe-bCx9i=g@mail.gmail.com
|
|
|
|
|
|
|
| |
Trying to investigate a bug report by Alexander Lakhin made it apparent that
the debug logging around waiting for IO completion is insufficient. Fix that.
Discussion: https://postgr.es/m/h4in2db37vepagmi2oz5vvqymjasc5gyb4lpqkunj4eusu274i@37jpd3c2spd3
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
10f66468475 intended to limit the value of io_combine_limit to the minimum of
io_combine_limit and io_max_combine_limit. To avoid issues with interdependent
GUCs, it introduced io_combine_limit_guc and set io_combine_limit in assign
hooks. That plan was thwarted by guc_tables.c accidentally still referencing
io_combine_limit, instead of io_combine_limit_guc. That lead to the GUC
machinery overriding the work done in the assign hooks, potentially leaving
io_combine_limit with a too high value.
The consequence of this bug was that when running with io_combine_limit >
io_combine_limit_guc the AIO machinery would not have reserved large enough
iovec and IO data arrays, with one IO's arrays overlapping with another IO's,
leading to total confusion.
To make such a problem easier to detect in the future, add assertions to
pgaio_io_set_handle_data_* checking the length is smaller than
io_max_combine_limit (not just PG_IOV_MAX).
It'd be nice to have a few tests for this, but it's not entirely obvious how
to do so portably.
As remarked upon by Tom, the GUC assignment hooks really shouldn't set the
underlying variable, that's the job of the GUC machinery. Change that as well.
Discussion: https://postgr.es/m/c5jyqnuwrpigd35qe7xdypxsisdjrdba5iw63mhcse4mzjogxo@qdjpv22z763f
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
pgaio_io_reclaim() reset the fields in PgAioHandle before updating the state
to IDLE or incrementing the generation. For most things that's OK, but for
pg_get_aios() it is not - if it copied the PgAioHandle while fields were being
reset, we wouldn't detect that and could call
pgaio_io_get_target_description() with ioh->target == PGAIO_TID_INVALID,
leading to a crash.
Fix this issue by incrementing the generation and state earlier, before
resetting.
Also add an assertion to pgaio_io_get_target_description() for the target to
be valid - that'd have made this case a bit easier to debug. While at it,
add/update a few related assertions.
Author: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/062daca9-dfad-4750-9da8-b13388301ad9@gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Not having this check would produce a core dump at startup when running
pgstat_read_statsfile(), in the case where the information of a stats
kind for an entry in the dshash could not be found. The same check
already happens for fixed-numbered stats and entries that are stored
with their names. This issue can be seen with custom stats kinds.
Note that this problem can be reproduced what what is in the core code:
- Tweak the test module injection_points to not load the fixed-numbered
stats part, leaving only the variable-numbered stats.
- Create an instance with injection_points defined in
shared_preload_libraries.
- Create a pgstats entry by attaching and running a point.
- Restart the server without shared_preload_libraries. The startup
process detects that something is wrong and reports a WARNING.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/aAieZAvM+K1d89R2@ip-10-97-1-34.eu-west-3.compute.internal
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
One place in hash_create() used DynaHashAlloc() as a convenient
shorthand for MemoryContextAlloc(). That was fine when it was
written, but it stopped being fine when 9c911ec06 changed
DynaHashAlloc() to use MCXT_ALLOC_NO_OOM (mea culpa). Change
the code to call plain MemoryContextAlloc() as intended.
I think that this bug may be unreachable in practice, since we now
always create AllocSets with some space already allocated, so that
an OOM failure here for a non-shared hash table should be impossible
(with a hash table name of reasonable length anyway). And there
aren't enough shared hash tables to make a crash for one of those
probable. Nonetheless it's clearly not operating as designed, so
back-patch to v16 where 9c911ec06 came in.
Reported-by: Maksim Korotkov <m.korotkov@postgrespro.ru>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/219bdccd460510efaccf90b57e5e5ef2@postgrespro.ru
Backpatch-through: 16
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
b85a9d046efd introduced a new RelIdToTypeIdCacheHash, whose entries should
exist for typecache entries with TCFLAGS_HAVE_PG_TYPE_DATA flag set or any
of TCFLAGS_OPERATOR_FLAGS set or tupDesc set. However, TypeCacheOpcCallback(),
which resets TCFLAGS_OPERATOR_FLAGS, was forgotten to update
RelIdToTypeIdCacheHash.
This commit adds a delete_rel_type_cache_if_needed() call to the
TypeCacheOpcCallback() function to maintain RelIdToTypeIdCacheHash after
resetting TCFLAGS_OPERATOR_FLAGS.
Also, this commit fixes the name of the delete_rel_type_cache_if_needed()
function in its mentions in the comments.
Reported-by: Noah Misch
Discussion: https://postgr.es/m/20250411203241.e9.nmisch%40google.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To estimate with extended statistics, we need to clear the varnullingrels
field in the expression, and duplicates are not allowed in the GroupVarInfo
list. We might re-use add_unique_group_var(), but we don't do so for two
reasons.
1) We must keep the origin_rinfos list ordered exactly the same way as
varinfos.
2) add_unique_group_var() is designed for estimate_num_groups(), where a
larger number of groups is worse. While estimating the number of hash
buckets, we have the opposite: a lesser number of groups is worse.
Therefore, we don't have to remove "known equal" vars: the removed var
may valuably contribute to the multivariate statistics to grow the number
of groups.
This commit adds custom code to estimate_multivariate_bucketsize() to
initialize varinfos properly.
Reported-by: Robins Tharakan <tharakan@gmail.com>
Discussion: https://postgr.es/m/18885-da51324078588253%40postgresql.org
Author: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a foreign key constraint is placed on a partitioned table, we
actually make two pg_constraint entries associated with that table.
(I have my doubts about the wisdom of that, but it's been like that
since v12 and post-feature-freeze is no time to be messing with such
entrenched decisions.) The second "child" entry always had a name
generated according to the default rule, "table_column(s)_fkey[nnn]",
even if the primary entry had an unrelated user-specified name. The
trouble with doing that is that the default name could collide with
the user-specified name of some other constraint on the same table.
While we were willing to adjust the generated name to avoid
collisions, that only helps if it's made second; if it's made first
then creation of the other constraint would fail, potentially causing
dump/reload or pg_upgrade failures.
The core of the problem here is that we're infringing on user
namespace, so I doubt that there's any 100% solution other than to
find a way to not need the "child" entry. In the meantime, it seems
like it'd be an improvement to make the child's name be the name of
the parent constraint with an underscore and digit(s) appended as
necessary to make it unique. This rule can in theory fail in the same
way, but it seems much less probable; for one thing, this rule is
guaranteed not to match primary entries having auto-generated names.
(While an auto-generated primary name isn't user-specified to begin
with, it acts like that during dump/reload, so collisions against such
names are definitely possible.)
An additional bonus, visible in some of the regression test cases
that change here, arises from the fact that some error messages
cite the child constraint's name not the parent's. In the
previous approach the two names could be completely unrelated,
leading to user confusion --- the more so since psql's \d command
hides child constraints. With this approach it's hopefully much
clearer which constraint-the-user-knows-about is failing.
However, that does mean that there's user-visible behavior change
occurring here, making it seem like not something to back-patch.
I feel it's not too late for v18, though.
Reported-by: Kirill Reshke <reshkekirill@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Alvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/CALdSSPhGitjpTfzEMJN-Y2x+Q-5QChSxAsmSJ1-E8mQJLkHOqQ@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 3f28b2fcac tried to ensure that the replication origin shouldn't be
advanced in case of an ERROR in the apply worker, so that it can request
the same data again after restart. However, it is possible that an ERROR
was caught and handled by a (say PL/pgSQL) function, and the apply worker
continues to apply further changes, in which case, we shouldn't reset the
replication origin.
Ensure to reset the origin only when the apply worker exits after an
ERROR.
Commit 3f28b2fcac added new function geterrlevel, which we removed in HEAD
as part of this commit, but kept it in backbranches to avoid breaking any
applications. A separate case can be made to have such a function even for
HEAD.
Reported-by: Shawn McCoy <shawn.the.mccoy@gmail.com>
Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 16, where it was introduced
Discussion: https://postgr.es/m/CALsgZNCGARa2mcYNVTSj9uoPcJo-tPuWUGECReKpNgTpo31_Pw@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This assertion, based on pending_since (timestamp used to prevent stats
reports to be too frequent or should a partial flush happen), is reached
when it is found that no data can be flushed but a previous call of
pgstat_report_stat() determined that some stats data has been found as
in need of a flush. So pending_since is set when some stats data is
pending (in non-force mode) or if report attempts are too frequent, and
reset to 0 once all stats have been flushed.
Since 5cbbe70a9cc6, WAL senders have begun to report their stats on a
periodic basis for IO stats in v16~ and backend stats on HEAD, creating
some friction with the concurrent pgstat_report_stat() calls that can
happen in the context of a WAL sender (shutdown callback doing a final
report or backend-related code paths). This problem is the cause of
spurious failures in the TAP tests.
In theory, this assertion can be also reached in v15, even if that's
very unlikely. For example, a process, say a background worker, could
do periodic and direct stats flushes with concurrent calls of
pgstat_report_stat() that could cause conflicting values of
pending_since. This can be done with WAL or SLRU stats flushes using
pgstat_flush_wal() or pgstat_slru_flush(). HEAD makes this situation
easier to happen with custom cumulative stats.
This commit removes the assertion altogether, per discussion, as it is
more useful to keep the state of things as they are for the WAL sender.
The assertion could use a special state based on for example
am_walsender, but I doubt that this would be meaningful in the long run
based on the other arguments raised while discussing this issue.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Reported-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/1489124.1744685908@sss.pgh.pa.us
Discussion: https://postgr.es/m/dwrkeszz6czvtkxzr5mqlciy652zau5qqnm3cp5f3p2po74ppk@omg4g3cc6dgq
Backpatch-through: 15
|
|
|
|
|
|
|
|
|
|
|
|
| |
This error message was 'runaway "struct_name"', which isn't all
that clear; I think 'could not find closing brace for "struct_name"'
is better. Also, provide the location of the struct start using the
script's usual '$file:$lineno' style.
Bug: #18901
Reported-by: Clemens Ruck <clemens.ruck@t-online.de>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/18901-424272abe01357e6@postgresql.org
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This injection point was named "AtEOXact_Inval-with-transInvalInfo", not
respecting the implied naming convention that injection points should
use lower-case characters, with terms separated by dashes. All the
other points defined in the tree follow this style, so let's be more
consistent.
Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Aleksander Alekseev <aleksander@timescale.com>
Discussion: https://postgr.es/m/OSCPR01MB14966E14C1378DEE51FB7B7C5F5B32@OSCPR01MB14966.jpnprd01.prod.outlook.com
Backpatch-through: 17
|
|
|
|
|
|
|
|
|
|
|
|
| |
Word boundaries are based on whether a character is alphanumeric or
not. For the PG_UNICODE_FAST collation, alphanumeric includes
non-ASCII digits; whereas for the PG_C_UTF8 collation, it only
includes digits 0-9. Pass down the right information from the
pg_locale_t into initcap_wbnext to differentiate the behavior.
Reported-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/20250417135841.33.nmisch@google.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
exec_replication_command created a cmd_context to work in and
then deleted it on exit. This is pretty dangerous because
some replication commands start/finish transactions. In the
wake of commit 1afe31f03, that could lead to re-selecting a
CurrentMemoryContext that's already been deleted, leading to
hilarity such as a memory context that is its own parent.
To fix, let's make the cmd_context persist across
exec_replication_command calls; instead of deleting it, we'll just
reset it each time. In this way it retains the same identity and
there's no problem if transaction abort restores it as the working
context. It probably even saves a few microseconds to do this.
This fix also ensures that exec_replication_command returns to the
caller (PostgresMain) with the same context active that had been
when it was called (probably MessageContext). The previous
coding could get that wrong too.
Reported-by: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Author: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAO6_XqoJA7-_G6t7Uqe5nWF3nj+QBGn4F6Ptp=rUGDr0zo+KvA@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The case of "node == parent" might seem impossible, since we just
allocated the new node. But it's possible if parent is a dangling
reference to a recently-deleted context. In fact, given aset.c's
habit of recycling contexts, it's actually rather likely if that's so.
If we'd had this assertion before, it would have simplified debugging
a recently-identified walsender issue.
Reported-by: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAO6_XqoJA7-_G6t7Uqe5nWF3nj+QBGn4F6Ptp=rUGDr0zo+KvA@mail.gmail.com
|
|
|
|
|
|
|
|
| |
Similar to 84fd3bc14 but these ones were found using a regex that can span
multiple lines.
Author: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAApHDvrMcr8XD107H3NV=WHgyBcu=sx5+7=WArr-n_cWUqdFXQ@mail.gmail.com
|
|
|
|
|
|
|
| |
These are all new to v18
Author: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAApHDvrMcr8XD107H3NV=WHgyBcu=sx5+7=WArr-n_cWUqdFXQ@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Blocking checkpoint phase 2 requires MarkBufferDirty() and
BUFFER_LOCK_EXCLUSIVE; neither suffices by itself. transam/README documents
this, citing SyncOneBuffer(). Update the DELAY_CHKPT_START documentation to
say this. Expand the heap_inplace_update_and_unlock() comment that cites
XLogSaveBufferForHint() as precedent, since heap_inplace_update_and_unlock()
could have opted not to use DELAY_CHKPT_START.
Commit 8e7e672cdaa6bfec85d4d5dd9be84159df23bb41 added DELAY_CHKPT_START to
heap_inplace_update_and_unlock(). Since commit
bc6bad88572501aecaa2ac5d4bc900ac0fd457d5 reverted it in non-master branches,
no back-patch.
Discussion: https://postgr.es/m/20250406180054.26.nmisch@google.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 7102070329d8147246d2791321f9915c3b5abf31 fixed a similar bug, but
it missed the case of database-wide ANALYZE ("use_own_xacts" mode).
Commit a07e03fd8fa7daf4d1356f7cb501ffe784ea6257 changed consequences
from silent discard of a pg_class stats (relpages et al.) update to
ERROR "tuple to be updated was already modified". Losing a relpages
update of an ON COMMIT DELETE ROWS table was negligible, but a
COMMIT-time error isn't negligible. Back-patch to v13 (all supported
versions).
Reported-by: Richard Guo <guofenglinux@gmail.com
Reported-by: Robins Tharakan <tharakan@gmail.com>
Discussion: https://postgr.es/m/CAMbWs4-XwMKMKJ_GT=p3_-_=j9rQSEs1FbDFUnW9zHuKPsPNEQ@mail.gmail.com
Backpatch-through: 13
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1349d2790 added support so that aggregate functions with an ORDER BY or
DISTINCT clause could make use of presorted inputs to avoid an implicit
sort within nodeAgg.c. That commit failed to consider that a FILTER
clause may exist that filters rows before the aggregate function
arguments are evaluated. That can be problematic if an aggregate
argument contains an expression which could error out during evaluation.
It's perfectly valid to want to have a FILTER clause which eliminates
such values, and with the pre-sorted path added in 1349d2790, it was
possible that the planner would produce a plan with a Sort node above
the Aggregate to perform the sort on the aggregate's arguments long before
the Aggregate node would filter out the non-matching values.
Here we fix this by inspecting ORDER BY / DISTINCT aggregate functions
which have a FILTER clause to see if the aggregate's arguments are
anything more complex than a Var or a Const. Evaluating these isn't
going to cause an error. If we find any non-Var, non-Const parameters
then the planner will now opt to perform the sort in the Aggregate node
for these aggregates, i.e. disable the presorted aggregate optimization.
An alternative fix would have been to completely disallow the presorted
optimization for Aggrefs with any FILTER clause, but that wasn't done as
that could cause large performance regressions for queries that see
significant gains from 1349d2790 due to presorted results coming in from
an Index Scan.
Backpatch to 16, where 1349d2790 was introduced
Author: David Rowley <dgrowleyml@gmail.com>
Reported-by: Kaimeh <kkaimeh@gmail.com>
Diagnosed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAK-%2BJz9J%3DQ06-M7cDJoPNeYbz5EZDqkjQbJnmRyQyzkbRGsYkA%40mail.gmail.com
Backpatch-through: 16
|
|
|
|
|
|
|
|
| |
The large majority of these have been introduced by recent commits done
in the v18 development cycle.
Author: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/9a7763ab-5252-429d-a943-b28941e0e28b@gmail.com
|
|
|
|
|
|
|
|
|
| |
The format of the injection point names used by the AIO code does not
match the existing naming convention used everywhere else in the code,
so let's be consistent. These points are used in test_aio.
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Discussion: https://postgr.es/m/Z_yTB80bdu1sYDqJ@paquier.xyz
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Both pg_get_process_memory_contexts() and pg_backend_memory_contexts
have 1-based levels, whereas pg_log_backend_memory_contexts() was using
0-based levels. Align these.
This results in slightly saner behavior from MemoryContextStatsDetail()
in regards to the max_level. Previously it would stop at 1 level before
the maximum requested level rather than at that level.
Reported-by: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Author: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Author: David Rowley <drowleyml@gmail.com
Reviewed-by: Melih Mutlu <m.melihmutlu@gmail.com>
Reviewed-by: Rahila Syed <rahilasyed90@gmail.com>
Discussion: https://postgr.es/m/395ea5d4fe190480efa95bf533485c70@oss.nttdata.com
|
|
|
|
|
|
|
|
|
|
| |
The "children" list won't be used until "got_children" has been set
true, but older compilers don't get that; about half a dozen
buildfarm animals are warning about this. Issue added by 11ff192b5.
While here, improve slightly-shaky grammar in comment.
Discussion: https://postgr.es/m/2057835.1744833309@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
|
|
| |
This gets rid of repetitive get_typlen calls in postquel_sub_params,
which show up as costing a few percent of the runtime in simple test
cases (more with more parameters).
In combination with the preceding patches, this gets us most of the
way back down to the amount of per-call overhead that functions.c
had before commit 0dca5d68d. There are some more things that could
be done, but this seems like an okay place to stop for v18.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
At this point, the only data structures we allocate directly in
fcontext are the SQLFunctionCache struct itself, the ParamListInfo
struct, and the execution_state array, all of which are small and
perfectly capable of being re-used across executions of the same
FmgrInfo. Hence, let's give them the same lifespan as the FmgrInfo.
This step gets rid of the separate SQLFunctionLink struct and makes
fn_extra point to SQLFunctionCache again. We also get rid of the
separate fcontext memory context and allocate these items directly
in fn_mcxt.
For notational simplicity, SQLFunctionCache still has an fcontext
field, but it's just a copy of fn_mcxt.
The motivation for this is to allow these structures to live as
long as the FmgrInfo and be re-used across calls, restoring the
original design without its propensity for memory leaks. This
gets rid of some per-call overhead that we added in 0dca5d68d.
We also make an effort to re-use the JunkFilter and result slot.
Those might need to change if the function definition changes,
so we compromise by rebuilding them if the cached plan changes.
This also moves the tuplestore into fn_mcxt so that it can be
re-used across calls, again undoing a change made in 0dca5d68d.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Put the JunkFilter and its result slot (and thence also
some subsidiary data such as the result tupledesc) into a
separate subcontext "jfcontext". This doesn't accomplish
a lot at this point, because we make a new JunkFilter each
time through the SQL function. However, the plan is to make
the fcontext long-lived, and that raises the possibility
that we'll need a new JunkFilter because the plan for the
result-generating query changes. A separate context makes
it easy to free the obsoleted data when that happens.
Also, instead of always running the sub-executor in fcontext,
make a separate context for it if we're doing lazy eval of
a SRF, and otherwise just run it inside CurrentMemoryContext.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, much of this code ran with CurrentMemoryContext set
to be the function's fcontext, so that we tended to leak a lot of
stuff there. Commit 0dca5d68d dealt with that by releasing the
fcontext at the completion of each SQL function call, but we'd
like to go back to the previous approach of allowing the fcontext
to be query-lifespan. To control the leakage problem, rearrange
the code so that we mostly run in the memory context that fmgr_sql
is called in (which we expect to be short-lived). Notably, this
means that parsing/planning is all done in the short-lived context
and doesn't leak cruft into fcontext.
This patch also fixes the allocation of execution_state records
so that we don't leak them across executions. I set that up
with a re-usable array that contains at least as many
execution_state structs as we need for the current querytree.
The chain structure is still there, but it's not really doing
much for us, and maybe somebody will be motivated to get rid
of it. I'm not though.
This incidentally also moves the call of BlessTupleDesc to be
with the code that creates the JunkFilter. That doesn't make
much difference now, but a later patch will reduce the number
of times the JunkFilter gets made, and we needn't bless the
results any more often than that.
We still leak a fair amount in fcontext, particularly when
executing utility statements, but that's material for a
separate patch step; the point here is only to get rid of
unintentional allocations in fcontext.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Late in the development of commit 0dca5d68d, I added a step to copy
the result tlist we extract from the cached final query, because
I was afraid that that might not last as long as the JunkFilter that
we're passing it off to. However, that turns out to cost a noticeable
number of cycles, and it's really quite unnecessary because the
JunkFilter will not examine that tlist after it's been created.
(ExecFindJunkAttribute would use it, but we don't use that function
on this JunkFilter.) Hence, remove the copy step. For safety,
reset the might-become-dangling jf_targetList pointer to NIL.
In passing, remove DR_sqlfunction.cxt, which we don't use anymore;
it's confusing because it's not entirely clear which context it
ought to point at.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 0bada39c83a150079567a6e97b1a25a198f30ea3 fixed a bug of this kind,
which existed in all branches for six days before detection. While the
probability of reaching the trouble was low, the disruption was extreme. No
new backends could start, and service restoration needed an immediate
shutdown. Hence, add this to catch the next bug like it.
The new check in RelationIdGetRelation() suffices to make autovacuum detect
the bug in commit 243e9b40f1b2dd09d6e5bf91ebf6e822a2cd3704 that led to commit
0bada39. This also checks in a number of similar places. It replaces each
Assert(IsTransactionState()) that pertained to a conditional catalog read.
No back-patch for now, but a back-patch of commit 243e9b4 should back-patch
this, too. A back-patch could omit the src/test/regress changes, since back
branches won't gain new index columns.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/20250410191830.0e.nmisch@google.com
Discussion: https://postgr.es/m/10ec0bc3-5933-1189-6bb8-5dec4114558e@gmail.com
|
|
|
|
|
|
| |
Reported-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/20250412123430.8c.nmisch@google.com
|
|
|
|
|
|
| |
Reported-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/20250412123430.8c.nmisch@google.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
estimate_multivariate_ndistinct() is coded to assume the caller handles
passing it a list of GroupVarInfos with unique 'var' fields over the
entire list. 6bb6a62f3 added code which didn't ensure this and that
could result in estimate_multivariate_ndistinct() erroring out with:
ERROR: corrupt MVNDistinct entry
This occurred because estimate_multivariate_ndistinct() first searches
for a set of stats that match to at least two of the given GroupVarInfos
and then later assumes that the MVNDistinctItem.items array of the
best matching stats will have an entry for those two columns. If the
GroupVarInfos List contained a duplicate entry then the same column could
be matched to twice and that could trick the code into thinking we have
>= 2 columns matched in cases where only a single distinct column has been
matched. This could result in a failure to find the correct
MVNDistinctItem in the stats as the array containing those never
contains an item for single columns.
Here we make it more clear that the function needs a distinct set of
GroupVarInfos and also tidy up a few other comments to make things a bit
easier to follow.
Author: David Rowley <drowleyml@gmail.com>
Discussion: https://postgr.es/m/CAApHDvocZCUhM9W9mJ39d6oQz7ePKoqFnao_347mvC-A7QatcQ@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
| |
Buildfarm member drongo complained because the definitions of these
functions used "const Oid foo" where the forward declarations just
had "Oid foo". (I'm a bit surprised that drongo seems to be the only
complainant.) I chose to fix this by removing the "consts" because
(a) I'm generally not a fan of using const that way, and (b) it was
a minority usage even within these two functions, let alone compared
to the rest of our code base.
Oversight in commit eec0040c4, so no need for back-patch.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We were unnecessarily acquiring AccessExclusiveLock on all child tables
when "ALTER TABLE ONLY sometab ADD PRIMARY KEY" was run on their parent
table, an oversight in commit 14e87ffa5c54. This caused deadlocks
during pg_restore of partitioned tables.
The reason to acquire the AEL was that we need to verify that child
tables have the involved columns already marked as not-null; but if the
parent table has an inheritable not-null constraint, then all children
must necessarily be in the correct state already, so we can skip the
check, which avoids acquiring the lock. Reorder the code so that it
works that way. This doesn't change things in the case where the
constraint doesn't exist, but that case is of lesser importance because
it doesn't occur during parallel pg_restore.
While at it, reword some errmsg() and add errhint() to similar cases in
related but not adjacent code.
Diagnosed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Discussion: https://postgr.es/m/67469c1c-38bc-7d94-918a-67033f5dd731@gmx.net
Discussion: https://postgr.es/m/2045026.1743801143@sss.pgh.pa.us
Discussion: https://postgr.es/m/1280408.1744650810@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Memoize typically marks cache entries as complete after fully scanning
the inner side of a join. However, in the case of unique joins, we
skip to the next outer tuple as soon as the first matching inner tuple
is found, leaving no opportunity to scan the inner side to completion.
To work around that, we mark cache entries as complete after fetching
the first matching inner tuple in unique joins.
This approach is only safe when all of the join's restriction clauses
are parameterized; otherwise, there is no guarantee that reading just
one tuple from the inner side is sufficient.
Currently, we check for this by verifying that the number of clauses
in ppi_clauses is no less than the number of the join's restriction
clauses. However, this check isn't entirely reliable, as ppi_clauses
includes join clauses available from all outer rels, not just the
current outer rel. This means the check could pass even if a
restriction clause isn't parameterized, as long as another join
clause, which doesn't belong to the current join, is included in
ppi_clauses.
To fix this, we explicitly check whether each restriction clause of
the current join is present in ppi_clauses.
While we're here, remove the XXX comment from the modified code, as
it's not justified; in certain cases, it's not possible to move a join
clause to the inner side.
This is arguably a bugfix, but no backpatch given the lack of field
reports.
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: wenhui qiu <qiuwenhuifx@gmail.com>
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Discussion: https://postgr.es/m/CAMbWs4-8JPouj=wBDj4DhK-WO4+Xdx=A2jbjvvyyTBQneJ1=BQ@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a GENERATED column is declared to have a domain data type where
the domain's constraints disallow null values, INSERT commands failed
because we built a targetlist that included coercing a null constant
to the domain's type. The failure occurred even when the generated
value would have been perfectly OK. This is adjacent to the issues
fixed in 0da39aa76, but we didn't notice for lack of testing a domain
with such a constraint.
We aren't going to use the result of the targetlist entry for the
generated column --- ExecComputeStoredGenerated will overwrite it.
So it's not really necessary that it have the exact datatype of
the generated column. This patch fixes the problem by changing
the targetlist entry to be a null Const of the domain's base type,
which should be sufficiently legal. (We do have to tweak
ExecCheckPlanOutput to accept the situation, though.)
This has been broken since we implemented generated columns.
However, this patch only applies easily as far back as v14, partly
because I (tgl) only carried 0da39aa76 back that far, but mostly
because v14 significantly refactored the handling of INSERT/UPDATE
targetlists. Given the lack of field complaints and the short
remaining support lifetime of v13, I judge the cost-benefit ratio
not good for devising a version that would work in v13.
Reported-by: jian he <jian.universality@gmail.com>
Author: jian he <jian.universality@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CACJufxG59tip2+9h=rEv-ykOFjt0cbsPVchhi0RTij8bABBA0Q@mail.gmail.com
Backpatch-through: 14
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 0f21db36d made an assumption that GIN triConsistentFns
would not modify their input entryRes[] arrays. But in fact,
the "shim" triConsistentFn that we use for opclasses that don't
supply their own did exactly that, potentially leading to wrong
answers from a GIN index search. Through bad luck, none of the
test cases that we have for such opclasses exposed the bug.
One response to this could be that the assumption of consistency check
functions not modifying entryRes[] arrays is a bad one, but it still
seems reasonable to me. Notably, shimTriConsistentFn is itself
assuming that with respect to the underlying boolean consistentFn,
so it's sure being self-centered in supposing that it gets to do so.
Fortunately, it's quite simple to fix shimTriConsistentFn to restore
the entry-time state of entryRes[], so let's do that instead.
This issue doesn't affect any core GIN opclasses, since they all
supply their own triConsistentFns. It does affect contrib modules
btree_gin, hstore, and intarray.
Along the way, I (tgl) noticed that shimTriConsistentFn failed to
pick up on a "recheck" flag returned by its first call to the boolean
consistentFn. This may be only a latent problem, since it would be
unlikely for a consistentFn to set recheck for the all-false case
and not any other cases. (Indeed, none of our contrib modules do
that.) Nonetheless, it's formally wrong.
Reported-by: Vinod Sridharan <vsridh90@gmail.com>
Author: Vinod Sridharan <vsridh90@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAFMdLD7XzsXfi1+DpTqTgrD8XU0i2C99KuF=5VHLWjx4C1pkcg@mail.gmail.com
Backpatch-through: 13
|
|
|
|
|
|
|
|
|
|
| |
Make sure that function declarations use names that exactly match the
corresponding names from function definitions in a few places. These
inconsistencies were all introduced during Postgres 18 development.
This commit was written with help from clang-tidy, by mechanically
applying the same rules as similar clean-up commits (the earliest such
commit was commit 035ce1fe).
|
|
|
|
|
|
|
|
|
| |
This fixes typos in docs and comments introduced during the v18
development cycle, to keep them from ending up in backbranches.
Author: Jacob Brazeal <jacob.brazeal@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/CA+COZaCgGua25f2hSrjrDLJcJJAHkwoKgTTqUy-wyL1=64JNjw@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
synchronous_standby_names cannot be reloaded safely by backends, and the
checkpointer is in charge of updating a state in shared memory if the
GUC is enabled in WalSndCtl, to let the backends know if they should
wait or not for a given LSN. This provides a strict control on the
timing of the waiting queues if the GUC is enabled or disabled, then
reloaded. The checkpointer is also in charge of waking up the backends
that could be waiting for a LSN when the GUC is disabled.
This logic had a race condition at startup, where it would be possible
for backends to not wait for a LSN even if synchronous_standby_names is
enabled. This would cause visibility issues with transactions that we
should be waiting for but they were not. The problem lasts until the
checkpointer does its initial update of the shared memory state when it
loads synchronous_standby_names.
In order to take care of this problem, the shared memory state in
WalSndCtl is extended to detect if it has been initialized by the
checkpointer, and not only check if synchronous_standby_names is
defined. In WalSndCtlData, sync_standbys_defined is renamed to
sync_standbys_status, a bits8 able to know about two states:
- If the shared memory state has been initialized. This flag is set by
the checkpointer at startup once, and never removed.
- If synchronous_standby_names is known as defined in the shared memory
state. This is the same as the previous sync_standbys_defined in
WalSndCtl.
This method gives a way for backends to decide what they should do until
the shared memory area is initialized, and they now ultimately fall back
to a check on the GUC value in this case, which is the best thing that
can be done.
Fortunately, SyncRepUpdateSyncStandbysDefined() is called immediately by
the checkpointer when this process starts, so the window is very narrow.
It is possible to enlarge the problematic window by making the
checkpointer wait at the beginning of SyncRepUpdateSyncStandbysDefined()
with a hardcoded sleep for example, and doing so has showed that a 2PC
visibility test is indeed failing. On machines slow enough, this bug
would cause spurious failures.
In 17~, we have looked at the possibility of adding an injection point
to have a reproducible test, but as the problematic window happens at
early startup, we would need to invent a way to make an injection point
optionally persistent across restarts when attached, something that
would be fine for this case as it would involve the checkpointer. This
issue is quite old, and can be reproduced on all the stable branches.
Author: Melnikov Maksim <m.melnikov@postgrespro.ru>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/163fcbec-900b-4b07-beaa-d2ead8634bec@postgrespro.ru
Backpatch-through: 13
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Sami complained that there's a discrepancy between n_mod_since_analyze
and n_ins_since_vacuum, as the former only accounts for committed changes
and the latter tracks committed and aborted inserts. Nobody seemed
overly concerned that this would cause any concerning issues. The
repercussions, from what I can tell, are limited to causing an
autovacuum to trigger for inserts sooner than it otherwise might. For
typical ratios of commits to aborts, it's unlikely to ever be noticed.
Fixing things to make it so n_ins_since_vacuum only displays committed
inserts would require an additional field in PgStat_TableCounts, which
does not quite seem worthwhile at this stage. This commit just adds a
comment with some details to mention that we know about it, which will
hopefully prevent repeat discussions.
Reported-by: Sami Imseih <samimseih@gmail.com>
Author: David Rowley <drowleyml@gmail.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CAApHDvpgV3a-R2EGmPOh0L-x3pHbZpM3y4dySWfy+UqUazwDQA@mail.gmail.com
|
|
|
|
|
|
|
|
|
| |
Similar to 8461424fd, here we adjust a few new locations which were not
using the most suitable appendStringInfo* function for the intended
purpose.
Author: David Rowley <drowleyml@gmail.com
Discussion: https://postgr.es/m/CAApHDvqJnNjueb=Eoj8K+8n0g7nj_AcPWSiCj5RNV4fDejAfqA@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
| |
The global variable backing the DSA area for Memory Context stats
reporting had a too generic name, rename to be more descriptive.
Independently reported by Peter and Laurenz.
Author: Daniel Gustafsson <daniel@yesql.se>
Reported-by: Peter Eisentraut <peter@eisentraut.org>
Reported-by: Laurenz Albe <laurenz.albe@cybertec.at>
Discussion: https://postgr.es/m/d51172bd4e7f4b07a18a0288ca1b1c28a71a5f6a.camel@cybertec.at
Discussion: https://postgr.es/m/25095db5-b595-4b85-9100-d358907c25b5@eisentraut.org
|
|
|
|
|
|
|
|
|
|
|
|
| |
By inspection, ip_addrsize() can't return a negative result.
(If it could, we'd have way bigger problems elsewhere.)
So delete useless check in network_send(). Most C compilers
are probably perfectly capable of removing this code by
themselves, but it's confusing/misleading.
Bug: #18889
Reported-by: Daniel Elishakov <dan-eli@mail.ru>
Discussion: https://postgr.es/m/18889-73d4f19e953a629e@postgresql.org
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Data loss can happen when the DDLs like ALTER PUBLICATION ... ADD TABLE ...
or ALTER TYPE ... that don't take a strong lock on table happens
concurrently to DMLs on the tables involved in the DDL. This happens
because logical decoding doesn't distribute invalidations to concurrent
transactions and those transactions use stale cache data to decode the
changes. The problem becomes bigger because we keep using the stale cache
even after those in-progress transactions are finished and skip the
changes required to be sent to the client.
This commit fixes the issue by distributing invalidation messages from
catalog-modifying transactions to all concurrent in-progress transactions.
This allows the necessary rebuild of the catalog cache when decoding new
changes after concurrent DDL.
We observed performance regression primarily during frequent execution of
*publication DDL* statements that modify the published tables. The
regression is minor or nearly nonexistent for DDLs that do not affect the
published tables or occur infrequently, making this a worthwhile cost to
resolve a longstanding data loss issue.
An alternative approach considered was to take a strong lock on each
affected table during publication modification. However, this would only
address issues related to publication DDLs (but not the ALTER TYPE ...)
and require locking every relation in the database for publications
created as FOR ALL TABLES, which is impractical.
The bug exists in all supported branches, but we are backpatching till 14.
The fix for 13 requires somewhat bigger changes than this fix, so the fix
for that branch is still under discussion.
Reported-by: hubert depesz lubaczewski <depesz@depesz.com>
Reported-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Author: Shlok Kyal <shlok.kyal.oss@gmail.com>
Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Tested-by: Benoit Lobréau <benoit.lobreau@dalibo.com>
Backpatch-through: 14
Discussion: https://postgr.es/m/de52b282-1166-1180-45a2-8d8917ca74c6@enterprisedb.com
Discussion: https://postgr.es/m/CAD21AoAenVqiMjpN-PvGHL1N9DWnHSq673bfgr6phmBUzx=kLQ@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
d69d45a5a changed how em_is_child members are stored in
EquivalenceClasses. Children are no longer stored in the ec_members
list. optimizer/README mentioned that most operations "should ignore
child members", but that felt a little untrue now since child members
are now stored in a separate place, they simply won't be found by the
normal means of looking (a foreach loop over ec_members), and if you don't
find them, there's technically no need to "ignore" them.
Here we tweak the wording slightly to reflect the new storage location
for child members.
Reported-by: Amit Langote <amitlangote09@gmail.com>
Author: Amit Langote <amitlangote09@gmail.com>
Author: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CA+HiwqE8v=EuAP_3F_A2xn8zWx+nG_etW_Fe_DvKO-Fkx=+DdQ@mail.gmail.com
|