| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Writing a trailing semicolon in a macro is almost never the right thing,
because you almost always want to write a semicolon after each macro
call instead. (Even if there was some reason to prefer not to, pgindent
would probably make a hash of code formatted that way; so within PG the
rule should basically be "don't do it".) Thus, if we have a semi inside
the macro, the compiler sees "something;;". Much of the time the extra
empty statement is harmless, but it could lead to mysterious syntax
errors at call sites. In perhaps an overabundance of neatnik-ism, let's
run around and get rid of the excess semicolons whereever possible.
The only thing worse than a mysterious syntax error is a mysterious
syntax error that only happens in the back branches; therefore,
backpatch these changes where relevant, which is most of them because
most of these mistakes are old. (The lack of reported problems shows
that this is largely a hypothetical issue, but still, it could bite
us in some future patch.)
John Naylor and Tom Lane
Discussion: https://postgr.es/m/CACPNZCs0qWTqJ2QUSGJ07B7uvAvzMb-KbG2q+oo+J3tsWN5cqw@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On further reflection, code comments added by commit b0229f26 slightly
misrepresented how we determine the oldest bpto.xact for the index.
btvacuumpage() does not treat the bpto.xact of a page that it put in the
FSM as a candidate to be the oldest deleted page (the delete-marked page
that has the oldest bpto.xact XID among all pages encountered).
The definition of a deleted page for the purposes of the bpto.xact
calculation is different from the definition used by the bulk delete
statistics. The bulk delete statistics don't distinguish between pages
that were deleted by the current VACUUM, pages deleted by a previous
VACUUM operation but not yet recyclable/reusable, and pages that are
reusable (though reusable pages are counted separately).
Backpatch: 11-, just like commit b0229f26.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The logic for determining how many nbtree pages in an index are deleted
pages sometimes undercounted pages. Pages that were deleted by the
current VACUUM operation (as opposed to some previous VACUUM operation
whose deleted pages have yet to be reused) were sometimes overlooked.
The final count is exposed to users through VACUUM VERBOSE's "%u index
pages have been deleted" output.
btvacuumpage() avoided double-counting when _bt_pagedel() deleted more
than one page by assuming that only one page was deleted, and that the
additional deleted pages would get picked up during a future call to
btvacuumpage() by the same VACUUM operation. _bt_pagedel() can
legitimately delete pages that the btvacuumscan() scan will not visit
again, though, so that assumption was slightly faulty.
Fix the accounting by teaching _bt_pagedel() about its caller's
requirements. It now only reports on pages that it knows btvacuumscan()
won't visit again (including the current btvacuumpage() page), so
everything works out in the end.
This bug has been around forever. Only backpatch to v11, though, to
keep _bt_pagedel() is sync on the branches that have today's bugfix
commit b0229f26da. Note that this commit changes the signature of
_bt_pagedel(), just like commit b0229f26da.
Author: Peter Geoghegan
Reviewed-By: Masahiko Sawada
Discussion: https://postgr.es/m/CAH2-WzkrXBcMQWAYUJMFTTvzx_r4q=pYSjDe07JnUXhe+OZnJA@mail.gmail.com
Backpatch: 11-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 857f9c36cda (which taught nbtree VACUUM to skip a scan of the
index from btcleanup in situations where it doesn't seem worth it) made
VACUUM maintain the oldest btpo.xact among all deleted pages for the
index as a whole. It failed to handle all the details surrounding pages
that are deleted by the current VACUUM operation correctly (though pages
deleted by some previous VACUUM operation were processed correctly).
The most immediate problem was that the special area of the page was
examined without a buffer pin at one point. More fundamentally, the
handling failed to account for the full range of _bt_pagedel()
behaviors. For example, _bt_pagedel() sometimes deletes internal pages
in passing, as part of deleting an entire subtree with btvacuumpage()
caller's page as the leaf level page. The original leaf page passed to
_bt_pagedel() might not be the page that it deletes first in cases where
deletion can take place.
It's unclear how disruptive this bug may have been, or what symptoms
users might want to look out for. The issue was spotted during
unrelated code review.
To fix, push down the logic for maintaining the oldest btpo.xact to
_bt_pagedel(). btvacuumpage() is now responsible for pages that were
fully deleted by a previous VACUUM operation, while _bt_pagedel() is now
responsible for pages that were deleted by the current VACUUM operation
(this includes half-dead pages from a previous interrupted VACUUM
operation that become fully deleted in _bt_pagedel()). Note that
_bt_pagedel() should never encounter an existing deleted page.
This commit theoretically breaks the ABI of a stable release by changing
the signature of _bt_pagedel(). However, if any third party extension
is actually affected by this, then it must already be completely broken
(since there are numerous assumptions made in _bt_pagedel() that cannot
be met outside of VACUUM). It seems highly unlikely that such an
extension actually exists, in any case.
Author: Peter Geoghegan
Reviewed-By: Masahiko Sawada
Discussion: https://postgr.es/m/CAH2-WzkrXBcMQWAYUJMFTTvzx_r4q=pYSjDe07JnUXhe+OZnJA@mail.gmail.com
Backpatch: 11-, where the "skip full scan" feature was introduced.
|
|
|
|
|
|
|
|
|
|
|
| |
XLogRegisterBufData() does not copy data pointed to by caller's pointer
argument.
Oversight in commit 0d861bbb702.
Author: Peter Eisentraut
Reported-By: Peter Eisentraut
Discussion: https://postgr.es/m/21800dbe-a13e-22f7-d423-b81db9d249f5@2ndquadrant.com
|
|
|
|
| |
see also a00c53b0cb
|
|
|
|
|
|
| |
_bt_getbuf() cannot return an invalid buffer.
Oversight in commit 2ed5b87f96d.
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 79dfa8a has introduced a check to catch when the minimum protocol
version was set higher than the maximum version, however an error was
getting generated when both bounds are set even if they are able to
work, causing a backend to not use a new SSL context but keep the old
one.
Author: Daniel Gustafsson
Discussion: https://postgr.es/m/14BFD060-8C9D-43B4-897D-D5D9AA6FC92B@yesql.se
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Checkpointer uses its MyLatch to wake up when a checkpoint request is
received. But before commit c6550776394e the latch was not used for
anything else, so the code could just go to sleep after each loop
without rechecking the sleeping condition. That commit added a separate
ResetLatch in its code path[1], which can cause a checkpoint to go
unnoticed for potentially a long time.
Fix by skipping sleep if any checkpoint flags are set. Also add a test
to verify this; authored by Kyotaro Horiguchi.
[1] CreateCheckPoint -> InvalidateObsoleteReplicationSlots ->
ConditionVariableTimeSleep
Report and diagnosis by Kyotaro Horiguchi.
Co-authored-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/20200408.141956.891237856186513376.horikyota.ntt@gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Lack of these checks could cause visible misbehavior, including
assertion failures. This was missed in commit c6550776394e, whereby
restart_lsn becomes invalid when the size limit is exceeded.
Also reword some existing error messages, and add errdetail(), so that
the reported errors all match in spirit.
Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/20200408.093710.447591748588426656.horikyota.ntt@gmail.com
|
|
|
|
|
| |
Some translatable strings have been moved to scanner_yyerror(), so we
need to add that, too.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
pg_statio_all_tables view appears to have a wrong grouping. As the result
numbers of toast index blocks read and hit were multiplied to the number of
table indexes. This commit fixes the view definition.
Backpatching this appears difficult. We don't have a mechanism to patch a
system catalog of existing instances in minor upgrade. We can write a
release notes instruction to do this manually. But per discussion this is
probably not so critical bug for doing such an intrusive fix.
Reported-by: Andrei Zubkov
Discussion: https://postgr.es/m/CAPpHfdtMYkkNudLMG9G0dxX_B%3Dn5sfKzOyxxrvWYtSicaGW0Lw%40mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Queries such as '!(foo<->bar)' failed to find matching rows when
implemented as a GiST or GIN index search. That's because of
failing to handle phrase searches as tri-valued when considering
a query without any position information for the target tsvector.
We can only say that the phrase operator might match, not that it
does match; and therefore its NOT also might match. The previous
coding incorrectly inverted the approximate phrase result to
decide that there was certainly no match.
To fix, we need to make TS_phrase_execute return a real ternary result,
and then bubble that up accurately in TS_execute. As long as we have
to do that anyway, we can simplify the baroque things TS_phrase_execute
was doing internally to manage tri-valued searching with only a bool
as explicit result.
For now, I left the externally-visible result of TS_execute as a plain
bool. There do not appear to be any outside callers that need to
distinguish a three-way result, given that they passed in a flag
saying what to do in the absence of position data. This might need
to change someday, but we wouldn't want to back-patch such a change.
Although tsginidx.c has its own TS_execute_ternary implementation for
use at upper index levels, that sadly managed to get this case wrong
as well :-(. Fixing it is a lot easier fortunately.
Per bug #16388 from Charles Offenbacher. Back-patch to 9.6 where
phrase search was introduced.
Discussion: https://postgr.es/m/16388-98cffba38d0b7e6e@postgresql.org
|
|
|
|
|
| |
Author: Justin Pryzby
Discussion: https://postgr.es/m/20200408165653.GF2228@telsasoft.com
|
|
|
|
| |
from 303640199d0
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Avoid accessing the leaf page's top parent tuple without a buffer lock
held during the second phase of nbtree page deletion. The old approach
was safe, though only because VACUUM never drops its buffer pin (and
because only VACUUM itself can modify a half-dead page). Even still, it
seems like a good idea to be strict here. Tighten things up by copying
the top parent page's block number to a local variable before releasing
the buffer lock on the leaf page -- not after.
This is a follow-up to commit fa7ff642, which fixed a similar issue in
the first phase of nbtree page deletion.
Update some related comments in passing.
Discussion: https://postgr.es/m/CAH2-WzkLgyN3zBvRZ1pkNJThC=xi_0gpWRUb_45eexLH1+k2_Q@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Avoid accessing the deletion target page's special area during nbtree
page deletion at a point where there is no buffer lock held. This issue
was detected by a patch that extends Valgrind's memcheck tool to mark
nbtree pages that are unsafe to access (due to not having a buffer lock
or buffer pin) as NOACCESS.
We do hold a buffer pin at this point, and only access the special area,
so the old approach was safe. Even still, it seems like a good idea to
tighten up the rules in this area. There is no reason to not simply
insist on always holding a buffer lock (not just a pin) when accessing
nbtree pages.
Update some related comments in passing.
Discussion: https://postgr.es/m/CAH2-WzkLgyN3zBvRZ1pkNJThC=xi_0gpWRUb_45eexLH1+k2_Q@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before sleeping, WalSndWaitForWal() sends a keepalive if MyWalSnd->write
< sentPtr. When the latest physical LSN yields no logical replication
messages (a common case), that keepalive elicits a reply. Processing
the reply updates pg_stat_replication.replay_lsn. WalSndLoop() lacks
that; when WalSndLoop() slept, replay_lsn advancement could stall until
wal_receiver_status_interval elapsed. This sometimes stalled
src/test/subscription/t/001_rep_changes.pl for up to 10s.
Reviewed by Fujii Masao and Michael Paquier.
Discussion: https://postgr.es/m/20200418070142.GA1075445@rfd.leadboat.com
|
|
|
|
|
|
|
| |
This reverts commit 421685812290406daea58b78dfab0346eb683bbb. It caused
idle physical walsenders to busy-wait, as reported by Fujii Masao.
Discussion: https://postgr.es/m/20200417054146.GA1061007@rfd.leadboat.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
CreateRole() was passing a Value node, not a RoleSpec node, for the
newly-created role name when adding the role as a member of existing
roles for the IN ROLE syntax.
This mistake went unnoticed because the node in question is used only
for error messages and is not accessed on non-error paths.
In older pg versions (such as 9.5 where this was found), this results
in an "unexpected node type" error in place of the real error. That
node type check was removed at some point, after which the code would
accidentally fail to fail on 64-bit platforms (on which accessing the
Value node as if it were a RoleSpec would be mostly harmless) or give
an "unexpected role type" error on 32-bit platforms.
Fix the code to pass the correct node type, and add an lfirst_node
assertion just in case.
Per report on irc from user m1chelangelo.
Backpatch all the way, because this error has been around for a long
time.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 32ff26911 introduced use of rank() into the triggers view to
calculate the spec-mandated action_order column. As written, this
prevents query constraints on the table-name column from being pushed
below the window aggregate step. That's bad for performance of this
typical usage pattern, since the view now has to be evaluated for all
tables not just the one(s) the user wants to see. It's also the cause
of some recent buildfarm failures, in which trying to evaluate the view
rows for triggers in process of being dropped resulted in "cache lookup
failed for function NNN" errors. Those rows aren't of interest to the
test script doing the query, but the filter that would eliminate them
is being applied too late. None of this happened before the rank()
call was there, so it's a regression compared to v10 and before.
We can improve matters by changing the rank() call so that instead of
partitioning by OIDs, it partitions by nspname and relname, casting
those to sql_identifier so that they match the respective view output
columns exactly. The planner has enough intelligence to know that
constraints on partitioning columns are safe to push down, so this
eliminates the performance problem and the regression test failure
risk. We could make the other partitioning columns match view outputs
as well, but it'd be more complicated and the performance benefits
are questionable.
Side note: as this stands, the planner will push down constraints on
event_object_table and trigger_schema, but not on event_object_schema,
because it checks for ressortgroupref matches not expression
equivalence. That might be worth improving someday, but it's not
necessary to fix the immediate concern.
Back-patch to v11 where the rank() call was added. Ordinarily we'd not
change information_schema in released branches, but the test failure has
been seen in v12 and presumably could happen in v11 as well, so we need
to do this to keep the buildfarm happy. The change is harmless so far
as users are concerned. Some might wish to apply it to existing
installations if performance of this type of query is of concern,
but those who don't are no worse off.
I bumped catversion in HEAD as a pro forma matter (there's no
catalog incompatibility that would really require a re-initdb).
Obviously that can't be done in the back branches.
Discussion: https://postgr.es/m/5891.1587594470@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
78ea8b5 has fixed an issue related to the recycling of WAL segments on
standbys depending on archive_mode. However, it has introduced a
regression with the handling of WAL segments ready to be archived during
crash recovery, causing those files to be recycled without getting
archived.
This commit fixes the regression by tracking in shared memory if a live
cluster is either in crash recovery or archive recovery as the handling
of WAL segments ready to be archived is different in both cases (those
WAL segments should not be removed during crash recovery), and by using
this new shared memory state to decide if a segment can be recycled or
not. Previously, it was not possible to know if a cluster was in crash
recovery or archive recovery as the shared state was able to track only
if recovery was happening or not, leading to the problem.
A set of TAP tests is added to close the gap here, making sure that WAL
segments ready to be archived are correctly handled when a cluster is in
archive or crash recovery with archive_mode set to "on" or "always", for
both standby and primary.
Reported-by: Benoît Lobréau
Author: Jehan-Guillaume de Rorthais
Reviewed-by: Kyotaro Horiguchi, Fujii Masao, Michael Paquier
Discussion: https://postgr.es/m/20200331172229.40ee00dc@firost
Backpatch-through: 9.5
|
|
|
|
|
|
|
|
|
|
| |
In the footsteps of aaf069aa3, remove ACLDEBUG, which was the only
other remaining undocumented symbol in pg_config_manual.h. The fact
that nobody had bothered to document it in seventeen years is a good
clue to its usefulness. In practice, none of the tracing logic it
enabled would be of any value without additional effort.
Discussion: https://postgr.es/m/6631.1587565046@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
|
|
| |
Nobody really uses this stuff, especially not since we created
valgrind-based infrastructure that does the same thing better.
It is thus unsurprising that the generation.c and slab.c versions
were actually broken. Rather than fix 'em, let's just remove 'em.
Alexander Lakhin
Discussion: https://postgr.es/m/8936216c-3492-3f6e-634b-d638fddc5f91@gmail.com
|
|
|
|
|
|
|
|
|
|
| |
Function names declared "extern" now use BackupManifest in the name
rather than just Manifest, and data types use backup_manifest
rather than just manifest.
Per note from Michael Paquier.
Discussion: http://postgr.es/m/20200418125713.GG350229@paquier.xyz
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In a9c35cf85ca I changed ExecMakeTableFunctionResult() to dynamically
allocate the FunctionCallInfo used to call the SRF. Unfortunately I
did not account for the fact that the surrounding memory context has
query lifetime, leading to a leak till the end of the query.
In most cases the leak is fairly inconsequential, but if the
FunctionScan is done many times in the query, the leak can add
up. This happens e.g. if the function scan is on the inner side of a
nested loop, due to a lateral join.
EXPLAIN SELECT sum(f) FROM generate_series(1, 100000000) g(i), generate_series(i, i+1) f;
quickly shows the leak.
Instead of explicitly freeing the FunctionCallInfo it seems better to
make sure all the per-set temporary state in
ExecMakeTableFunctionResult() is cleaned up wholesale. Currently
that's probably just the FunctionCallInfo allocation, but since
there's some initialization work, and since there's already an
appropriate context, this seems like a more robust approach.
Bug: #16112
Reported-By: Ben Cornett
Author: Andres Freund
Reviewed-By: Tom Lane
Discussion: https://postgr.es/m/16112-4448bbf55a404189%40postgresql.org
Backpatch: 12, a9c35cf85ca
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When estimating the number of pre-sorted groups in cost_incremental_sort
we must not pass Vars with varno 0 to estimate_num_groups, which would
cause failues in find_base_rel. This may happen when sorting output of
set operations, thanks to generate_append_tlist.
Unlike recurse_set_operations we can't easily access the original target
list, so if we find any Vars with varno 0, we fall back to the default
estimate DEFAULT_NUM_DISTINCT.
Reported-by: Justin Pryzby
Discussion: https://postgr.es/m/20200411214639.GK2228%40telsasoft.com
|
|
|
|
|
|
|
|
|
|
|
|
| |
This Assert was trying to ensure that the number of columns in the foreign
key being cloned was the same number of attributes in the parentRel. Of
course, it's perfectly valid to have columns in the table which are not
part of the foreign key constraint. It appears that this Assert was
misunderstanding that.
Reported-by: Rajkumar Raghuwanshi
Reviewed-by: amul sul
Discussion: https://postgr.es/m/CAKcux6=z1dtiWw5BOpqDx-U6KTiq+zD0Y2m810zUtWL+giVXWA@mail.gmail.com
|
|
|
|
|
|
|
|
|
| |
This has been broken since PostgreSQL 12 and was probably never really
used. PostgreSQL 12 added an analogous HEAPAMSLOTDEBUGALL, which
still works right now, but it's also not very useful, so remove that
as well.
Discussion: https://www.postgresql.org/message-id/flat/645c0646-4218-d4c3-409a-a7003a0c108d%402ndquadrant.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
index.c supposed that it could just use a PG_TRY block to clean up the
state associated with an active REINDEX operation. However, that code
doesn't run if we do a FATAL exit --- for example, due to a SIGTERM
shutdown signal --- while the REINDEX is happening. And that state does
get consulted during catalog accesses, which makes it problematic if we
do any catalog accesses during shutdown --- for example, to clean up any
temp tables created in the session.
If this combination of circumstances occurred, we could find ourselves
trying to access already-freed memory. In debug builds that'd fairly
reliably cause an assertion failure. In production we might often
get away with it, but with some bad luck it could cause a core dump.
Another possible bad outcome is an erroneous conclusion that an
index-to-be-accessed is being reindexed; but it looks like that would
be unlikely to have any consequences worse than failing to drop temp
tables right away. (They'd still get dropped by the next session that
uses that temp schema.)
To fix, get rid of the use of PG_TRY here, and instead hook into
the transaction abort mechanisms to clean up reindex state.
Per bug #16378 from Alexander Lakhin. This has been wrong for a
very long time, so back-patch to all supported branches.
Discussion: https://postgr.es/m/16378-7a70ca41b3ec2009@postgresql.org
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Working on commit 1c455078b led me to check through FunctionCallInvoke
call sites to see if every one was being honest about (a) making sure
that fcinfo.isnull is initially false, and (b) checking its state after
the call. Sure enough, I found some violations.
The main one is that finalize_partialaggregate re-used serialfn_fcinfo
without resetting isnull, even though it clearly intends to cater for
serialfns that return NULL. There would only be an issue with a
non-strict serialfn, since it's unlikely that a serialfn would return
NULL for non-null input. We have no non-strict serialfns in core, and
there may be none in the wild either, which would account for the lack
of complaints. Still, it's clearly wrong, so back-patch that fix to
9.6 where finalize_partialaggregate was introduced.
Also, arrayfuncs.c and rowtypes.c contained various callers that were
not bothering to check for result nulls. While what's being called is
a comparison or hash function that probably *shouldn't* return null,
that's a lousy excuse for not having any check at all. There are
existing places that just Assert(!fcinfo->isnull) in comparable
situations, so I added that to the places that were calling btree
comparison or hash support functions. In the places calling
boolean-returning equality functions, it's quite cheap to have them
treat isnull as FALSE, so make those places do that. Also remove some
"locfcinfo->isnull = false" assignments that are unnecessary given the
assumption that no previous call returned null. These changes seem like
mostly neatnik-ism or debugging support, so I didn't back-patch.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a partition is detached, any triggers that had been cloned from its
parent were not properly disentangled from its parent triggers.
This resulted in triggers that could not be dropped because they
depended on the trigger in the trigger in the no-longer-parent table:
ALTER TABLE t DETACH PARTITION t1;
DROP TRIGGER trig ON t1;
ERROR: cannot drop trigger trig on table t1 because trigger trig on table t requires it
HINT: You can drop trigger trig on table t instead.
Moreover the table can no longer be re-attached to its parent, because
the trigger name is already taken:
ALTER TABLE t ATTACH PARTITION t1 FOR VALUES FROM (1)TO(2);
ERROR: trigger "trig" for relation "t1" already exists
The former is a bug introduced in commit 86f575948c77. (The latter is
not necessarily a bug, but it makes the bug more uncomfortable.)
To avoid the complexity that would be needed to tell whether the trigger
has a local definition that has to be merged with the one coming from
the parent table, establish the behavior that the trigger is removed
when the table is detached.
Backpatch to pg11.
Author: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://www.postgresql.org/message-id/flat/20200408152412.GZ2228@telsasoft.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 0d861bbb, which introduced deduplication to nbtree, added some
logic to take large posting list tuples into account when choosing a
split point. We subtract firstright posting list overhead from the
projected new high key size when calculating leftfree/rightfree values
for an affected candidate split point. Posting list tuples aren't
special to nbtsplitloc.c, but taking them into account like this makes a
huge difference in practice. Posting list tuples are frequently tuple
size outliers.
However, commit 0d861bbb missed a closely related issue: split interval
itself is calculated based on the assumption that tuples on the page
being split are roughly equisized. That assumption was acceptable back
when commit fab25024 taught the logic for choosing a split point about
suffix truncation, but it's pretty questionable now that very large
tuple sizes are common. This oversight led to unbalanced page splits in
low cardinality multi-column indexes when deduplication was used: page
splits that don't give sufficient weight to how unbalanced the split is
when the interval happens to include some large posting list tuples (and
when most other tuples on the page are not so large).
Nail this down by calculating an initial split interval in a way that's
attuned to the actual cost that we want to keep under control (not a
fuzzy proxy for the cost): apply a leftfree + rightfree evenness test to
each candidate split point that actually gets included in the split
interval (for the default strategy). This replaces logic that used a
percentage of all legal split points for the page as the basis of the
initial split interval.
Discussion: https://postgr.es/m/CAH2-WznJt5aT2uUB2Bs+JBLdwe0XTX67+xeLFcaNvCKxO=QBVQ@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Although selfuncs.c will never call a target operator with null inputs,
some functions might return null anyway. The existing coding will fail
if that happens (since FunctionCall2Coll will punt), which seems
undesirable given that matchingsel() has such a broad range of potential
applicability --- in fact, we already have a problem because we apply it
to jsonb_path_exists_opr, which can return null. Hence, rejigger the
underlying functions mcv_selectivity and histogram_selectivity to cope,
treating a null result as false.
While we are at it, we can move the InitFunctionCallInfoData overhead
out of the inner loops, which isn't a huge number of cycles but might
save something considering we are likely calling functions as cheap
as int4eq(). Plus, the number of loop cycles to be expected is much
more than it was when this code was written, since typical settings
of default_statistics_target are higher.
In view of that consideration, let's apply the same change to
var_eq_const, eqjoinsel_inner, and eqjoinsel_semi. We do not expect
equality functions to ever return null for non-null inputs (and
certainly that code has been that way a long time without complaints),
but the cycle savings seem attractive, especially in the eqjoinsel loops
where there's potentially an O(N^2) savings.
Similar code exists in ineq_histogram_selectivity and
get_variable_range, but I forebore from changing those for now.
The performance argument for changing ineq_histogram_selectivity
is really weak anyway, since that will only iterate log2(N) times.
Nikita Glukhov and Tom Lane
Discussion: https://postgr.es/m/9d3b0959-95d6-c37e-2c0b-287bcfe5c705@postgrespro.ru
|
|
|
|
|
|
|
| |
"operator" is a reserved word in C++, so per project conventions,
don't use it as an identifier in header files.
My oversight in commit a80818605.
|
|
|
|
|
|
|
|
| |
basebackup.c is already a pretty big and complicated file, so it
makes more sense to keep the backup manifest support routines
in a separate file, for clarity and ease of maintenance.
Discussion: http://postgr.es/m/CA+TgmoavRak5OdP76P8eJExDYhPEKWjMb0sxW7dF01dWFgE=uA@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit f2fcad27d59c (9.6 era) added the ability to mark objects as
dependent an extension, but forgot to add a way for such dependencies to
be removed. This commit fixes that oversight.
Strictly speaking this should be backpatched to 9.6, but due to lack of
demand we're not doing so at this time.
Discussion: https://postgr.es/m/20200217225333.GA30974@alvherre.pgsql
Reviewed-by: ahsan hadi <ahsan.hadi@gmail.com>
Reviewed-by: Ibrar Ahmed <ibrar.ahmad@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The views pg_stat_progress_* had not gotten the memo that
pg_read_all_stats is supposed to be able to read all statistics. Also
make a pass over all text-returning pg_stat_xyz functions that could
return "insufficient privilege" and make sure they also respect
pg_read_all_status.
Reported-by: Andrey M. Borodin
Reviewed-by: Andrey M. Borodin, Kyotaro Horiguchi
Discussion: https://postgr.es/m/13145F2F-8458-4977-9D2D-7B2E862E5722@yandex-team.ru
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We have repeatedly seen the buildfarm reach the Assert(false) in
SyncRepGetSyncStandbysPriority. This apparently is due to failing to
consider the possibility that the sync_standby_priority values in
shared memory might be inconsistent; but they will be whenever only
some of the walsenders have updated their values after a change in
the synchronous_standby_names setting. That function is vastly too
complex for what it does, anyway, so rewriting it seems better than
trying to apply a band-aid fix.
Furthermore, the API of SyncRepGetSyncStandbys is broken by design:
it returns a list of WalSnd array indexes, but there is nothing
guaranteeing that the contents of the WalSnd array remain stable.
Thus, if some walsender exits and then a new walsender process
takes over that WalSnd array slot, a caller might make use of
WAL position data that it should not, potentially leading to
incorrect decisions about whether to release transactions that
are waiting for synchronous commit.
To fix, replace SyncRepGetSyncStandbys with a new function
SyncRepGetCandidateStandbys that copies all the required data
from shared memory while holding the relevant mutexes. If the
associated walsender process then exits, this data is still safe to
make release decisions with, since we know that that much WAL *was*
sent to a valid standby server. This incidentally means that we no
longer need to treat sync_standby_priority as protected by the
SyncRepLock rather than the per-walsender mutex.
SyncRepGetSyncStandbys is no longer used by the core code, so remove
it entirely in HEAD. However, it seems possible that external code is
relying on that function, so do not remove it from the back branches.
Instead, just remove the known-incorrect Assert. When the bug occurs,
the function will return a too-short list, which callers should treat
as meaning there are not enough sync standbys, which seems like a
reasonably safe fallback until the inconsistent state is resolved.
Moreover it's bug-compatible with what has been happening in non-assert
builds. We cannot do anything about the walsender-replacement race
condition without an API/ABI break.
The bogus assertion exists back to 9.6, but 9.6 is sufficiently
different from the later branches that the patch doesn't apply at all.
I chose to just remove the bogus assertion in 9.6, feeling that the
probability of a bad outcome from the walsender-replacement race
condition is too low to justify rewriting the whole patch for 9.6.
Discussion: https://postgr.es/m/21519.1585272409@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In some corner cases, this could also lead to corrupted values being
included in the tuple.
Users who are concerned that they are affected by this should first
upgrade and then perform a base backup of their database and restore onto
an off-line server. They should then query each table with generated
columns to ensure there are no rows where the generated expression does
not match a newly calculated version of the GENERATED ALWAYS expression.
If no crashes occur and no rows are returned then you're not affected.
Fixes bug #16369.
Reported-by: Cameron Ezell
Discussion: https://postgr.es/m/16369-5845a6f1bef59884@postgresql.org
Backpatch-through: 12 (where GENERATED ALWAYS columns were added.)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
recordExtObjInitPriv and removeExtObjInitPriv were sloppy about
calling ReleaseSysCache. The cases cannot occur given current usage
in ALTER EXTENSION ADD/DROP, since we wouldn't get here for these
relkinds; but it seems wise to clean up better.
In passing, extend test logic in test_pg_dump to exercise the
dropped-column code paths here.
Since the case is unreachable at present, there seems no great
need to back-patch; hence fix HEAD only.
Kyotaro Horiguchi, with test case and comment adjustments by me
Discussion: https://postgr.es/m/20200417.151831.1153577605111650154.horikyota.ntt@gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It was previously thought that remove_useless_groupby_columns() needed to
keep track of which constraints the generated plan depended upon, however,
this is unnecessary. The confusion likely arose regarding this because of
check_functional_grouping(), which does need to track the dependency to
ensure VIEWs with columns which are functionally dependant on the GROUP BY
remain so. For remove_useless_groupby_columns(), cached plans will just
become invalidated when the primary key's underlying index is removed
through the normal relcache invalidation code.
Here we just remove the unneeded code which records the dependency and
updates the comments. The previous comments claimed that we could not use
UNIQUE constraints for the same optimization due to lack of a
pg_constraint record for NOT NULL constraints (which are required because
NULLs can be duplicated in a unique index). Since we don't actually need a
pg_constraint record to handle the invalidation, it looks like we could
add code to do this in the future. But not today.
We're not really fixing any bug in the code here, this fix is just to set
the record straight on UNIQUE constraints. This code was added back in
9.6, but due to lack of any bug, we'll not be backpatching this.
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/CAApHDvrdYa=VhOoMe4ZZjZ-G4ALnD-xuAeUNCRTL+PYMVN8OnQ@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Earlier we were inconsistent in allowing the usage of parallel and
full options. Change it such that we disallow them only when they are
combined in a way that we don't support.
In passing, improve the comments in some of the existing tests of parallel
vacuum.
Reported-by: Tushar Ahuja
Author: Justin Pryzby, Amit Kapila
Reviewed-by: Sawada Masahiko, Michael Paquier, Mahendra Singh Thalor and
Amit Kapila
Discussion: https://postgr.es/m/58c8d171-e665-6fa3-a9d3-d9423b694dae%40enterprisedb.com
|
|
|
|
|
| |
Spotted during post-commit review of the nbtree deduplication commit
(commit 0d861bbb).
|
|
|
|
|
|
|
|
|
| |
A comment from the Berkeley days incorrectly claimed that the page
management code cares about the contents of the hole in the center of
the page (at least in the case of the left half of an nbtree page
split). Commit 8fa30f906be added an addendum that stated that the
original comment was "probably obsolete". It's definitely obsolete,
though, so remove the original comment plus the addendum.
|
|
|
|
|
|
|
| |
Commit 913bbd88d overlooked that the result of coerce_to_target_type
might need collation fixups. Per report from Andreas Joseph Krogh.
Discussion: https://postgr.es/m/VisenaEmail.72.37d08ec2b8cb8fb5.17179940cd3@tc7-visena
|
|
|
|
|
|
|
| |
Back in commits 1df92eeafe, f884a96819, and 592123efbb I used some
hackish code to set the script search path, unaware despite decades of
perl that there was a completely standard way to do this. This patch
changes those cases to use the standard perl FindBin package.
|
|
|
|
|
|
| |
Nest the "update metapage as part of insert into root-like page" branch
inside the broader "insert into internal page" branch. This improves
readability.
|