| Commit message (Collapse) | Author | Age |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
create_merge_append_plan failed to honor the CP_EXACT_TLIST flag:
it would generate the expected targetlist but then it felt free to
add resjunk sort targets to it. This demonstrably leads to assertion
failures in v11 and HEAD, and it's probably just accidental that we
don't see the same in older branches. I've not looked into whether
there would be any real-world consequences in non-assert builds.
In HEAD, create_append_plan has sprouted the same problem, so fix
that too (although we do not have any test cases that seem able to
reach that bug). This is an oversight in commit 3fc6e2d7f which
invented the CP_EXACT_TLIST flag, so back-patch to 9.6 where that
came in.
convert_subquery_pathkeys would create pathkeys for subquery output
values if they match any EquivalenceClass known in the outer query
and are available in the subquery's syntactic targetlist. However,
the second part of that condition is wrong, because such values might
not appear in the subquery relation's reltarget list, which would
mean that they couldn't be accessed above the level of the subquery
scan. We must check that they appear in the reltarget list, instead.
This can lead to dropping knowledge about the subquery's sort
ordering, but I believe it's okay, because any sort key that the
outer query actually has any interest in would appear in the
reltarget list.
This second issue is of very long standing, but right now there's no
evidence that it causes observable problems before 9.6, so I refrained
from back-patching further than that. We can revisit that choice if
somebody finds a way to make it cause problems in older branches.
(Developing useful test cases for these issues is really problematic;
fixing convert_subquery_pathkeys removes the only known way to exhibit
the create_merge_append_plan bug, and neither of the test cases added
by this patch causes a problem in all branches, even when considering
the issues separately.)
The second issue explains bug #15795 from Suresh Kumar R ("could not
find pathkey item to sort" with nested DISTINCT queries). I stumbled
across the first issue while investigating that.
Discussion: https://postgr.es/m/15795-fadb56c8e44ee73c@postgresql.org
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In commit 7012b132d0, which added support for aggregate pushdown in
postgres_fdw, the expense of evaluating the final scan/join target
computed by make_group_input_target() was not accounted for at all in
costing aggregate pushdown paths with local statistics. The right fix
for this would be to have a separate upper stage to adjust the final
scan/join relation (see comments for apply_scanjoin_target_to_paths());
but for now, fix by adding the tlist eval cost when costing aggregate
pushdown paths with local statistics.
Apply this to HEAD only to avoid destabilizing existing plan choices.
Author: Etsuro Fujita
Reviewed-By: Antonin Houska
Discussion: https://postgr.es/m/5C66A056.60007%40lab.ntt.co.jp
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit bb16aba50 broke the code that maintains SxactGlobalXmin. It
could get stuck when a well-timed READ ONLY transaction runs. If
SxactGlobalXmin stops advancing, transactions on the
FinishedSerializableTransactions queue are never cleaned up, so
resources are effectively leaked. Revert that hunk of the commit.
Also revert another similar hunk that was probably harmless, but
unnecessary and unjustified, relating to the DOOMED flag in case of
RO_SAFE early release.
Author: Thomas Munro
Reported-by: Tom Lane
Discussion: https://postgr.es/m/16170.1557251214%40sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The right way for IsCatalogRelation/Class to behave is to return true
for OIDs less than FirstBootstrapObjectId (not FirstNormalObjectId),
without any of the ad-hoc fooling around with schema membership.
The previous code was wrong because (1) it claimed that
information_schema tables were not catalog relations but their toast
tables were, which is silly; and (2) if you dropped and recreated
information_schema, which is a supported operation, the behavior
changed. That's even sillier. With this definition, "catalog
relations" are exactly the ones traceable to the postgres.bki data,
which seems like what we want.
With this simplification, we don't actually need access to the pg_class
tuple to identify a catalog relation; we only need its OID. Hence,
replace IsCatalogClass with "IsCatalogRelationOid(oid)". But keep
IsCatalogRelation as a convenience function.
This allows fixing some arguably-wrong semantics in contrib/sepgsql and
ReindexRelationConcurrently, which were using an IsSystemNamespace test
where what they really should be using is IsCatalogRelationOid. The
previous coding failed to protect toast tables of system catalogs, and
also was not on board with the general principle that user-created tables
do not become catalogs just by virtue of being renamed into pg_catalog.
We can also get rid of a messy hack in ReindexMultipleTables.
While we're at it, also rename IsSystemNamespace to IsCatalogNamespace,
because the previous name invited confusion with the more expansive
semantics used by IsSystemRelation/Class.
Also improve the comments in catalog.c.
There are a few remaining places in replication-related code that are
special-casing OIDs below FirstNormalObjectId. I'm inclined to think
those are wrong too, and if there should be any special case it should
just extend to FirstBootstrapObjectId. But first we need to debate
whether a FOR ALL TABLES publication should include information_schema.
Discussion: https://postgr.es/m/21697.1557092753@sss.pgh.pa.us
Discussion: https://postgr.es/m/15150.1557257111@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
| |
Commit dd299df8189, which added suffix truncation to nbtree, simplified
the WAL record format used by page splits. It became necessary to
explicitly WAL-log the new high key for the left half of a split in all
cases, which relieved the REDO routine from having to reconstruct a new
high key for the left page by copying the first item from the right
page. Remove a comment that referred to the previous practice.
|
|
|
|
|
|
|
|
|
|
| |
Some messages related to foreign servers were reporting the server name
without quotes, or not at all; our style is to have all names be quoted,
and the server name already appears quoted in a few other messages, so
just add quotes and make them all consistent.
Remove an extra "s" in other messages (typos introduced by myself in
f56f8f8da6af).
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
REINDEX CONCURRENTLY locks tables with ShareUpdateExclusiveLock rather
than the ShareLock used by a plain REINDEX. However,
RangeVarCallbackForReindexIndex() was not updated for that and still
used the ShareLock only. This would lead to lock upgrades later,
leading to possible deadlocks.
Reported-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/20190430151735.wi52sxjvxsjvaxxt%40alap3.anarazel.de
|
|
|
|
| |
The flat file mechanism was removed in PostgreSQL 9.0.
|
|
|
|
|
|
|
| |
It is no longer possible under any circumstances for nbtree code to
reconstruct a strict lower bound key (parent page's pivot tuple key) for
a right sibling page by retrieving the first item in the right sibling
page.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit contains multiple improvements to error reporting in jsonpath
including but not limited to getting rid of following things:
* definition of error messages in macros,
* errdetail() when valueable information could fit to errmsg(),
* word "singleton" which is not properly explained anywhere,
* line breaks in error messages.
Reported-by: Tom Lane
Discussion: https://postgr.es/m/14890.1555523005%40sss.pgh.pa.us
Author: Alexander Korotkov
Reviewed-by: Tom Lane
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit adds new parameter to VACUUM command, TRUNCATE,
which specifies that VACUUM should attempt to truncate off
any empty pages at the end of the table and allow the disk space
for the truncated pages to be returned to the operating system.
This parameter, if specified, overrides the vacuum_truncate
reloption. If neither the reloption nor the VACUUM option is
used, the default is true, as before.
Author: Fujii Masao
Reviewed-by: Julien Rouhaud, Masahiko Sawada
Discussion: https://postgr.es/m/CAD21AoD+qtrSDL=GSma4Wd3kLYLeRC0hPna-YAdkDeV4z156vg@mail.gmail.com
|
|
|
|
| |
Author: Daniel Gustafsson <daniel@yesql.se>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On a 64-bit machine, if you set track_activity_query_size and
max_connections such that their product exceeds 1GB, shared memory
setup will still succeed (given enough RAM), but attempts to read
pg_stat_activity fail with "invalid memory alloc request size".
Work around that by using MemoryContextAllocHuge to allocate the
local copy of the activity strings. Using the "huge" API costs us
nothing extra in normal cases, and it seems better than throwing
an error and/or explaining to people why they can't do this.
This situation seems insanely profligate today, but who knows what
people will consider normal in ten or twenty years? So let's fix it
in HEAD but not worry about a back-patch.
Per report from James Tomson.
Discussion: https://postgr.es/m/1CFDCCD6-B268-48D8-85C8-400D2790B2C3@pushd.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This feature was using a process local map to track the first few blocks
in the relation. The map was reset each time we get the block with enough
freespace. It was discussed that it would be better to track this map on
a per-relation basis in relcache and then invalidate the same whenever
vacuum frees up some space in the page or when FSM is created. The new
design would be better both in terms of API design and performance.
List of commits reverted, in reverse chronological order:
06c8a5090e Improve code comments in b0eaa4c51b.
13e8643bfc During pg_upgrade, conditionally skip transfer of FSMs.
6f918159a9 Add more tests for FSM.
9c32e4c350 Clear the local map when not used.
29d108cdec Update the documentation for FSM behavior..
08ecdfe7e5 Make FSM test portable.
b0eaa4c51b Avoid creation of the free space map for small heap relations.
Discussion: https://postgr.es/m/20190416180452.3pm6uegx54iitbt5@alap3.anarazel.de
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In examine_variable() and examine_simple_variable(), when checking the
user's table and column privileges to determine whether to grant
access to the pg_statistic data, use checkAsUser for the privilege
checks, if it's set. This will be the case if we're accessing the
table via a view, to indicate that we should perform privilege checks
as the view owner rather than the current user.
This change makes this planner check consistent with the check in the
executor, so the planner will be able to make use of statistics if the
table is accessible via the view. This fixes a performance regression
introduced by commit e2d4ef8de8, which affects queries against
non-security barrier views in the case where the user doesn't have
privileges on the underlying table, but the view owner does.
Note that it continues to provide the same safeguards controlling
access to pg_statistic for direct table access (in which case
checkAsUser won't be set) and for security barrier views, because of
the nearby checks on rte->security_barrier and rte->securityQuals.
Back-patch to all supported branches because e2d4ef8de8 was.
Dean Rasheed, reviewed by Jonathan Katz and Stephen Frost.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In commit e2d4ef8de8, security checks were added to prevent
user-supplied operators from running over data from pg_statistic
unless the user has table or column privileges on the table, or the
operator is leakproof. For a table with RLS, however, checking for
table or column privileges is insufficient, since that does not
guarantee that the user has permission to view all of the column's
data.
Fix this by also checking for securityQuals on the RTE, and insisting
that the operator be leakproof if there are any. Thus the
leakproofness check will only be skipped if there are no securityQuals
and the user has table or column privileges on the table -- i.e., only
if we know that the user has access to all the data in the column.
Back-patch to 9.5 where RLS was added.
Dean Rasheed, reviewed by Jonathan Katz and Stephen Frost.
Security: CVE-2019-10130
|
|
|
|
|
|
|
| |
Noticed while reviewing nearby code. Given all the disclaimers about
this not being meant as user-facing code, I wonder whether we should
make these non-translatable? But in any case there's little excuse
for them not to be good English.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Project style is to check the success of SearchSysCacheN and friends
by applying HeapTupleIsValid to the result. A tiny minority of calls
creatively did it differently. Bring them into line with the rest.
This is just cosmetic, since HeapTupleIsValid is indeed just a null
check at the moment ... but that may not be true forever, and in any
case it puts a mental burden on readers who may wonder why these
call sites are not like the rest.
Back-patch to v11 just to keep the branches in sync. (The bulk of these
errors seem to have originated in v11 or v12, though a few are old.)
Per searching to see if anyplace else had made the same error
repaired in 62148c352.
|
|
|
|
|
|
|
|
| |
Omitted in commit 05b38c7e6 (though it looks like the original blame
belongs to 9e9befac4). A failure is admittedly unlikely, but if it
did happen, SIGSEGV is not the approved method of reporting it.
Per Coverity. Back-patch to v11 where the broken code originated.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 3f342839 corrected obsolete comments about buffer locks at the
main _bt_insert_parent() call site, but missed similar obsolete comments
above _bt_insert_parent() itself. Both sets of comments were rendered
obsolete by commit 40dae7ec537, which made the nbtree page split
algorithm more robust. Fix the comments that were missed the first time
around now.
In passing, refine a related _bt_insert_parent() comment about
re-finding the parent page to insert new downlink.
|
|
|
|
|
|
|
|
| |
In the wake of commit f912d7dec, RelationSetIndexList isn't used any
more. It was always a horrid wart, so getting rid of it is very nice.
We can also convert rd_indexvalid back to a plain boolean.
Discussion: https://postgr.es/m/28926.1556664156@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commits 3dbb317d3 et al failed under CLOBBER_CACHE_ALWAYS testing.
Investigation showed that to reindex pg_class_oid_index, we must
suppress accesses to the index (via SetReindexProcessing) before we call
RelationSetNewRelfilenode, or at least before we do CommandCounterIncrement
therein; otherwise, relcache reloads happening within the CCI may try to
fetch pg_class rows using the index's new relfilenode value, which is as
yet an empty file.
Of course, the point of 3dbb317d3 was that that ordering didn't work
either, because then RelationSetNewRelfilenode's own update of the index's
pg_class row cannot access the index, should it need to.
There are various ways we might have got around that, but Andres Freund
came up with a brilliant solution: for a mapped index, we can really just
skip the pg_class update altogether. The only fields it was actually
changing were relpages etc, but it was just setting them to zeroes which
is useless make-work. (Correct new values will be installed at the end
of index build.) All pg_class indexes are mapped and probably always will
be, so this eliminates the problem by removing work rather than adding it,
always a pleasant outcome. Having taught RelationSetNewRelfilenode to do
it that way, we can revert the code reordering in reindex_index. (But
I left the moved setup code where it was; there seems no reason why it
has to run without use of the old index. If you're trying to fix a
busted pg_class index, you'll have had to disable system index use
altogether to get this far.)
Moreover, this means we don't need RelationSetIndexList at all, because
reindex_relation's hacking to make "REINDEX TABLE pg_class" work is
likewise now unnecessary. We'll leave that code in place in the back
branches, but a follow-on patch will remove it in HEAD.
In passing, do some minor cleanup for commit 5c1560606 (in HEAD only),
notably removing a duplicate newrnode assignment.
Patch by me, using a core idea due to Andres Freund. Back-patch to all
supported branches, as 3dbb317d3 was.
Discussion: https://postgr.es/m/28926.1556664156@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
|
| |
Commit d2599ecfcc74 introduced some contorted, confused code around:
readers would think that it's possible for HeapTupleHeaderGetXmin return
a non-frozen value for some frozen tuples, which would be disastrous.
There's no actual bug, but it seems better to make it clearer.
Per gripe from Tom Lane and Andres Freund.
Discussion: https://postgr.es/m/30116.1555430496@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit dd299df8189, which made heap TID a tiebreaker nbtree index
column, introduced new rules on page space management to make suffix
truncation safe. In general, suffix truncation needs to have a small
amount of extra space available on the new left page when splitting a
leaf page. This is needed in case it turns out that truncation cannot
even "truncate away the heap TID column", resulting in a
larger-than-firstright leaf high key with an explicit heap TID
representation.
Despite all this, CREATE INDEX/nbtsort.c did not account for the
possible need for extra heap TID space on leaf pages when deciding
whether or not a new item could fit on current page. This could lead to
"failed to add item to the index page" errors when CREATE
INDEX/nbtsort.c tried to finish off a leaf page that lacked space for a
larger-than-firstright leaf high key (it only had space for firstright
tuple, which was just short of what was needed following "truncation").
Several conditions needed to be met all at once for CREATE INDEX to
fail. The problem was in the hard limit on what will fit on a page,
which tends to be masked by the soft fillfactor-wise limit. The easiest
way to recreate the problem seems to be a CREATE INDEX on a low
cardinality text column, with tuples that are of non-uniform width,
using a fillfactor of 100.
To fix, bring nbtsort.c in line with nbtsplitloc.c, which already
pessimistically assumes that all leaf page splits will have high keys
that have a heap TID appended.
Reported-By: Andreas Joseph Krogh
Discussion: https://postgr.es/m/VisenaEmail.c5.3ee7fe277d514162.16a6d785bea@tc7-visena
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The new nleft_dead_tuples and nleft_dead_itemids fields are confusing
and do not seem like the correct way forward. One of them is tested
via an assertion that can fail, as it has already done on buildfarm
member topminnow. Remove the assertion and the fields.
Change the logic for the case where a tuple is not initially pruned
by heap_page_prune but later diagnosed HEAPTUPLE_DEAD by
HeapTupleSatisfiesVacuum. Previously, tupgone = true was set in
that case, which leads to treating the tuple as one that will be
removed. In a normal vacuum, that's OK, because we'll remove
index entries for it and then the second heap pass will remove the
tuple itself, but when index cleanup is disabled, those things
don't happen, so we must instead treat it as a recently-dead
tuple that we have voluntarily chosen to keep.
Report and analysis by Tom Lane. This patch loosely based on one
from Masahiko Sawada, but I changed most of it.
|
|
|
|
|
|
|
|
|
|
|
|
| |
The message type for temp files and for checksum failures were missing
from the union. Due to the coding style used there was no compiler error
when this happend. So change the code to actively use the union thereby
producing a compiler error if the same mistake happens again, suggested
by Tom Lane.
Author: Julien Rouhaud
Reported-By: Tomas Vondra
Discussion: https://postgr.es/m/20190430163328.zd4rrlnbvgaqlcdz@development
|
|
|
|
|
|
|
|
| |
Introduced in 3dbb317d3. Fix by using the new local variable in more
places.
Reported-By: Bruce Momjian (off-list)
Backpatch: 9.4-, like 3dbb317d3
|
|
|
|
|
|
|
| |
Author: Justin Pryzby
Discussion:
https://postgr.es/m/20190408141828.GE10080@telsasoft.com
https://postgr.es/m/20181127184133.GM10913@telsasoft.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
While discussing comment improvements (see next commit) by Justin
Pryzby, Tom complained about a few details of the logic to infer the
length of the NULL bitmap when building the JITed tuple deforming
function. That bitmap allows to avoid checking the tuple header's
natts, a check which often causes a pipeline stall
Improvements:
a) As long as missing columns aren't taken into account, we can
continue to infer the length of the NULL bitmap from NOT NULL
columns following it. Previously we stopped at the first missing
column. It's unlikely to matter much in practice, but the
alternative would have been to document why we stop.
b) For robustness reasons it seems better to also check against
attisdropped - RemoveAttributeById() sets attnotnull to false, but
an additional check is trivial.
c) Improve related comments
Discussion: https://postgr.es/m/20637.1555957068@sss.pgh.pa.us
Backpatch: -
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The interaction of these parameters was a bit confused/confusing,
and in fact v11 entirely misses the opportunity to apply partition
constraints when a partition is accessed directly (rather than
indirectly from its parent).
In HEAD, establish the principle that enable_partition_pruning controls
partition pruning and nothing else. When accessing a partition via its
parent, we do partition pruning (if enabled by enable_partition_pruning)
and then there is no need to consider partition constraints in the
constraint_exclusion logic. When accessing a partition directly, its
partition constraints are applied by the constraint_exclusion logic,
only if constraint_exclusion = on.
In v11, we can't have such a clean division of these GUCs' effects,
partly because we don't want to break compatibility too much in a
released branch, and partly because the clean coding requires
inheritance_planner to have applied partition pruning to a partitioned
target table, which it doesn't in v11. However, we can tweak things
enough to cover the missed case, which seems like a good idea since
it's potentially a performance regression from v10. This patch keeps
v11's previous behavior in which enable_partition_pruning overrides
constraint_exclusion for an inherited target table, though.
In HEAD, also teach relation_excluded_by_constraints that it's okay to use
inheritable constraints when trying to prune a traditional inheritance
tree. This might not be thought worthy of effort given that that feature
is semi-deprecated now, but we have enough infrastructure that it only
takes a couple more lines of code to do it correctly.
Amit Langote and Tom Lane
Discussion: https://postgr.es/m/9813f079-f16b-61c8-9ab7-4363cab28d80@lab.ntt.co.jp
Discussion: https://postgr.es/m/29069.1555970894@sss.pgh.pa.us
|
| |
|
|
|
|
|
|
|
|
| |
Mistake in ab0dfc961b6a; progress reporting would have wrapped around
for indexes created with more than 2^31 tuples.
Reported-by: Peter Geoghegan
Discussion: https://postgr.es/m/CAH2-Wz=WbNxc5ob5NJ9yqo2RMJ0q4HXDS30GVCobeCvC9A1L9A@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When reindexing individual indexes on pg_class it was possible to
either trigger an assertion failure:
TRAP: FailedAssertion("!(!ReindexIsProcessingIndex(((index)->rd_id)))
That's because reindex_index() called SetReindexProcessing() - which
enables an asserts ensuring no index insertions happen into the index
- before calling RelationSetNewRelfilenode(). That not correct for
indexes on pg_class, because RelationSetNewRelfilenode() updates the
relevant pg_class row, which needs to update the indexes.
The are two reasons this wasn't noticed earlier. Firstly the bug
doesn't trigger when reindexing all of pg_class, as reindex_relation
has code "hiding" all yet-to-be-reindexed indexes. Secondly, the bug
only triggers when the the update to pg_class doesn't turn out to be a
HOT update - otherwise there's no index insertion to trigger the
bug. Most of the time there's enough space, making this bug hard to
trigger.
To fix, move RelationSetNewRelfilenode() to before the
SetReindexProcessing() (and, together with some other code, to outside
of the PG_TRY()).
To make sure the error checking intended by SetReindexProcessing() is
more robust, modify CatalogIndexInsert() to check
ReindexIsProcessingIndex() even when the update is a HOT update.
Also add a few regression tests for REINDEXing of system catalogs.
The last two improvements would have prevented some of the issues
fixed in 5c1560606dc4c from being introduced in the first place.
Reported-By: Michael Paquier
Diagnosed-By: Tom Lane and Andres Freund
Author: Andres Freund
Reviewed-By: Tom Lane
Discussion: https://postgr.es/m/20190418011430.GA19133@paquier.xyz
Backpatch: 9.4-, the bug is present in all branches
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Most of these stem from d25f519107 "tableam: relation creation, VACUUM
FULL/CLUSTER, SET TABLESPACE.".
1) To pass data to the relation_set_new_filenode()
RelationSetNewRelfilenode() was made to update RelationData.rd_rel
directly. That's not OK however, as it makes the relcache entries
temporarily inconsistent. Which among other scenarios is a problem
if a REINDEX targets an index on pg_class - the
CatalogTupleUpdate() in RelationSetNewRelfilenode(). Presumably
that was introduced because other places in the code do so - while
those aren't "good practice" they don't appear to be actively
buggy (e.g. because system tables may not be targeted).
I (Andres) should have caught this while reviewing and signficantly
evolving the code in that commit, mea culpa.
Fix that by instead passing in the new RelFileNode as separate
argument to relation_set_new_filenode() and rely on the relcache to
update the catalog entry. Also revert that the
RelationMapUpdateMap() call was changed to immediate, and undo some
other more unnecessary changes.
2) Document that the relation_set_new_filenode cannot rely on the
whole relcache entry to be valid. It might be worthwhile to
refactor the code to never have to rely on that, but given the way
heap_create() is currently coded, that'd be a large change.
3) ATExecSetTableSpace() shouldn't do FlushRelationBuffers() itself. A
table AM might not use shared buffers at all. Move to
index_copy_data() and heapam_relation_copy_data().
4) heapam_relation_set_new_filenode() previously sometimes accessed
rel->rd_rel->relpersistence rather than the `persistence`
argument. Code movement mistake.
5) Previously heapam_relation_set_new_filenode() re-opened the smgr
relation to create the init for, if necesary. Instead have
RelationCreateStorage() return the SMgrRelation and use it to
create the init fork.
6) Add a note about the danger of modifying the relcache directly to
ATExecSetTableSpace() - it's currently not a bug because there's a
check ERRORing for catalog tables.
Regression tests and assertion improvements that together trigger the
bug described in 1) will be added in a later commit, as there is a
related bug on all branches.
Reported-By: Michael Paquier
Diagnosed-By: Tom Lane and Andres Freund
Author: Andres Freund
Reviewed-By: Tom Lane
Discussion: https://postgr.es/m/20190418011430.GA19133@paquier.xyz
|
|
|
|
|
|
|
| |
Remove a comment that refers to a coding practice that was fully removed
by commit a8b8f4db, which introduced MarkBufferDirty(). It looks like
the comment was even obsolete before then, since it concerns
write-ordering dependencies with synchronous buffer writes.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is quite unsafe, even for the case of ereport(FATAL) where we won't
return control to the interrupted code, and despite this code's use of
a flag to restrict the areas where we'd try to do it. It's possible
for example that we interrupt malloc or free while that's holding a lock
that's meant to protect against cross-thread interference. Then, any
attempt to do malloc or free within ereport() will result in a deadlock,
preventing the walreceiver process from exiting in response to SIGTERM.
We hypothesize that this explains some hard-to-reproduce failures seen
in the buildfarm.
Hence, get rid of the immediate-exit code in WalRcvShutdownHandler,
as well as the logic associated with WalRcvImmediateInterruptOK.
Instead, we need to take care that potentially-blocking operations
in the walreceiver's data transmission logic (libpqwalreceiver.c)
will respond reasonably promptly to the process's latch becoming
set and then call ProcessWalRcvInterrupts. Much of the needed code
for that was already present in libpqwalreceiver.c. I refactored
things a bit so that all the uses of PQgetResult use latch-aware
waiting, but didn't need to do much more.
These changes should be enough to ensure that libpqwalreceiver.c
will respond promptly to SIGTERM whenever it's waiting to receive
data. In principle, it could block for a long time while waiting
to send data too, and this patch does nothing to guard against that.
I think that that hazard is mostly theoretical though: such blocking
should occur only if we fill the kernel's data transmission buffers,
and we don't generally send enough data to make that happen without
waiting for input. If we find out that the hazard isn't just
theoretical, we could fix it by using PQsetnonblocking, but that
would require more ticklish changes than I care to make now.
This is a bug fix, but it seems like too big a change to push into
the back branches without much more testing than there's time for
right now. Perhaps we'll back-patch once we have more confidence
in the change.
Patch by me; thanks to Thomas Munro for review.
Discussion: https://postgr.es/m/20190416070119.GK2673@paquier.xyz
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a temporary table with an identity column and ON COMMIT DROP is
created in a single-statement transaction (not useful, but allowed),
it would leave the catalog corrupted. We need to add a
CommandCounterIncrement() so that PreCommit_on_commit_actions() sees
the created dependency between table and sequence and can clean it
up.
The analogous and more useful case of doing this in a transaction
block already runs some CommandCounterIncrement() before it gets to
the on-commit cleanup, so it wasn't a problem in practical use.
Several locations for placing the new CommandCounterIncrement() call
were discussed. This patch places it at the end of
standard_ProcessUtility(). That would also help if other commands
were to create catalog entries that some on-commit action would like
to see.
Bug: #15631
Reported-by: Serge Latyntsev <dnsl48@gmail.com>
Author: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
|
|
|
|
| |
Emacs wrongly indented hundreds of subsequent lines.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Be more consistent about use of XXXGetDatum macros in new jsonpath
code. This is mostly to avoid having code that looks randomly
different from everyplace else that's doing the exact same thing.
In pg_regress.c, avoid an unreferenced-function warning from
compilers that don't understand pg_attribute_unused(). Putting
the function inside the same #ifdef as its only caller is more
straightforward coding anyway.
In be-secure-openssl.c, avoid use of pg_attribute_unused() on a label.
That's pretty creative, but there's no good reason to suppose that
it's portable, and there's absolutely no need to use goto's here in the
first place. (This wasn't actually causing any buildfarm complaints,
but it's new code in v12 so it has no portability track record.)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fix DefineIndex so that it doesn't attempt to pass down a to-be-reused
index relfilenode to a child index creation, and fix TryReuseIndex
to not think that reuse is sensible for a partitioned index.
In v11, this fixes a problem where ALTER TABLE on a partitioned table
could assign the same relfilenode to several different child indexes,
causing very nasty catalog corruption --- in fact, attempting to DROP
the partitioned table then leads not only to a database crash, but to
inability to restart because the same crash will recur during WAL replay.
Either of these two changes would be enough to prevent the failure, but
since neither action could possibly be sane, let's put in both changes
for future-proofing.
In HEAD, no such bug manifests, but that's just an accidental consequence
of having changed the pg_class representation of partitioned indexes to
have relfilenode = 0. Both of these changes still seem like smart
future-proofing.
This is only a stop-gap because the code for ALTER TABLE on a partitioned
table with a no-op type change still leaves a great deal to be desired.
As the added regression tests show, it gets things wrong for comments on
child indexes/constraints, and it is regenerating child indexes it doesn't
have to. However, fixing those problems will take more work which may not
get back-patched into v11. We need a fix for the corruption problem now.
Per bug #15672 from Jianing Yang.
Patch by me, regression test cases based on work by Amit Langote,
who also did a lot of the investigative work.
Discussion: https://postgr.es/m/15672-b9fa7db32698269f@postgresql.org
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When an existing index in a partition is attached to a new index on
its parent, we forgot to set the "relispartition" flag correctly, which
meant that it was not possible to find the index in various operations,
such as adding a foreign key constraint that references that partitioned
table. One of four places that was assigning the parent index was
forgetting to do that, so fix by shifting responsibility of updating the
flag to the routine that changes the parent.
Author: Amit Langote, Álvaro Herrera
Reported-by: Hubert "depesz" Lubaczewski
Discussion: https://postgr.es/m/CA+HiwqHMsRtRYRWYTWavKJ8x14AFsv7bmAV46mYwnfD3vy8goQ@mail.gmail.com
|
|
|
|
|
|
|
|
| |
Commit 3eb77eba5a renamed some functions, but forgot to
update some comments referencing to those functions.
This commit fixes those function names in the comments.
Kyotaro Horiguchi
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit ca4103025dfe left a few loose ends. The most important one
(broken pg_dump output) is already fixed by virtue of commit
3b23552ad8bb, but some things remained:
* When ALTER TABLE rewrites tables, the indexes must remain in the
tablespace they were originally in. This didn't work because
index recreation during ALTER TABLE runs manufactured SQL (yuck),
which runs afoul of default_tablespace in competition with the parent
relation tablespace. To fix, reset default_tablespace to the empty
string temporarily, and add the TABLESPACE clause as appropriate.
* Setting a partitioned rel's tablespace to the database default is
confusing; if it worked, it would direct the partitions to that
tablespace regardless of default_tablespace. But in reality it does
not work, and making it work is a larger project. Therefore, throw
an error when this condition is detected, to alert the unwary.
Add some docs and tests, too.
Author: Álvaro Herrera
Discussion: https://postgr.es/m/CAKJS1f_1c260nOt_vBJ067AZ3JXptXVRohDVMLEBmudX1YEx-A@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In sigusr1_handler, don't ignore PMSIGNAL_ADVANCE_STATE_MACHINE based
on pmState. The restriction is unnecessary (PostmasterStateMachine
should work in any state), not future-proof (since it makes too many
assumptions about why the signal might be sent), and broken even today
because a race condition can make it necessary to respond to the signal
in PM_WAIT_READONLY state. The race condition seems unlikely, but
if it did happen, a hot-standby postmaster could fail to shut down
after receiving a smart-shutdown request.
In MaybeStartWalReceiver, don't clear the WalReceiverRequested flag
if the fork attempt fails. Leaving it set allows us to try
again in future iterations of the postmaster idle loop. (The startup
process would eventually send a fresh request signal, but this change
may allow us to retry the fork sooner.)
Remove an obsolete comment and unnecessary test in
PostmasterStateMachine's handling of PM_SHUTDOWN_2 state. It's not
possible to have a live walreceiver in that state, and AFAICT has not
been possible since commit 5e85315ea. This isn't a live bug, but the
false comment is quite confusing to readers.
In passing, rearrange sigusr1_handler's CheckPromoteSignal tests so that
we don't uselessly perform stat() calls that we're going to ignore the
results of.
Add some comments clarifying the behavior of MaybeStartWalReceiver;
I very nearly rearranged it in a way that'd reintroduce the race
condition fixed in e5d494d78. Mea culpa for not commenting that
properly at the time.
Back-patch to all supported branches. The PMSIGNAL_ADVANCE_STATE_MACHINE
change is the only one of even minor significance, but we might as well
keep this code in sync across branches.
Discussion: https://postgr.es/m/9001.1556046681@sss.pgh.pa.us
|
|
|
|
| |
... for translatability purposes.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This allows table AMs that don't need these horizons. This was already
documented in the tableam relation_set_new_filenode callback, but an
assert prevented if from actually working (the test AM code contained
the change itself). Defang the asserts in the general code, and move
the stronger ones into heap AM.
Relatedly, after CLUSTER/VACUUM, we'd always assign a relfrozenxid /
relminmxid. Change the table_relation_copy_for_cluster() interface to
allow the AM to overwrite the horizons that get set on the pg_class
entry. This'd also in the future allow AMs like heap to compute a
relfrozenxid during rewrite that's the table's actual minimum rather
than a pre-determined value. Arguably it'd have been better to move
the whole computation / setting of those values into the callback, but
it seems likely that for other reasons it'd be better to be able to
use one value to vacuum/cluster multiple tables (e.g. a toast's
horizon shouldn't be different than the table's).
Reported-By: Heikki Linnakangas
Author: Andres Freund
Discussion: https://postgr.es/m/9a7fb9cc-2419-5db7-8840-ddc10c93f122@iki.fi
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
cache_locale_time (extraction of LC_TIME-related info) had never been
taught the lessons we previously learned about extraction of info related
to LC_MONETARY and LC_NUMERIC. Specifically, commit 95a777c61 taught
PGLC_localeconv() that data coming out of localeconv() was in an encoding
determined by the relevant locale, but we didn't realize that there's a
similar issue with strftime(). And commit a4930e7ca hardened
PGLC_localeconv() against errors occurring partway through, but failed
to do likewise for cache_locale_time(). So, rearrange the latter
function to perform encoding conversion and not risk failure while
it's got the locales set to temporary values.
This time around I also changed PGLC_localeconv() to treat it as FATAL
if it can't restore the previous settings of the locale values. There
is no reason (except possibly OOM) for that to fail, and proceeding with
the wrong locale values seems like a seriously bad idea --- especially
on Windows where we have to also temporarily change LC_CTYPE. Also,
protect against the possibility that we can't identify the codeset
reported for LC_MONETARY or LC_NUMERIC; rather than just failing,
try to validate the data without conversion.
The user-visible symptom this fixes is that if LC_TIME is set to a locale
name that implies an encoding different from the database encoding,
non-ASCII localized day and month names would be retrieved in the wrong
encoding, leading to either unexpected encoding-conversion error reports
or wrong output from to_char(). The other possible failure modes are
unlikely enough that we've not seen reports of them, AFAIK.
The encoding conversion problems do not manifest on Windows, since
we'd already created special-case code to handle that issue there.
Per report from Juan José Santamaría Flecha. Back-patch to all
supported versions.
Juan José Santamaría Flecha and Tom Lane
Discussion: https://postgr.es/m/CAC+AXB22So5aZm2vZe+MChYXec7gWfr-n-SK-iO091R0P_1Tew@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit dd299df8 made nbtree treat heap TID as a tiebreaker column,
establishing the principle that there is only one correct location (page
and page offset number) for every index tuple, no matter what.
Insertions of tuples into non-unique indexes proceed as if heap TID
(scan key's scantid) is just another user-attribute value, but
insertions into unique indexes are more delicate. The TID value in
scantid must initially be omitted to ensure that the unique index
insertion visits every leaf page that duplicates could be on. The
scantid is set once again after unique checking finishes successfully,
which can force _bt_findinsertloc() to step right one or more times, to
locate the leaf page that the new tuple must be inserted on.
Stepping right within _bt_findinsertloc() was assumed to occur no more
frequently than stepping right within _bt_check_unique(), but there was
one important case where that assumption was incorrect: inserting a
"duplicate" with NULL values. Since _bt_check_unique() didn't do any
real work in this case, it wasn't appropriate for _bt_findinsertloc() to
behave as if it was finishing off a conventional unique insertion, where
any existing physical duplicate must be dead or recently dead.
_bt_findinsertloc() might have to grovel through a substantial portion
of all of the leaf pages in the index to insert a single tuple, even
when there were no dead tuples.
To fix, treat insertions of tuples with NULLs into a unique index as if
they were insertions into a non-unique index: never unset scantid before
calling _bt_search() to descend the tree, and bypass _bt_check_unique()
entirely. _bt_check_unique() is no longer responsible for incoming
tuples with NULL values.
Discussion: https://postgr.es/m/CAH2-Wzm08nr+JPx4jMOa9CGqxWYDQ-_D4wtPBiKghXAUiUy-nQ@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Up to now, DefineIndex() was responsible for adding attnotnull constraints
to the columns of a primary key, in any case where it hadn't been
convenient for transformIndexConstraint() to mark those columns as
is_not_null. It (or rather its minion index_check_primary_key) did this
by executing an ALTER TABLE SET NOT NULL command for the target table.
The trouble with this solution is that if we're creating the index due
to ALTER TABLE ADD PRIMARY KEY, and the outer ALTER TABLE has additional
sub-commands, the inner ALTER TABLE's operations executed at the wrong
time with respect to the outer ALTER TABLE's operations. In particular,
the inner ALTER would perform a validation scan at a point where the
table's storage might be inconsistent with its catalog entries. (This is
on the hairy edge of being a security problem, but AFAICS it isn't one
because the inner scan would only be interested in the tuples' null
bitmaps.) This can result in unexpected failures, such as the one seen
in bug #15580 from Allison Kaptur.
To fix, let's remove the attempt to do SET NOT NULL from DefineIndex(),
reducing index_check_primary_key's role to verifying that the columns are
already not null. (It shouldn't ever see such a case, but it seems wise
to keep the check for safety.) Instead, make transformIndexConstraint()
generate ALTER TABLE SET NOT NULL subcommands to be executed ahead of
the ADD PRIMARY KEY operation in every case where it can't force the
column to be created already-not-null. This requires only minor surgery
in parse_utilcmd.c, and it makes for a much more satisfying spec for
transformIndexConstraint(): it's no longer having to take it on faith
that someone else will handle addition of NOT NULL constraints.
To make that work, we have to move the execution of AT_SetNotNull into
an ALTER pass that executes ahead of AT_PASS_ADD_INDEX. I moved it to
AT_PASS_COL_ATTRS, and put that after AT_PASS_ADD_COL to avoid failure
when the column is being added in the same command. This incidentally
fixes a bug in the only previous usage of AT_PASS_COL_ATTRS, for
AT_SetIdentity: it didn't work either for a newly-added column.
Playing around with this exposed a separate bug in ALTER TABLE ONLY ...
ADD PRIMARY KEY for partitioned tables. The intent of the ONLY modifier
in that context is to prevent doing anything that would require holding
lock for a long time --- but the implied SET NOT NULL would recurse to
the child partitions, and do an expensive validation scan for any child
where the column(s) were not already NOT NULL. To fix that, invent a
new ALTER subcommand AT_CheckNotNull that just insists that a child
column be already NOT NULL, and apply that, not AT_SetNotNull, when
recursing to children in this scenario. This results in a slightly laxer
definition of ALTER TABLE ONLY ... SET NOT NULL for partitioned tables,
too: that command will now work as long as all children are already NOT
NULL, whereas before it just threw up its hands if there were any
partitions.
In passing, clean up the API of generateClonedIndexStmt(): remove a
useless argument, ensure that the output argument is not left undefined,
update the header comment.
A small side effect of this change is that no-such-column errors in ALTER
TABLE ADD PRIMARY KEY now produce a different message that includes the
table name, because they are now detected by the SET NOT NULL step which
has historically worded its error that way. That seems fine to me, so
I didn't make any effort to avoid the wording change.
The basic bug #15580 is of very long standing, and these other bugs
aren't new in v12 either. However, this is a pretty significant change
in the way ALTER TABLE ADD PRIMARY KEY works. On balance it seems best
not to back-patch, at least not till we get some more confidence that
this patch has no new bugs.
Patch by me, but thanks to Jie Zhang for a preliminary version.
Discussion: https://postgr.es/m/15580-d1a6de5a3d65da51@postgresql.org
Discussion: https://postgr.es/m/1396E95157071C4EBBA51892C5368521017F2E6E63@G08CNEXMBPEKD02.g08.fujitsu.local
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
xml.c passed format = 1 to xmlNodeDump(), resulting in sometimes getting
extra whitespace (newlines + spaces) in the output. We don't really want
that, first because whitespace might be semantically significant in some
XML uses, and second because it happens only very inconsistently. Only
one case in our regression tests is affected.
This potentially affects the results of xpath() and the XMLTABLE construct,
when emitting nodeset values.
Note that the older code in contrib/xml2 doesn't do this; it seems
to have been an aboriginal bad decision in commit ea3b212fe.
While this definitely seems like a bug to me, the small number of
complaints to date argues against back-patching a behavioral change.
Hence, fix in HEAD only, at least for now.
Per report from Jean-Marc Voillequin.
Discussion: https://postgr.es/m/1EC8157EB499BF459A516ADCF135ADCE3A23A9CA@LON-WGMSX712.ad.moodys.net
|