| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 7717f630069 removed NextCopyFromRawFields() from copy.h. While
it was hoped that NextCopyFrom() could serve as an alternative,
certain use cases still require NextCopyFromRawFields(). For instance,
extensions like file_text_array_fdw, which process source data with an
unknown number of columns, rely on this function.
Per buildfarm member crake.
Reported-by: Andrew Dunstan <andrew@dunslane.net>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Reviewed-by: Sutou Kouhei <kou@clear-code.com>
Discussion: https://postgr.es/m/5c7e1ac8-5083-4c08-af19-cb9ade2f16ce@dunslane.net
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit introduces a new CopyFromRoutine struct, which is a set of
callback routines to read tuples in a specific format. It also makes
COPY FROM with the existing formats (text, CSV, and binary) utilize
these format callbacks.
This change is a preliminary step towards making the COPY FROM command
extensible in terms of input formats.
Similar to 2e4127b6d2d, this refactoring contributes to a performance
improvement by reducing the number of "if" branches that need to be
checked on a per-row basis when sending field representations in text
or CSV mode. The performance benchmark results showed ~5% performance
gain in text or CSV mode.
Author: Sutou Kouhei <kou@clear-code.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://postgr.es/m/20231204.153548.2126325458835528809.kou@clear-code.com
|
|
|
|
|
|
|
|
|
|
|
| |
As per a suggestion from Tom Lane, we do this by declaring "struct
ExplainState" here and refer to that rather than "ExplainState".
Also per Tom, CreateExplainSerializeDestReceiver was still defined
in explain.h in addition to explain_dr.h. Remove leftover prototype.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: http://postgr.es/m/CA+TgmoYtaad3i21V0jqua-fbr+CR0ix6uBvEX8_s6BG96abd=g@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
| |
Commit ddb17e387aa28d61521227377b00f997756b8a27 introduced this
regression. Ideally, the regression tests would have caught this
mistake, but apparently they don't test with timing enabled,
presumably because that would make the output vary.
Author: Thom Brown <thom@linux.com>
Reviewed-by: Fabrízio de Royes Mello <fabriziomello@gmail.com>
Discussion: http://postgr.es/m/CAA-aLv6nq=UeiyvM7_Mxgo9TVBzs2oh46b9vfyLzuyVEz3j1-g@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This code is extracted from pg_stat_get_backend_io() in pgstatfuncs.c,
so as it can be shared with other areas that need backend pgstats
entries while having the benefits of the various sanity checks
refactored here. As per its name, this retrieves backend statistics
based on a PID, with the option of retrieving a BackendType if given in
input.
Currently, this is used for the backend-level IO statistics. The next
move would be to reuse that for the backend-level WAL statistics.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/Z3zqc4o09dM/Ezyz@ip-10-97-1-34.eu-west-3.compute.internal
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit introduces a new CopyToRoutine struct, which is a set of
callback routines to copy tuples in a specific format. It also makes
the existing formats (text, CSV, and binary) utilize these format
callbacks.
This change is a preliminary step towards making the COPY TO command
extensible in terms of output formats.
Additionally, this refactoring contributes to a performance
improvement by reducing the number of "if" branches that need to be
checked on a per-row basis when sending field representations in text
or CSV mode. The performance benchmark results showed ~5% performance
gain in text or CSV mode.
Author: Sutou Kouhei <kou@clear-code.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://postgr.es/m/20231204.153548.2126325458835528809.kou@clear-code.com
|
|
|
|
|
|
|
|
|
| |
explain.c has grown rather large, and the code that deals with the
DestReceiver that supports the SERIALIZE option is pretty easily severable
from the rest of explain.c; hence, move it to a separate file.
Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Discussion: http://postgr.es/m/CA+TgmoYutMw1Jgo8BWUmB3TqnOhsEAJiYO=rOQufF4gPLWmkLQ@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
| |
explain.c has grown rather large, so move various functions that
are principally concerned with output generation to a new source
file, explain_format.c, instead of lumping them in with everything
else that is part of explain.c
Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Discussion: http://postgr.es/m/CA+TgmoYutMw1Jgo8BWUmB3TqnOhsEAJiYO=rOQufF4gPLWmkLQ@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit ddb17e387aa28d61521227377b00f997756b8a27 attempted to avoid
confusing users by displaying digits after the decimal point only when
nloops > 1, since it's impossible to have a fraction row count after a
single iteration. However, this made the regression tests unstable since
parallal queries will have nloops>1 for all nodes below the Gather or
Gather Merge in normal cases, but if the workers don't start in time and
the leader finishes all the work, they will suddenly have nloops==1,
making it unpredictable whether the digits after the decimal point would
be displayed or not. Although 44cbba9a7f51a3888d5087fc94b23614ba2b81f2
seemed to fix the immediate failures, it may still be the case that there
are lower-probability failures elsewhere in the regression tests.
Various fixes are possible here. For example, it has previously been
proposed that we should try to display the digits after the decimal
point only if rows/nloops is an integer, but currently rows is storead
as a float so it's not theoretically an exact quantity -- precision
could be lost in extreme cases. It has also been proposed that we
should try to display the digits after the decimal point only if we're
under some sort of construct that could potentially cause looping
regardless of whether it actually does. While such ideas are not
without merit, this patch adopts the much simpler solution of always
display two decimal digits. If that approach stands up to scrutiny
from the buildfarm and human users, it spares us the trouble of doing
anything more complex; if not, we can reassess.
This commit incidentally reverts 44cbba9a7f51a3888d5087fc94b23614ba2b81f2,
which should no longer be needed.
Author: Robert Haas <robertmhaas@gmail.com>
Author: Ilia Evdokimov <ilya.evdokimov@tantorlabs.com>
Discussion: http://postgr.es/m/CA+TgmoazzVHn8sFOMFAEwoqBTDxKT45D7mvkyeHgqtoD2cn58Q@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
| |
Stop comparing access method OID values against HASH_AM_OID and
BTREE_AM_OID, and instead check the IndexAmRoutine for an index to see
if it advertises its ability to perform the necessary ordering,
hashing, or cross-type comparing functionality. A field amcanorder
already existed, this uses it more widely. Fields amcanhash and
amcancrosscompare are added for the other purposes.
Author: Mark Dilger <mark.dilger@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
|
|
|
|
|
|
|
|
|
|
|
|
| |
As spotted by Coverity, the calculation of ojrelid mixes signed and unsigned
types causes possible overflow and undefined behavior. Instead of trying to
fix the expression, this commit eliminates the relied local variable. The
explicit branching is used to replace the -1 value. That, in turn, requires
changing the signature of the remove_rel_from_eclass() function.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/914330.1740330169%40sss.pgh.pa.us
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously the internal queue of buffers was capped at max_ios * 4,
though not less than io_combine_limit, at allocation time. That was
done in the first version based on conservative theories about resource
usage and heuristics pending later work. The configured I/O depth could
not always be reached with dense random streams generated by ANALYZE,
VACUUM, the proposed Bitmap Heap Scan patch, and also sequential streams
with the proposed AIO subsystem to name some examples.
The new formula is (max_ios + 1) * io_combine_limit, enough buffers for
the full configured I/O concurrency level using the full configured I/O
combine size, plus the buffers from one finished but not yet consumed
full-sized I/O. Significantly more memory would be needed for high GUC
values if the client code requests a large per-buffer data size, but
that is discouraged (existing and proposed stream users try to keep it
under a few words, if not zero).
With this new formula, an intermediate variable could have overflowed
under maximum GUC values, so its data type is adjusted to cope.
Discussion: https://postgr.es/m/CA%2BhUKGK_%3D4CVmMHvsHjOVrK6t4F%3DLBpFzsrr3R%2BaJYN8kcTfWg%40mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
After commit f41d8468dd, a process could acquire and use a replication
slot that had just been invalidated, leading to failures while accessing
WAL.
To ensure that we don't accidentally start using invalid slots, we must
perform the invalidation check after acquiring the slot or under the
spinlock where we associate the slot with a particular process. We choose
the earlier method to keep the code simple.
Reported-by: Hou Zhijie <houzj.fnst@fujitsu.com>
Author: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CABdArM7J-LbGoMPGUPiFiLOyB_TZ5+YaZb=HMES0mQqzVTn8Gg@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit adds to pgstatfuncs.c a new routine called
pg_stat_wal_build_tuple(), helper routine for pg_stat_get_wal(). This
is in charge of filling one tuple based on the contents of
PgStat_WalStats retrieved from pgstats.
This refactoring will be used by an upcoming patch introducing
backend-level WAL statistics, simplifying the main patch. Note that
it is not possible for stats_reset to be NULL in pg_stat_wal; backend
statistics need to be able to handle this case.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/Z3zqc4o09dM/Ezyz@ip-10-97-1-34.eu-west-3.compute.internal
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
9d9b9d46f3c5 has added spinlocks to protect the fields in ProcSignal
flags, introducing a code path in ProcSignalInit() where a spinlock
could be released twice if the pss_pid field of a ProcSignalSlot is
found as already set. Multiple spinlock releases have no effect with
most spinlock implementations, but this could cause the code to run into
issues when the spinlock is acquired concurrently by a different
process.
This sanity check on pss_pid generates a LOG that can be delayed until
after the spinlock is released as, like older versions up to v17, the
code expects the initialization of the ProcSignalSlot to happen even if
pss_pid is found incorrect. The code is changed so as the old pss_pid
is read while holding the slot's spinlock, with the LOG from the sanity
check generated after releasing the spinlock, preventing the double
release.
Author: Maksim Melnikov <m.melnikov@postgrespro.ru>
Co-authored-by: Maxim Orlov <orlovmg@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/dca47527-2d8b-4e3b-b5a0-e2deb73371a4@postgrespro.ru
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously we used attname for both table and index columns, but
that is problematic for indexes because their attnames are assigned
by internal rules that don't guarantee to preserve the names across
dump and reload. (This is what's causing the remaining buildfarm
failures in cross-version-upgrade tests.) Fortunately we can use
attnum instead, since there's no such thing as adding or dropping
columns in an existing index. We met this same problem previously
with ALTER INDEX ... SET STATISTICS, and solved it the same way,
cf commit 5b6d13eec.
In pg_restore_attribute_stats() itself, we accept either attnum or
attname, but the policy used by pg_dump is to always use attname
for tables and attnum for indexes.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Author: Corey Huinker <corey.huinker@gmail.com>
Discussion: https://postgr.es/m/1457469.1740419458@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This new structure contains the counters and the data related to the WAL
activity statistics gathered from WalUsage, separated into its own
structure so as it can be shared across more than one Stats structure in
pg_stat.h.
This refactoring will be used by an upcoming patch introducing
backend-level WAL statistics.
Bump PGSTAT_FILE_FORMAT_ID.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/Z3zqc4o09dM/Ezyz@ip-10-97-1-34.eu-west-3.compute.internal
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
All the processes that generate WAL should call pgstat_report_wal() to
report all their statistics related to WAL, and this is already what
happens in the tree. Keeping pgstat_report_wal() is confusing while the
other routine is encouraged.
This routine is not required since fc415edf8ca8, where it was lastly
used in pgstat_report_stat() before an equivalent callback existed.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/Z71oPkJJICrRB5Ws@paquier.xyz
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The original message did not mention where the checkpoint record LSN was
found, a control file or a backup_label file. A couple of LOG messages
are generated before this FATAL check is reached, providing more details
about the way recovery is set up. However, knowing this information in
this specific message is useful for debugging. This is also useful for
instances where log_min_messages is set to FATAL or more, where LOG
messages do not show up.
Author: Benoit Lobréau <benoit.lobreau@dalibo.com>
Reviewed-by: David Steele <david@pgbackrest.org>
Discussion: https://postgr.es/m/4ed10bc8-5513-4d8e-8643-8abcaa08336d@dalibo.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit is a rework of 2421e9a51d20, about which Andres Freund has
raised some concerns as it is valuable to have both track_io_timing and
track_wal_io_timing in some cases, as the WAL write and fsync paths can
be a major bottleneck for some workloads. Hence, it can be relevant to
not calculate the WAL timings in environments where pg_test_timing
performs poorly while capturing some IO data under track_io_timing for
the non-WAL IO paths. The opposite can be also true: it should be
possible to disable the non-WAL timings and enable the WAL timings (the
previous GUC setups allowed this possibility).
track_wal_io_timing is added back in this commit, controlling if WAL
timings should be calculated in pg_stat_io for the read, fsync and write
paths, as done previously with pg_stat_wal. pg_stat_wal previously
tracked only the sync and write parts (now removed), read stats is new
data tracked in pg_stat_io, all three are aggregated if
track_wal_io_timing is enabled. The read part matters during recovery
or if a XLogReader is used.
Extra note: more control over if the types of timings calculated in
pg_stat_io could be done with a GUC that lists pairs of (IOObject,IOOp).
Reported-by: Andres Freund <andres@anarazel.de>
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/3opf2wh2oljco6ldyqf7ukabw3jijnnhno6fjb4mlu6civ5h24@fcwmhsgmlmzu
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
After commit f3dae2ae58, the primary purpose of separating the
pg_set_*_stats() from the pg_restore_*_stats() variants was
eliminated.
Leave pg_restore_relation_stats() and pg_restore_attribute_stats(),
which satisfy both purposes, and remove pg_set_relation_stats() and
pg_set_attribute_stats().
Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Discussion: https://postgr.es/m/1457469.1740419458@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
| |
This basically mirrors the changes done in the predecessor commit. While there
isn't currently a need to get these paths in critical sections, it seems a
shame to unnecessarily allocate memory in these paths now that relpath()
doesn't allocate anymore.
Discussion: https://postgr.es/m/xeri5mla4b5syjd5a25nok5iez2kr3bm26j2qn4u7okzof2bmf@kwdh2vf7npra
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For AIO, and also some other recent patches, we need the ability to call
relpath() in a critical section. Until now that was not feasible, as it
allocated memory.
The fact that relpath() allocated memory also made it awkward to use in log
messages because we had to take care to free the memory afterwards. Which we
e.g. didn't do for when zeroing out an invalid buffer.
We discussed other solutions, e.g. filling a pre-allocated buffer that's
passed to relpath(), but they all came with plenty downsides or were larger
projects. The easiest fix seems to be to make relpath() return the path by
value.
To be able to return the path by value we need to determine the maximum length
of a relation path. This patch adds a long #define that computes the exact
maximum, which is verified to be correct in a regression test.
As this change the signature of relpath(), extensions using it will need to
adapt their code. We discussed leaving a backward-compat shim in place, but
decided it's not worth it given the use of relpath() doesn't seem widespread.
Discussion: https://postgr.es/m/xeri5mla4b5syjd5a25nok5iez2kr3bm26j2qn4u7okzof2bmf@kwdh2vf7npra
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The callback functions ReplaceVarsFromTargetList_callback and
pullup_replace_vars_callback are both used to replace Vars in an
expression tree that reference a particular RTE with items from a
targetlist, and they both need to expand whole-tuple references and
deal with OLD/NEW RETURNING list Vars. As a result, currently there
is significant code duplication between these two functions.
This patch introduces a new function, ReplaceVarFromTargetList, to
perform the replacement and calls it from both callback functions,
thereby eliminating code duplication.
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Discussion: https://postgr.es/m/CAEZATCWhr=FM4X5kCPvVs-g2XEk+ceLsNtBK_zZMkqFn9vUjsw@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 83ea6c540 added support for virtual generated columns that are
computed on read. All Var nodes in the query that reference virtual
generated columns must be replaced with the corresponding generation
expressions. Currently, this replacement occurs in the rewriter.
However, this approach has several issues. If a Var referencing a
virtual generated column has any varnullingrels, those varnullingrels
need to be propagated into the generation expression. Failing to do
so can lead to "wrong varnullingrels" errors and improper outer-join
removal.
Additionally, if such a Var comes from the nullable side of an outer
join, we may need to wrap the generation expression in a
PlaceHolderVar to ensure that it is evaluated at the right place and
hence is forced to null when the outer join should do so. In certain
cases, such as when the query uses grouping sets, we also need a
PlaceHolderVar for anything that is not a simple Var to isolate
subexpressions. Failure to do so can result in incorrect results.
To fix these issues, this patch expands the virtual generated columns
in the planner rather than in the rewriter, and leverages the
pullup_replace_vars architecture to avoid code duplication. The
generation expressions will be correctly marked with nullingrel bits
and wrapped in PlaceHolderVars when needed by the pullup_replace_vars
callback function. This requires handling the OLD/NEW RETURNING list
Vars in pullup_replace_vars_callback, as it may now deal with Vars
referencing the result relation instead of a subquery.
The "wrong varnullingrels" error was reported by Alexander Lakhin.
The incorrect result issue and the improper outer-join removal issue
were reported by Richard Guo.
Author: Richard Guo <guofenglinux@gmail.com>
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Discussion: https://postgr.es/m/75eb1a6f-d59f-42e6-8a78-124ee808cda7@gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit documents that the failover option is not copied when using
the pg_copy_logical_replication_slot function.
In passing, we modify the comments in the function clarifying the reason
for this behavior.
Reported-by: <duffieldzane@gmail.com>
Author: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 17, where it was introduced
Discussion: https://postgr.es/m/173976850802.682632.11315364077431550250@wrigleys.postgresql.org
|
|
|
|
|
|
|
|
|
| |
The use of in-place updates was originally there to follow the
precedent of ANALYZE and to reduce the potential for bloat on
pg_class. Per discussion, it's not worth the risks.
Reported-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/cpdanvzykcb5o64rmapkx6n5gjypoce3y52hff7ocxupgpbxu4@53jmlyvukijo
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A non-leaf partition with a subplan that is an Append node was
omitted from PlannedStmt.unprunableRelids because it was mistakenly
included in PlannerGlobal.prunableRelids due to the way
PartitionedRelPruneInfo.leafpart_rti_map[] is constructed. This
happened when a non-leaf partition used an unflattened Append or
MergeAppend. As a result, ExecGetRangeTableRelation() reported an
error when called from CreatePartitionPruneState() to process the
partition's own PartitionPruneInfo, since it was treated as prunable
when it should not have been.
Reported-by: Alexander Lakhin <exclusion@gmail.com> (via sqlsmith)
Diagnosed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Discussion: https://postgr.es/m/74839af6-aadc-4f60-ae77-ae65f94bf607@gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a standby replays an XLOG_PARAMETER_CHANGE record that lowers
wal_level below logical, we invalidate all logical slots in hot
standby mode. However, if this record was replayed while not in hot
standby mode, logical slots could remain valid even after promotion,
potentially causing an assertion failure during WAL record decoding.
To fix this issue, this commit adds a check for hot_standby status
when restoring a logical replication slot on standbys. This check
ensures that logical slots are invalidated when they become
incompatible due to insufficient wal_level during recovery.
Backpatch to v16 where logical decoding on standby was introduced.
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/CAD21AoABoFwGY_Rh2aeE6tEq3HkJxf0c6UeOXn4VV9v6BAQPSw%40mail.gmail.com
Backpatch-through: 16
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Pages from the bitmap created by the TIDBitmap API can be exact or
lossy. The TIDBitmap API extracts the tuple offsets from exact pages
into an array for the convenience of the caller.
This was done in tbm_private|shared_iterate() right after advancing the
iterator. However, as long as tbm_private|shared_iterate() set a
reference to the PagetableEntry in the TBMIterateResult, the offset
extraction can be done later.
Waiting to extract the tuple offsets has a few benefits. For the shared
iterator case, it allows us to extract the offsets after dropping the
shared iterator state lock, reducing time spent holding a contended
lock.
Separating the iteration step and extracting the offsets later also
allows us to avoid extracting the offsets for prefetched blocks. Those
offsets were never used, so the overhead of extracting and storing them
was wasted.
The real motivation for this change, however, is that future commits
will make bitmap heap scan use the read stream API. This requires a
TBMIterateResult per issued block. By removing the array of tuple
offsets from the TBMIterateResult and only extracting the offsets when
they are used, we reduce the memory required for per buffer data
substantially.
Suggested-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGLHbKP3jwJ6_%2BhnGi37Pw3BD5j2amjV3oSk7j-KyCnY7Q%40mail.gmail.com
|
|
|
|
|
|
|
|
| |
TBMIterateResult->ntuples is -1 when the page in the bitmap is lossy.
Add an explicit lossy indicator so that we can move ntuples out of the
TBMIterateResult in a future commit.
Discussion: https://postgr.es/m/CA%2BhUKGLHbKP3jwJ6_%2BhnGi37Pw3BD5j2amjV3oSk7j-KyCnY7Q%40mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Although they're exposed as int4 in pg_class, relpages and
relallvisible are really of type BlockNumber, that is uint32.
Correct type puns in relation_statistics_update() and remove
inappropriate range-checks. The type puns are only cosmetic
issues, but the range checks would cause failures with huge
relations.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Author: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/614341.1740269035@sss.pgh.pa.us
|
|
|
|
|
|
|
| |
So far the various dependencies were documented in the comment above
MAX_BACKENDS, but not checked.
Discussion: https://postgr.es/m/CA+COZaBO_s3LfALq=b+HcBHFSOEGiApVjrRacCe4VP9m7CJsNQ@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Jacob reported that comments for LW_SHARED_MASK referenced a MAX_BACKENDS
limit of 2^23-1, but that MAX_BACKENDS is actually limited to 2^18-1. The
limit was lowered in 48354581a49c, but the comment in lwlock.c wasn't updated.
Instead of just fixing the comment, it seems better to directly base the
lwlock defines on MAX_BACKENDS and add static assertions to ensure that there
is enough space. That way there's no comment that can go out of sync in the
future.
As part of that change I noticed that for some reason the high bit wasn't used
for flags, which seems somewhat odd. Redefine the flag values to start at the
highest bit.
Reported-by: Jacob Brazeal <jacob.brazeal@gmail.com>
Reviewed-by: Jacob Brazeal <jacob.brazeal@gmail.com>
Discussion: https://postgr.es/m/CA+COZaBO_s3LfALq=b+HcBHFSOEGiApVjrRacCe4VP9m7CJsNQ@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
MAX_BACKENDS influences many things besides postmaster. I e.g. noticed that we
don't have static assertions ensuring BUF_REFCOUNT_MASK is big enough for
MAX_BACKENDS, adding them would require including postmaster.h in
buf_internals.h which doesn't seem right.
While at that, add MAX_BACKENDS_BITS, as that's useful in various places for
static assertions (to be added in subsequent commits).
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/wptizm4qt6yikgm2pt52xzyv6ycmqiutloyvypvmagn7xvqkce@d4xuv3mylpg4
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The four following attributes are removed from pg_stat_wal:
* wal_write
* wal_sync
* wal_write_time
* wal_sync_time
a051e71e28a1 has added an equivalent of this information in pg_stat_io
with more granularity as this now spreads across the backend types, IO
context and IO objects. So, keeping the same information in pg_stat_wal
has little benefits.
Another benefit of this commit is the removal of PendingWalStats,
simplifying an upcoming patch to add per-backend WAL statistics, which
already support IO statistics and which have access to the write/sync
stats data of WAL.
The GUC track_wal_io_timing, that was used to enable or disable the
aggregation of the write and sync timings for WAL, is also removed.
pgstat_prepare_io_time() is simplified.
Bump catalog version.
Bump PGSTAT_FILE_FORMAT_ID, due to the update of PgStat_WalStats.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/Z7RkQ0EfYaqqjgz/@ip-10-97-1-34.eu-west-3.compute.internal
|
|
|
|
|
|
|
|
| |
Change some backend libpq functions to take void * for binary data
instead of char *. This removes the need for numerous casts.
Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org
|
|
|
|
|
|
|
|
| |
Change internal snapbuild API function to take void * for binary data
instead of char *. This removes the need for numerous casts.
Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org
|
|
|
|
|
|
|
|
| |
Change some internal jsonb API functions to take void * for binary
data instead of char *. This removes the need for numerous casts.
Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To implement AIO writes, the backend initiating writes needs to transfer the
lock ownership to the AIO subsystem, so the lock held during the write can be
released in another backend.
Other backends need to be able to "complete" an asynchronously started IO to
avoid deadlocks (consider e.g. one backend starting IO for a buffer and then
waiting for a heavyweight lock held by another relation followed by the
current holder of the heavyweight lock waiting for the IO to complete).
To that end, this commit adds LWLockDisown() and LWLockReleaseDisowned(). If
code uses LWLockDisown() it's the code's responsibility to ensure that the
lock is released in case of errors.
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/1f6b50a7-38ef-4d87-8246-786d39f46ab9@iki.fi
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Concurrently dropping either the granted role or the grantee
does not stop GRANT from completing, instead resulting in a
dangling role reference in pg_auth_members. That's relatively
harmless in the short run, but inconsistent catalog entries
are not a good thing.
This patch solves the problem by adding the granted and grantee
roles as explicit shared dependencies of the pg_auth_members entry.
That's a bit indirect, but it works because the pg_shdepend code
applies the necessary locking and rechecking.
Commit 6566133c5 previously established similar handling for
the grantor column of pg_auth_members; it's not clear why it
didn't cover the other two role OID columns.
A side-effect of this approach is that DROP OWNED BY will now drop
pg_auth_members entries that mention the target role as either the
granted or grantee role. That's clearly appropriate for the
grantee, since we'll drop its other privileges too. It doesn't
seem too far out of line for the granted role, since we're
presumably about to drop it and besides we're removing all reasons
why it'd matter to be a member of it. (One could argue that this
makes DropRole's code to auto-drop pg_auth_members entries
unnecessary, but I chose to leave it in place since perhaps some
people's workflows expect that to work without a DROP OWNED BY.)
Note to patch readers: CreateRole's first CommandCounterIncrement
call is now unconditional, because this change creates another
case in which it's needed, and it seemed to be more trouble than
it's worth to preserve that micro-optimization.
Arguably this is a bug fix, but the fact that it changes the
expected contents of pg_shdepend seems like not a great thing
to do in the stable branches, and perhaps we don't want the
change in DROP OWNED BY semantics there either. On the other
hand, I opted not to force a catversion bump in HEAD, because
the presence or absence of these entries doesn't matter for
most purposes.
Reported-by: Virender Singla <virender.cse@gmail.com>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Discussion: https://postgr.es/m/CAM6Zo8woa62ZFHtMKox6a4jb8qQ=w87R2L0K8347iE-juQL2EA@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When nloops > 1, we now display two digits after the decimal point,
rather than none. This is important because what we print is actually
planstate->instrument->ntuples / nloops, and sometimes what you want
to know is planstate->instrument->ntuples. You can estimate that by
multiplying the displayed row count by the displayed nloops value, but
the fact that the displayed value is rounded makes that inexact. It's
still inexact even if we show these two extra decimal places, but less
so. Perhaps we will agree on a way to further improve this output later,
but for now this seems better than doing nothing.
Author: Ibrar Ahmed <ibrar.ahmad@gmail.com>
Author: Ilia Evdokimov <ilya.evdokimov@tantorlabs.com>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Vignesh C <vignesh21@gmail.com>
Reviewed-by: Greg Stark <stark@mit.edu>
Reviewed-by: Naeem Akhter <akhternaeem@gmail.com>
Reviewed-by: Hamid Akhtar <hamid.akhtar@percona.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andrei Lepikhov <a.lepikhov@postgrespro.ru>
Reviewed-by: Guillaume Lelarge <guillaume@lelarge.info>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: Alena Rybakina <a.rybakina@postgrespro.ru>
Discussion: http://postgr.es/m/603c8f070905281830g2e5419c4xad2946d149e21f9d%40mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The signedness of the 'char' type in C is
implementation-dependent. For instance, 'signed char' is used by
default on x86 CPUs, while 'unsigned char' is used on aarch
CPUs. Previously, we accidentally let C implementation signedness
affect persistent data. This led to inconsistent results when
comparing char data across different platforms.
This commit introduces a new 'default_char_signedness' field in
ControlFileData to store the signedness of the 'char' type. While this
change does not encourage the use of 'char' without explicitly
specifying its signedness, this field can be used as a hint to ensure
consistent behavior for pre-v18 data files that store data sorted by
the 'char' type on disk (e.g., GIN and GiST indexes), especially in
cross-platform replication scenarios.
Newly created database clusters unconditionally set the default char
signedness to true. pg_upgrade (with an upcoming commit) changes this
flag for clusters if the source database cluster has
signedness=false. As a result, signedness=false setting will become
rare over time. If we had known about the problem during the last
development cycle that forced initdb (v8.3), we would have made all
clusters signed or all clusters unsigned. Making pg_upgrade the only
source of signedness=false will cause the population of database
clusters to converge toward that retrospective ideal.
Bump catalog version (for the catalog changes) and PG_CONTROL_VERSION
(for the additions in ControlFileData).
Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/CB11ADBC-0C3F-4FE0-A678-666EE80CBB07%40amazon.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This allows using text position search functions with nondeterministic
collations. These functions are
- position, strpos
- replace
- split_part
- string_to_array
- string_to_table
which all use common internal infrastructure.
There was previously no internal implementation of this, so it was met
with a not-supported error. This adds the internal implementation and
removes the error.
Unlike with deterministic collations, the search cannot use any
byte-by-byte optimized techniques but has to go substring by
substring. We also need to consider that the found match could have a
different length than the needle and that there could be substrings of
different length matching at a position. In most cases, we need to
find the longest such substring (greedy semantics), but this can be
configured by each caller.
Reviewed-by: Euler Taveira <euler@eulerto.com>
Discussion: https://www.postgresql.org/message-id/flat/582b2613-0900-48ca-8b0d-340c06f4d400@eisentraut.org
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, a WARNING was issued at the time of defining a subscription
with origin=NONE only when the publisher subscribed to the same table from
other publishers, indicating potential data origination from different
origins. However, the publisher can subscribe to the partition ancestors
or partition children of the table from other publishers, which could also
result in mixed-origin data inclusion. So, give a WARNING in those cases
as well.
Reported-by: Sergey Tatarintsev <s.tatarintsev@postgrespro.ru>
Author: Hou Zhijie <houzj.fnst@fujitsu.com>
Author: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: Vignesh C <vignesh21@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 16, where it was introduced
Discussion: https://postgr.es/m/5eda6a9c-63cf-404d-8a49-8dcb116a29f3@postgrespro.ru
|
|
|
|
|
|
|
|
|
|
|
| |
NO INDENT is the default, and is added if no explicit indentation
flag was provided with XMLSERIALIZE().
Oversight in 483bdb2afec9.
Author: Jim Jones <jim.jones@uni-muenster.de>
Discussion: https://postgr.es/m/bebd457e-5b43-46b3-8fc6-f6a6509483ba@uni-muenster.de
Backpatch-through: 16
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The type argument wasn't actually really necessary. It was a remnant
of converting the API of the gist strategy translation from using
opclass to using opfamily+opcintype (commits c09e5a6a016,
622f678c102). For looking up the gist translation function, we used
the convention "amproclefttype = amprocrighttype = opclass's
opcintype" (see pg_amproc.h). But each operator family should only
have one translation function, and getting the right type for the
lookup is sometimes cumbersome and fragile, so this is all
unnecessarily complicated.
To simplify this, change the gist stategy support procedure to take
"any", "any" as argument. (This is arbitrary but seems intuitive.
The alternative of using InvalidOid as argument(s) upsets various DDL
commands, so it's not practical.) Then we don't need opcintype for
the lookup, and we can remove it from all the API layers introduced by
commit c09e5a6a016.
This also adds some more documentation about the correct signature of
the gist support function and adds more checks in gistvalidate().
This was previously underspecified. (It relied implicitly on
convention mentioned above.)
Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
|
|
|
|
|
|
|
|
| |
Change backend launcher functions to take void * for binary data
instead of char *. This removes the need for numerous casts.
Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org
|
|
|
|
|
|
| |
Use RelationGetIndexExpressions() rather than rd_indexprs directly.
Author: Corey Huinker <corey.huinker@gmail.com>
|
|
|
|
|
|
|
|
| |
Remove a number of (char *) casts that are unnecessary. Or in some
cases, rewrite the code to make the purpose of the cast clearer.
Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org
|