| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The signedness of the 'char' type in C is
implementation-dependent. For instance, 'signed char' is used by
default on x86 CPUs, while 'unsigned char' is used on aarch
CPUs. Previously, we accidentally let C implementation signedness
affect persistent data. This led to inconsistent results when
comparing char data across different platforms.
This commit introduces a new 'default_char_signedness' field in
ControlFileData to store the signedness of the 'char' type. While this
change does not encourage the use of 'char' without explicitly
specifying its signedness, this field can be used as a hint to ensure
consistent behavior for pre-v18 data files that store data sorted by
the 'char' type on disk (e.g., GIN and GiST indexes), especially in
cross-platform replication scenarios.
Newly created database clusters unconditionally set the default char
signedness to true. pg_upgrade (with an upcoming commit) changes this
flag for clusters if the source database cluster has
signedness=false. As a result, signedness=false setting will become
rare over time. If we had known about the problem during the last
development cycle that forced initdb (v8.3), we would have made all
clusters signed or all clusters unsigned. Making pg_upgrade the only
source of signedness=false will cause the population of database
clusters to converge toward that retrospective ideal.
Bump catalog version (for the catalog changes) and PG_CONTROL_VERSION
(for the additions in ControlFileData).
Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/CB11ADBC-0C3F-4FE0-A678-666EE80CBB07%40amazon.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The type argument wasn't actually really necessary. It was a remnant
of converting the API of the gist strategy translation from using
opclass to using opfamily+opcintype (commits c09e5a6a016,
622f678c102). For looking up the gist translation function, we used
the convention "amproclefttype = amprocrighttype = opclass's
opcintype" (see pg_amproc.h). But each operator family should only
have one translation function, and getting the right type for the
lookup is sometimes cumbersome and fragile, so this is all
unnecessarily complicated.
To simplify this, change the gist stategy support procedure to take
"any", "any" as argument. (This is arbitrary but seems intuitive.
The alternative of using InvalidOid as argument(s) upsets various DDL
commands, so it's not practical.) Then we don't need opcintype for
the lookup, and we can remove it from all the API layers introduced by
commit c09e5a6a016.
This also adds some more documentation about the correct signature of
the gist support function and adds more checks in gistvalidate().
This was previously underspecified. (It relied implicitly on
convention mentioned above.)
Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
|
|
|
|
|
|
|
|
| |
Remove a number of (char *) casts that are unnecessary. Or in some
cases, rewrite the code to make the purpose of the cast clearer.
Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the requested recovery timeline is not reachable, the logged
checkpoint and timeline should to be the values read from the
backup_label when it is defined. The message generated used the values
from the control file in this case, which is fine when recovering from
the control file without a backup_label, but not if there is a
backup_label.
Issue introduced in ee994272ca50. v15 has introduced xlogrecovery.c and
more simplifications in this area (4a92a1c3d1c3, a27048cbcb58), making
this change a bit simpler to think about, so backpatch only down to this
version.
Author: David Steele <david@pgbackrest.org>
Reviewed-by: Andrey M. Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Benoit Lobréau <benoit.lobreau@dalibo.com>
Discussion: https://postgr.es/m/c3d617d4-1696-4aa7-8a4d-5a7d19cc5618@pgbackrest.org
Backpatch-through: 15
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since commit 29cf61ade3, table_block_relation_estimate_size() considers
fillfactor when estimating number of rows in a relation before the first
ANALYZE. The formula however did not consider tuples may be larger than
available space determined by fillfactor, ending with density 0. This
ultimately means the relation was estimated to contain a single row.
The executor however places at least one tuple per page, even with very
low fillfactor values, so the density should be at least 1. Fixed by
clamping the density estimate using clamp_row_est().
Reported by Heikki Linnakangas. Fix by me, with regression test inspired
by example provided by Heikki.
Backpatch to 17, where the issue was introduced.
Reported-by: Heikki Linnakangas
Backpatch-through: 17
Discussion: https://postgr.es/m/2bf9d973-7789-4937-a7ca-0af9fb49c71e@iki.fi
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Logical replication crashes if the subscriber's partitioned table
has a BRIN index. There are two independently blamable causes,
and this patch fixes both:
1. brininsertcleanup fails if called twice for the same IndexInfo,
because it half-destroys its BrinInsertState but leaves it still
linked from ii_AmCache. brininsert would also fail in that state,
so it's pretty hard to see any advantage to this coding. Fully
remove the BrinInsertState, instead, so that a new brininsert
call would create a new cache.
2. A logical replication subscriber sometimes does ExecOpenIndices
twice on the same ResultRelInfo, followed by doing ExecCloseIndices
twice; the second call reaches the brininsertcleanup bug. Quite
aside from tickling unexpected cases in aminsertcleanup methods,
this seems very wasteful, because the IndexInfos built in the
first ExecOpenIndices call are just lost during the second call,
and have to be rebuilt at possibly-nontrivial cost. We should
establish a coding rule that you don't do that.
The problematic coding is that when the target table is partitioned,
apply_handle_tuple_routing calls ExecFindPartition which does
ExecOpenIndices (and expects that ExecCleanupTupleRouting will
close the indexes again). Using the ResultRelInfo made by
ExecFindPartition, it calls apply_handle_delete_internal or
apply_handle_insert_internal, both of which think they need to do
ExecOpenIndices/ExecCloseIndices for themselves. They do in the main
non-partitioned code paths, but not here. The simplest fix is to pull
their ExecOpenIndices/ExecCloseIndices calls out and put them in the
call sites for the non-partitioned cases. (We could have refactored
apply_handle_update_internal similarly, but I did not do so today
because there's no bug there: the partitioned code path doesn't
call it.)
Also, remove the always-duplicative open/close calls within
apply_handle_tuple_routing itself.
Since brininsertcleanup and indeed the whole aminsertcleanup mechanism
are new in v17, there's no observable bug in older branches. A case
could be made for trying to avoid these duplicative open/close calls
in the older branches, but for now it seems not worth the trouble and
risk of new bugs.
Bug: #18815
Reported-by: Sergey Belyashov <sergey.belyashov@gmail.com>
Discussion: https://postgr.es/m/18815-2a0407cc7f40b327@postgresql.org
Backpatch-through: 17
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit introduces idle_replication_slot_timeout GUC that allows
inactive slots to be invalidated at the time of checkpoint. Because
checkpoints happen checkpoint_timeout intervals, there can be some lag
between when the idle_replication_slot_timeout was exceeded and when the
slot invalidation is triggered at the next checkpoint. To avoid such lags,
users can force a checkpoint to promptly invalidate inactive slots.
Note that the idle timeout invalidation mechanism is not applicable for
slots that do not reserve WAL or for slots on the standby server that are
synced from the primary server (i.e., standby slots having 'synced' field
'true'). Synced slots are always considered to be inactive because they
don't perform logical decoding to produce changes.
The slots can become inactive for a long period if a subscriber is down
due to a system error or inaccessible because of network issues. If such a
situation persists, it might be more practical to recreate the subscriber
rather than attempt to recover the node and wait for it to catch up which
could be time-consuming.
Then, external tools could create replication slots (e.g., for migrations
or upgrades) that may fail to remove them if an error occurs, leaving
behind unused slots that take up space and resources. Manually cleaning
them up can be tedious and error-prone, and without intervention, these
lingering slots can cause unnecessary WAL retention and system bloat.
As the duration of idle_replication_slot_timeout is in minutes, any test
using that would be time-consuming. We are planning to commit a follow up
patch for tests by using the injection point framework.
Author: Nisha Moond <nisha.moond412@gmail.com>
Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Vignesh C <vignesh21@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/CALj2ACW4aUe-_uFQOjdWCEN-xXoLGhmvRFnL8SNw_TZ5nJe+aw@mail.gmail.com
Discussion: https://postgr.es/m/OS0PR01MB5716C131A7D80DAE8CB9E88794FC2@OS0PR01MB5716.jpnprd01.prod.outlook.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Move lazy_scan_heap()'s per_buffer_data variable into a tighter scope.
In lazy_scan_heap()'s phase I heap vacuuming, the read stream API
returns a pointer to the next block number to vacuum. As long as
read_stream_next_buffer() returns a valid buffer, per_buffer_data should
always be valid.
Move per_buffer_data into a tighter scope and make sure it is reset to
NULL on each iteration so that we get a core dump instead of bogus data
from a previous block if something goes wrong in the read stream API.
Suggested-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/626104.1739729538%40sss.pgh.pa.us
|
|
|
|
|
| |
This commit reverts 6a2275b895. Buildfarm failure on batta spots some
concurrency issue, which requires further investigation.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit adds the information about the number of times WAL buffers
have been full to the logs generated by VACUUM/ANALYZE (VERBOSE) and in
the logs generated by autovacuum, complementing the existing information
stored by WalUsage.
This is the last part of the backend code where the value of
wal_buffers_full can be reported, similarly to all the other fields of
WalUsage. 320545bfcfee and ce5bcc4a9f26 have done the same for EXPLAIN
and pgss.
Author: Bertrand Drouvot
Reviewed-by: Ilia Evdokimov
Discussion: https://postgr.es/m/Z6SOha5YFFgvpwQY@ip-10-97-1-34.eu-west-3.compute.internal
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
wal_buffers_full has been introduced in pg_stat_wal in 8d9a935965f, as
some information providing metrics for the tuning of the GUC
wal_buffers. WalUsage has been introduced before that in df3b181499.
Moving this field is proving to be beneficial for several reasons:
- This information can now be made available in more layers, providing
more granularity than just pg_stat_wal, on a per-query basis: EXPLAIN,
pgss and VACUUM/ANALYZE logs.
- A patch is under discussion to provide statistics for WAL at backend
level, and this move simplifies a bit the handling of pending
statistics. The remaining data in PgStat_PendingWalStats now relates to
write/sync counters and times, with equivalents present in pg_stat_io,
that backend statistics are able to already track. So this should cut
all the dependencies between PgStat_PendingWalStats and WAL stats at
backend level.
As of this change, wal_buffers_full only shows in pg_stat_wal.
Author: Bertrand Drouvot
Reviewed-by: Ilia Evdokimov
Discussion: https://postgr.es/m/Z6SOha5YFFgvpwQY@ip-10-97-1-34.eu-west-3.compute.internal
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Allow multiple backends to initialize WAL buffers concurrently. This way
`MemSet((char *) NewPage, 0, XLOG_BLCKSZ);` can run in parallel without
taking a single LWLock in exclusive mode.
The new algorithm works as follows:
* reserve a page for initialization using XLogCtl->InitializeReserved,
* ensure the page is written out,
* once the page is initialized, try to advance XLogCtl->InitializedUpTo and
signal to waiters using XLogCtl->InitializedUpToCondVar condition
variable,
* repeat previous steps until we reserve initialization up to the target
WAL position,
* wait until concurrent initialization finishes using a
XLogCtl->InitializedUpToCondVar.
Now, multiple backends can, in parallel, concurrently reserve pages,
initialize them, and advance XLogCtl->InitializedUpTo to point to the latest
initialized page.
Author: Yura Sokolov <y.sokolov@postgrespro.ru>
Co-authored-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Commit bb8dff9995 added this information to the
pg_stat_progress_vacuum and pg_stat_progress_analyze system views.
This commit adds the same information to the output of VACUUM and
ANALYZE with the VERBOSE option and to the autovacuum logs.
Suggested-by: Masahiro Ikeda <ikedamsh@oss.nttdata.com>
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/ZmaXmWDL829fzAVX%40ip-10-97-1-34.eu-west-3.compute.internal
|
|
|
|
|
|
|
| |
Commit a99cc6c6b4 introduced the PqMsg_Progress macro but missed
updating HandleParallelMessage() accordingly.
Backpatch-through: 17
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Make vacuum's third phase (its second pass over the heap), which reaps
dead items collected in the first phase and marks them as reusable, use
the read stream API. This commit adds a new read stream callback,
vacuum_reap_lp_read_stream_next(), that looks ahead in the TidStore and
returns the next block number to read for vacuum.
Author: Melanie Plageman <melanieplageman@gmail.com>
Co-authored-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGKN3oy0bN_3yv8hd78a4%2BM1tJC9z7mD8%2Bf%2ByA%2BGeoFUwQ%40mail.gmail.com
|
|
|
|
|
|
|
|
|
|
| |
Make vacuum's first phase, which prunes and freezes tuples and records
dead TIDs, use the read stream API by by converting
heap_vac_scan_next_block() to a read stream callback.
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/CAAKRu_aLwANZpxHc0tC-6OT0OQT4TftDGkKAO5yigMUOv_Tcsw%40mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The read stream API only allows one piece of extra per block state to be
passed back to the API user (per_buffer_data). lazy_scan_heap() needs
two pieces of per-buffer data: whether or not the block was all-visible
in the visibility map and whether or not it was eagerly scanned.
Convert these two pieces of information to flags so that they can be
populated by heap_vac_scan_next_block() and returned to
lazy_scan_heap(). A future commit will turn heap_vac_scan_next_block()
into the read stream callback for heap phase I vacuuming.
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/CAAKRu_bmx33jTqATP5GKNFYwAg02a9dDtk4U_ciEjgBHZSVkOQ%40mail.gmail.com
|
|
|
|
|
|
|
|
| |
Remove (char *) casts no longer needed after XLogRegisterData() and
XLogRegisterBufData() argument type change.
Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org
|
|
|
|
|
|
|
|
|
| |
Change XLogRegisterData() and XLogRegisterBufData() functions to take
void * for binary data instead of char *. This will remove the need
for numerous casts (done in a separate commit for clarity).
Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org
|
|
|
|
|
|
|
|
|
|
| |
Remove (char *) casts around memory functions such as memcmp(),
memcpy(), or memset() where the cast is useless. Since these
functions don't take char * arguments anyway, these casts are at best
complicated casts to (void *), about which see commit 7f798aca1d5.
Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org
|
|
|
|
|
|
|
|
|
|
|
|
| |
This function is used in both vacuum and analyze code paths, and a
follow-up commit will require distinguishing between the two. This
commit forces callers to specify whether they are in a vacuum or
analyze path, but it does not use that information for anything
yet.
Author: Nathan Bossart <nathandbossart@gmail.com>
Co-authored-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/ZmaXmWDL829fzAVX%40ip-10-97-1-34.eu-west-3.compute.internal
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Aggressive vacuums must scan every unfrozen tuple in order to advance
the relfrozenxid/relminmxid. Because data is often vacuumed before it is
old enough to require freezing, relations may build up a large backlog
of pages that are set all-visible but not all-frozen in the visibility
map. When an aggressive vacuum is triggered, all of these pages must be
scanned. These pages have often been evicted from shared buffers and
even from the kernel buffer cache. Thus, aggressive vacuums often incur
large amounts of extra I/O at the expense of foreground workloads.
To amortize the cost of aggressive vacuums, eagerly scan some
all-visible but not all-frozen pages during normal vacuums.
All-visible pages that are eagerly scanned and set all-frozen in the
visibility map are counted as successful eager freezes and those not
frozen are counted as failed eager freezes.
If too many eager scans fail in a row, eager scanning is temporarily
suspended until a later portion of the relation. The number of failures
tolerated is configurable globally and per table.
To effectively amortize aggressive vacuums, we cap the number of
successes as well. Capping eager freeze successes also limits the amount
of potentially wasted work if these pages are modified again before the
next aggressive vacuum. Once we reach the maximum number of blocks
successfully eager frozen, eager scanning is disabled for the remainder
of the vacuum of the relation.
Original design idea from Robert Haas, with enhancements from
Andres Freund, Tomas Vondra, and me
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Robert Treat <rob@xzilla.net>
Reviewed-by: Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZF_KCzZuOrPrOqjGVe8iRVWEAJSpzMgRQs%3D5-v84cXUg%40mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds a new variant of generated columns that are computed on read
(like a view, unlike the existing stored generated columns, which are
computed on write, like a materialized view).
The syntax for the column definition is
... GENERATED ALWAYS AS (...) VIRTUAL
and VIRTUAL is also optional. VIRTUAL is the default rather than
STORED to match various other SQL products. (The SQL standard makes
no specification about this, but it also doesn't know about VIRTUAL or
STORED.) (Also, virtual views are the default, rather than
materialized views.)
Virtual generated columns are stored in tuples as null values. (A
very early version of this patch had the ambition to not store them at
all. But so much stuff breaks or gets confused if you have tuples
where a column in the middle is completely missing. This is a
compromise, and it still saves space over being forced to use stored
generated columns. If we ever find a way to improve this, a bit of
pg_upgrade cleverness could allow for upgrades to a newer scheme.)
The capabilities and restrictions of virtual generated columns are
mostly the same as for stored generated columns. In some cases, this
patch keeps virtual generated columns more restricted than they might
technically need to be, to keep the two kinds consistent. Some of
that could maybe be relaxed later after separate careful
considerations.
Some functionality that is currently not supported, but could possibly
be added as incremental features, some easier than others:
- index on or using a virtual column
- hence also no unique constraints on virtual columns
- extended statistics on virtual columns
- foreign-key constraints on virtual columns
- not-null constraints on virtual columns (check constraints are supported)
- ALTER TABLE / DROP EXPRESSION
- virtual column cannot have domain type
- virtual columns are not supported in logical replication
The tests in generated_virtual.sql have been copied over from
generated_stored.sql with the keyword replaced. This way we can make
sure the behavior is mostly aligned, and the differences can be
visible. Some tests for currently not supported features are
currently commented out.
Reviewed-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Tested-by: Shlok Kyal <shlok.kyal.oss@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/a368248e-69e4-40be-9c07-6c3b5880b0a6@eisentraut.org
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
One way autovacuum chooses tables to vacuum is by comparing the
number of updated or deleted tuples with a value calculated using
autovacuum_vacuum_threshold and autovacuum_vacuum_scale_factor.
The threshold specifies the base value for comparison, and the
scale factor specifies the fraction of the table size to add to it.
This strategy ensures that smaller tables are vacuumed after fewer
updates/deletes than larger tables, which is reasonable in many
cases but can result in infrequent vacuums on very large tables.
This is undesirable for a couple of reasons, such as very large
tables incurring a huge amount of bloat between vacuums.
This new parameter provides a way to set a limit on the value
calculated with autovacuum_vacuum_threshold and
autovacuum_vacuum_scale_factor so that very large tables are
vacuumed more frequently. By default, it is set to 100,000,000
tuples, but it can be disabled by setting it to -1. It can also be
adjusted for individual tables by changing storage parameters.
Author: Nathan Bossart <nathandbossart@gmail.com>
Co-authored-by: Frédéric Yhuel <frederic.yhuel@dalibo.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Michael Banck <mbanck@gmx.net>
Reviewed-by: Joe Conway <mail@joeconway.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: wenhui qiu <qiuwenhuifx@gmail.com>
Reviewed-by: Vinícius Abrahão <vinnix.bsd@gmail.com>
Reviewed-by: Robert Treat <rob@xzilla.net>
Reviewed-by: Alena Rybakina <a.rybakina@postgrespro.ru>
Discussion: https://postgr.es/m/956435f8-3b2f-47a6-8756-8c54ded61802%40dalibo.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit adds WAL IO stats to both pg_stat_io view and per-backend IO
statistics (pg_stat_get_backend_io()). This change is possible since
f92c854cf406, as WAL IO is not counted in blocks in some code paths
where its stats data is measured (like WAL read in xlogreader.c).
IOContext gains IOCONTEXT_INIT and IOObject IOOBJECT_WAL, with the
following combinations allowed:
- IOOBJECT_WAL/IOCONTEXT_NORMAL is used to track I/O operations done on
already-created WAL segments.
- IOOBJECT_WAL/IOCONTEXT_INIT is used for tracking I/O operations done
when initializing WAL segments.
The core changes are done in pg_stat_io.c, backend statistics inherit
them. Backend statistics and pg_stat_io are now available for the WAL
writer, the WAL receiver and the WAL summarizer processes.
I/O timing data is controlled by the GUC track_io_timing, like the
existing data of pg_stat_io for consistency. The timings related to
IOOBJECT_WAL show up if the GUC is enabled (disabled by default).
Bump pgstats file version, due to the additions in IOObject and
IOContext, impacting the amount of data written for the fixed-numbered
IO stats kind in the pgstats file.
Author: Nazir Bilal Yavuz
Reviewed-by: Bertrand Drouvot, Nitin Jadhav, Amit Kapila, Michael
Paquier, Melanie Plageman, Bharath Rupireddy
Discussion: https://postgr.es/m/CAN55FZ3AiQ+ZMxUuXnBpd0Rrh1YhwJ5FudkHg=JU0P+-W8T4Vg@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This turns GistTranslateCompareType() into a callback function of the
gist index AM instead of a standalone function. The existing callers
are changed to use IndexAmTranslateCompareType(). This then makes
that code not hardcoded toward gist.
This means in particular that the temporal keys code is now
independent of gist. Also, this generalizes commit 74edabce7a3, so
other index access methods other than the previously hardcoded ones
could now work as REPLICA IDENTITY in a logical replication
subscriber.
Author: Mark Dilger <mark.dilger@enterprisedb.com>
Co-authored-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
|
|
|
|
| |
"recovery" is not a verb. Introduced in 68cb5af46cd8.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For each Index AM, provide a mapping between operator strategies and
the system-wide generic concept of a comparison type. For example,
for btree, BTLessStrategyNumber maps to and from COMPARE_LT. Numerous
places in the planner and executor think directly in terms of btree
strategy numbers (and a few in terms of hash strategy numbers.) These
should be converted over subsequent commits to think in terms of
CompareType instead. (This commit doesn't make any use of this API
yet.)
Author: Mark Dilger <mark.dilger@enterprisedb.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
|
|
|
|
|
|
|
|
| |
This refactors and simplifies various existing code to make use of the
new function.
Reviewed-by: Mark Dilger <mark.dilger@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
|
|
|
|
|
|
|
|
|
|
|
| |
Follow up to commit 630f9a43cec. The previous name had become
confusing, because it doesn't actually translate a strategy number but
a CompareType into a strategy number. We might add the inverse at
some point, which would then probably be called something like
GistTranslateStratnum.
Reviewed-by: Mark Dilger <mark.dilger@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Consistently use "Size" (or size_t, or in some places int64 or double)
as the type for variables holding memory allocation sizes. In most
places variables' data types were fine already, but we had an ancient
habit of computing bytes from kilobytes-units GUCs with code like
"work_mem * 1024L". That risks overflow on Win64 where they did not
make "long" as wide as "size_t". We worked around that by restricting
such GUCs' ranges, so you couldn't set work_mem et al higher than 2GB
on Win64. This patch removes that restriction, after replacing such
calculations with "work_mem * (Size) 1024" or variants of that.
It should be noted that this patch was constructed by searching
outwards from the GUCs that have MAX_KILOBYTES as upper limit.
So I can't positively guarantee there are no other places doing
memory-size arithmetic in int or long variables. I do however feel
pretty confident that increasing MAX_KILOBYTES on Win64 is safe now.
Also, nothing in our code should be dealing in multiple-gigabyte
allocations without authorization from a relevant GUC, so it seems
pretty likely that this search caught everything that could be at
risk of overflow.
Author: Vladlen Popolitov <v.popolitov@postgrespro.ru>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1a01f0-66ec2d80-3b-68487680@27595217
|
|
|
|
|
|
|
|
| |
Some places spelled it "it's", which is short for "it is".
In passing, fix a couple other nearby grammatical errors.
Author: Jacob Brazeal <jacob.brazeal@gmail.com>
Discussion: https://postgr.es/m/CA+COZaAO8g1KJCV0T48=CkJMjAnnfTGLWOATz+2aCh40c2Nm+g@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit adds four fields to the statistics of relations, aggregating
the amount of time spent for each operation on a relation:
- total_vacuum_time, for manual vacuum.
- total_autovacuum_time, for vacuum done by the autovacuum daemon.
- total_analyze_time, for manual analyze.
- total_autoanalyze_time, for analyze done by the autovacuum daemon.
This gives users the option to derive the average time spent for these
operations with the help of the related "count" fields.
Bump catalog version (for the catalog changes) and PGSTAT_FILE_FORMAT_ID
(for the additions in PgStat_StatTabEntry).
Author: Sami Imseih
Reviewed-by: Bertrand Drouvot, Michael Paquier
Discussion: https://postgr.es/m/CAA5RZ0uVOGBYmPEeGF2d1B_67tgNjKx_bKDuL+oUftuoz+=Y1g@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The right mix of DDL and VACUUM could corrupt a catalog page header such
that PageIsVerified() durably fails, requiring a restore from backup.
This affects only catalogs that both have a syscache and have DDL code
that uses syscache tuples to construct updates. One of the test
permutations shows a variant not yet fixed.
This makes !TransactionIdIsValid(TM_FailureData.xmax) possible with
TM_Deleted. I think core and PGXN are indifferent to that.
Per bug #17821 from Alexander Lakhin. Back-patch to v13 (all supported
versions). The test case is v17+, since it uses INJECTION_POINT.
Discussion: https://postgr.es/m/17821-dd8c334263399284@postgresql.org
|
|
|
|
|
|
|
|
|
|
|
| |
Assume twophase.c is the performance-sensitive caller, and preserve its
choice of unlikely() branch hint. Add some retrospective rationale for
that choice. Back-patch to v17, for the next commit to use it.
Reviewed (in earlier versions) by Michael Paquier.
Discussion: https://postgr.es/m/17821-dd8c334263399284@postgresql.org
Discussion: https://postgr.es/m/20250116010051.f3.nmisch@google.com
|
|
|
|
| |
Discussion: https://www.postgresql.org/message-id/flat/5b558da8-99fb-0a99-83dd-f72f05388517@enterprisedb.com
|
|
|
|
| |
Discussion: https://www.postgresql.org/message-id/flat/692ee0da-49da-4d32-8dca-da224cc2800e@eisentraut.org
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
XLogPageRead() checks immediately for an invalid WAL record header on a
standby, to be able to handle the case of continuation records that need
to be read across two different sources. As written, the check was too
generic, applying to any target LSN. Based on an analysis by Kyotaro
Horiguchi, what really matters is to make sure that the page header is
checked when attempting to read a LSN at the boundary of a segment, to
handle the case of a continuation record that spawns across multiple
pages when dealing with multiple segments, as WAL receivers are spawned
they request WAL from the beginning of a segment. This fix has been
proposed by Kyotaro Horiguchi.
This could cause standbys to loop infinitely when dealing with a
continuation record during a timeline jump, in the case where the
contents of the record in the follow-up page are invalid.
Some regression tests are added to check such scenarios, able to
reproduce the original problem. In the test, the contents of a
continuation record are overwritten with junk zeros on its follow-up
page, and replayed on standbys. This is inspired by 039_end_of_wal.pl,
and is enough to show how standbys should react on promotion by not
being stuck. Without the fix, the test would fail with a timeout. The
test to reproduce the problem has been written by Alexander Kukushkin.
The original check has been introduced in 066871980183, for a similar
problem.
Author: Kyotaro Horiguchi, Alexander Kukushkin
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/CAFh8B=mozC+e1wGJq0H=0O65goZju+6ab5AU7DEWCSUA2OtwDg@mail.gmail.com
Backpatch-through: 13
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit reverts 8f67f994e8ea (down to v13) and c3de0f9eed38 (down to
v17), as these are proving to not be completely correct regarding two
aspects:
- In v17 and newer branches, c3de0f9eed38's check for epoch handling is
incorrect, and does not correctly handle frozen epochs. A logic closer
to widen_snapshot_xid() should be used. The 2PC code should try to
integrate deeper with FullTransactionIds, 5a1dfde8334b being not enough.
- In v13 and newer branches, 8f67f994e8ea is a workaround for the real
issue, which is that we should not attempt CLOG lookups without reaching
consistency. This exists since 728bd991c3c4, and this is reachable with
ProcessTwoPhaseBuffer() called by restoreTwoPhaseData() at the beginning
of recovery.
Per discussion with Noah Misch.
Discussion: https://postgr.es/m/20250116010051.f3.nmisch@google.com
Backpatch-through: 13
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Move the several members of HeapScanDescData which are specific to
Bitmap Heap Scans into a new struct, BitmapHeapScanDescData, which
inherits from HeapScanDescData.
This reduces the size of the HeapScanDescData for other types of scans
and will allow us to add additional bitmap heap scan-specific members in
the future without fear of bloating the HeapScanDescData.
Reviewed-by: Tomas Vondra
Discussion: https://postgr.es/m/c736f6aa-8b35-4e20-9621-62c7c82e2168%40vondra.me
|
|
|
|
|
| |
Oversight in commit 5bf748b8, which enhanced nbtree ScalarArrayOp
execution.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add more comments at the top of vacuumlazy.c on heap relation vacuuming
implementation.
Previously vacuumlazy.c only had details related to dead TID storage.
This commit adds a more general summary to help future developers
understand the heap relation vacuum design and implementation at a high
level.
Reviewed-by: Alena Rybakina, Robert Haas, Andres Freund, Bilal Yavuz
Discussion: https://postgr.es/m/flat/CAAKRu_ZF_KCzZuOrPrOqjGVe8iRVWEAJSpzMgRQs%3D5-v84cXUg%40mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This changes commit 7406ab623fe in that the gist strategy number
mapping support function is changed to use the CompareType enum as
input, instead of the "well-known" RT*StrategyNumber strategy numbers.
This is a bit cleaner, since you are not dealing with two sets of
strategy numbers. Also, this will enable us to subsume this system
into a more general system of using CompareType to define operator
semantics across index methods.
Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This function expects an "int64" as result and stores the number of
pages to add to the index scan bitmap as an "int", multiplying its final
result by 10. For a relation large enough, this can theoretically
overflow if counting more than (INT32_MAX / 10) pages, knowing that the
number of pages is upper-bounded by MaxBlockNumber.
To avoid the overflow, this commit redefines "totalpages", used to
calculate the result, to be an "int64" rather than an "int".
Reported-by: Evgeniy Gorbanyov
Author: James Hunter
Discussion: https://www.postgresql.org/message-id/07704817-6fa0-460c-b1cf-cd18f7647041@basealt.ru
Backpatch-through: 13
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Quite a bit of code within nbtutils.c is only called during nbtree
preprocessing. Move that code into a new .c file, nbtpreprocesskeys.c.
Also reorder some of the functions within the new file for clarity.
This commit has no functional impact. It is strictly mechanical.
Author: Peter Geoghegan <pg@bowt.ie>
Suggested-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/CAH2-WznwNn1BDOpWxHBUK1f3Rdw8pO9UCenWXnvT=n9GO8GnLA@mail.gmail.com
Discussion: https://postgr.es/m/86930045-5df5-494a-b4f1-815bc3fbcce0%40iki.fi
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds support for the NOT ENFORCED/ENFORCED flag for constraints,
with support for check constraints.
The plan is to eventually support this for foreign key constraints,
where it is typically more useful.
Note that CHECK constraints do not currently support ALTER operations,
so changing the enforceability of an existing constraint isn't
possible without dropping and recreating it. This could be added
later.
Author: Amul Sul <amul.sul@enterprisedb.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: jian he <jian.universality@gmail.com>
Tested-by: Triveni N <triveni.n@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/CAAJ_b962c5AcYW9KUt_R_ER5qs3fUGbe4az-SP-vuwPS-w-AGA@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
6f3820f37 adjusted the assert-enabled validation of the CompactAttribute
to call a new external function to perform the validation. That commit
made it so the function was only available when building with
USE_ASSERT_CHECKING, and because TupleDescCompactAttr() is a static
inline function, the call to verify_compact_attribute() was compiled
into any extension which uses TupleDescCompactAttr(). This caused issues
for such extensions when loading the assert-enabled extension into
PostgreSQL versions without asserts enabled due to that function being
unavailable in core.
To fix this, make verify_compact_attribute() available unconditionally,
but make it do nothing unless building with USE_ASSERT_CHECKING.
Author: Andrew Kane <andrew@ankane.org>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAOdR5yHfMEMW00XGo=v1zCVUS6Huq2UehXdvKnwtXPTcZwXhmg@mail.gmail.com
|
|
|
|
|
|
| |
Oversight in commit 1bd4bc85, which made nbtree backwards scans operate
off of a copy of each page's left link as of the time of its call to
_bt_readpage.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The originally submitted code (using bit masking) was correct when the
number of slots was restricted to be a power of two -- but that
limitation was removed during development that led to commit
53c2a97a9266, which made the bank selection code incorrect. This led to
always using a smaller number of banks than available. Change said code
to use integer modulo instead, which works correctly with an arbitrary
number of banks.
It's likely that we could improve on this to avoid runtime use of
integer division. But with this change we're, at least, not wasting
memory on unused banks, and more banks mean less contention, which is
likely to have a much higher performance impact than a single
instruction's latency.
Author: Yura Sokolov <y.sokolov@postgrespro.ru>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/9444dc46-ca47-43ed-9058-89c456316306@postgrespro.ru
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Move nbtree's detection of RowCompare quals that are unsatisfiable due
to having a NULL in their first row element: rather than detecting these
cases at the point where _bt_first builds its insertion scan key, do so
earlier, during preprocessing proper. This brings the RowCompare case
in line every other case involving an unsatisfiable-due-to-NULL qual.
nbtree now consistently detects such unsatisfiable quals -- even when
they happen to involve a key that isn't examined by _bt_first at all.
Affected cases thereby avoid useless full index scans that cannot
possibly return any matching rows.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/CAH2-WzmySVXst2hFrOATC-zw1Byg1XC-jYUS314=mzuqsNwk+Q@mail.gmail.com
|