| Commit message (Collapse) | Author | Age |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Introduce a new GUC recovery_prefetch. When enabled, look ahead in the
WAL and try to initiate asynchronous reading of referenced data blocks
that are not yet cached in our buffer pool. For now, this is done with
posix_fadvise(), which has several caveats. Since not all OSes have
that system call, "try" is provided so that it can be enabled where
available. Better mechanisms for asynchronous I/O are possible in later
work.
Set to "try" for now for test coverage. Default setting to be finalized
before release.
The GUC wal_decode_buffer_size limits the distance we can look ahead in
bytes of decoded data.
The existing GUC maintenance_io_concurrency is used to limit the number
of concurrent I/Os allowed, based on pessimistic heuristics used to
infer that I/Os have begun and completed. We'll also not look more than
maintenance_io_concurrency * 4 block references ahead.
Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Tomas Vondra <tomas.vondra@2ndquadrant.com>
Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com> (earlier version)
Reviewed-by: Andres Freund <andres@anarazel.de> (earlier version)
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> (earlier version)
Tested-by: Tomas Vondra <tomas.vondra@2ndquadrant.com> (earlier version)
Tested-by: Jakub Wartak <Jakub.Wartak@tomtom.com> (earlier version)
Tested-by: Dmitry Dolgov <9erthalion6@gmail.com> (earlier version)
Tested-by: Sait Talha Nisanci <Sait.Nisanci@microsoft.com> (earlier version)
Discussion: https://postgr.es/m/CA%2BhUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq%3DAovOddfHpA%40mail.gmail.com
|
|
|
|
|
|
|
|
|
| |
Change two macros to be static inline functions instead to keep the
data type consistent. This avoids a "comparison is always true"
warning that was occurring with -Wtype-limits. In the process, change
the names to look less like macros.
Discussion: https://postgr.es/m/20220407063505.njnnrmbn4sxqfsts@alap3.anarazel.de
|
|
|
|
|
|
|
|
|
|
| |
Will be used by tests committed subsequently.
Bumps catversion (this time for real, the one in 0f96965c658 got lost when
rebasing over 5c279a6d350).
Author: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/CAAKRu_aNxL1WegCa45r=VAViCLnpOU7uNC7bTtGw+=QAPyYivw@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the stats collector days it was hard to write tests for the stats system,
because fundamentally delivery of stats messages over UDP was not
synchronous (nor guaranteed). Now we easily can force pending stats updates to
be flushed synchronously.
This moves stats.sql into a parallel group, there isn't a reason for it to run
in isolation anymore. And it may shake out some bugs.
Bumps catversion.
Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20220303021600.hs34ghqcw6zcokdh@alap3.anarazel.de
|
|
|
|
|
|
|
|
|
|
| |
Just after committing 5891c7a8ed8, a test running with debug_discard_caches=1
failed locally...
pgstat_drop_relation() neither checked pgstat_should_count_relation() nor
called pgstat_prep_relation_pending(). With debug_discard_caches=1
rel->pgstat_info wasn't set up, leading pg_stat_get_xact_tuples_inserted()
spuriously still returning > 0 while in the transaction dropping the table.
|
|
|
|
|
|
|
|
|
|
|
| |
Zeroing out an lwlock in a normal build turns out to not trigger any alarms,
if nobody can use the lwlock at that moment (as the case here). But with
--disable-spinlocks --disable-atomics, the sema field needs to be initialized.
We probably should make sure that this fails on more common configurations as
well...
Per buildfarm animal rorqual
|
|
|
|
|
|
| |
Broke with 5c279a6d350. But looks like it had been half-broken since
70e81861fad, because 'rmid' didn't refer to the current record's rmid anymore,
but to rmid from "Initialize resource managers" - a constant.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Allow extensions to specify a new custom resource manager (rmgr),
which allows specialized WAL. This is meant to be used by a Table
Access Method or Index Access Method.
Prior to this commit, only Generic WAL was available, which offers
support for recovery and physical replication but not logical
replication.
Reviewed-by: Julien Rouhaud, Bharath Rupireddy, Andres Freund
Discussion: https://postgr.es/m/ed1fb2e22d15d3563ae0eb610f7b61bb15999c0a.camel%40j-davis.com
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change affects SubTransGetTopmostTransaction(), used to find the
topmost transaction ID of a given transaction ID. The cache is able to
store one value, so as we can save the backend from unnecessary lookups
at pg_subtrans/ on repetitive calls of this routine. There is a similar
practice in transam.c, for example.
Author: Simon Riggs
Reviewed-by: Andrey Borodin, Julien Rouhaud
Discussion: https://postgr.es/m/CANbhV-G8Co=yq4v4BkW7MJDqVt68K_8A48nAZ_+8UQS7LrwLEQ@mail.gmail.com
|
|
|
|
|
|
|
|
| |
Now that pgstat is not related to postmaster anymore, src/backend/postmaster
is not a well fitting directory.
Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20220303021600.hs34ghqcw6zcokdh@alap3.anarazel.de
|
|
|
|
|
|
| |
Reviewed-By: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20220303021600.hs34ghqcw6zcokdh@alap3.anarazel.de
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With stats now being stored in shared memory, the GUC isn't needed
anymore. However, the pg_stat_tmp directory and PG_STAT_TMP_DIR define are
kept, as pg_stat_statements (and some out-of-core extensions) store data in
it.
Docs will be updated in a subsequent commit, together with the other pending
docs updates due to shared memory stats.
Author: Andres Freund <andres@anarazel.de>
Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-By: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/20220330233550.eiwsbearu6xhuqwe@alap3.anarazel.de
Discussion: https://postgr.es/m/20220303021600.hs34ghqcw6zcokdh@alap3.anarazel.de
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously the statistics collector received statistics updates via UDP and
shared statistics data by writing them out to temporary files regularly. These
files can reach tens of megabytes and are written out up to twice a
second. This has repeatedly prevented us from adding additional useful
statistics.
Now statistics are stored in shared memory. Statistics for variable-numbered
objects are stored in a dshash hashtable (backed by dynamic shared
memory). Fixed-numbered stats are stored in plain shared memory.
The header for pgstat.c contains an overview of the architecture.
The stats collector is not needed anymore, remove it.
By utilizing the transactional statistics drop infrastructure introduced in a
prior commit statistics entries cannot "leak" anymore. Previously leaked
statistics were dropped by pgstat_vacuum_stat(), called from [auto-]vacuum. On
systems with many small relations pgstat_vacuum_stat() could be quite
expensive.
Now that replicas drop statistics entries for dropped objects, it is not
necessary anymore to reset stats when starting from a cleanly shut down
replica.
Subsequent commits will perform some further code cleanup, adapt docs and add
tests.
Bumps PGSTAT_FILE_FORMAT_ID.
Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Author: Andres Freund <andres@anarazel.de>
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-By: Andres Freund <andres@anarazel.de>
Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
Reviewed-By: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-By: "David G. Johnston" <david.g.johnston@gmail.com>
Reviewed-By: Tomas Vondra <tomas.vondra@2ndquadrant.com> (in a much earlier version)
Reviewed-By: Arthur Zakirov <a.zakirov@postgrespro.ru> (in a much earlier version)
Reviewed-By: Antonin Houska <ah@cybertec.at> (in a much earlier version)
Discussion: https://postgr.es/m/20220303021600.hs34ghqcw6zcokdh@alap3.anarazel.de
Discussion: https://postgr.es/m/20220308205351.2xcn6k4x5yivcxyd@alap3.anarazel.de
Discussion: https://postgr.es/m/20210319235115.y3wz7hpnnrshdyv6@alap3.anarazel.de
|
|
|
|
|
|
| |
Most of pgstat uses pgstat_<verb>_<subject>() or just <verb>_<subject>(). But
not all (some introduced fairly recently by me). Rename ones that aren't
intentionally following a different scheme (e.g. AtEOXact_*).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The column 'subskiplsn' uses TYPALIGN_DOUBLE (which has 4 bytes alignment
on AIX) for storage. But the C Struct (Form_pg_subscription) has 8-byte
alignment for this field, so retrieving it from storage causes an
unaligned read.
To fix this, we rearranged the 'subskiplsn' column in the catalog so that
it naturally comes at an 8-byte boundary.
We have fixed a similar problem in commit f3b421da5f. This patch adds a
test to avoid a similar mistake in the future.
Reported-by: Noah Misch
Diagnosed-by: Noah Misch, Masahiko Sawada, Amit Kapila
Author: Masahiko Sawada
Reviewed-by: Noah Misch, Amit Kapila
Discussion: https://postgr.es/m/20220401074423.GC3682158@rfd.leadboat.com
https://postgr.es/m/CAD21AoDeScrsHhLyEPYqN3sydg6PxAPVBboK=30xJfUVihNZDA@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously the pgstat <-> replication slots API was done with on the basis of
names. However, the upcoming move to storing stats in shared memory makes it
more convenient to use a integer as key.
Change the replication slot functions to take the slot rather than the slot
name, and expose ReplicationSlotIndex() to compute the index of an replication
slot. Special handling will be required for restarts, as the index is not
stable across restarts. For now pgstat internally still uses names.
Rename pgstat_report_replslot_{create,drop}() to
pgstat_{create,drop}_replslot() to match the functions for other kinds of
stats.
Reviewed-By: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/20220404041516.cctrvpadhuriawlq@alap3.anarazel.de
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
One problematic part of the current statistics collector design is that there
is no reliable way of getting rid of statistics entries. Because of that
pgstat_vacuum_stat() (called by [auto-]vacuum) matches all stats for the
current database with the catalog contents and tries to drop now-superfluous
entries. That's quite expensive. What's worse, it doesn't work on physical
replicas, despite physical replicas collection statistics entries.
This commit introduces infrastructure to create / drop statistics entries
transactionally, together with the underlying catalog objects (functions,
relations, subscriptions). pgstat_xact.c maintains a list of stats entries
created / dropped transactionally in the current transaction. To ensure the
removal of statistics entries is durable dropped statistics entries are
included in commit / abort (and prepare) records, which also ensures that
stats entries are dropped on standbys.
Statistics entries created separately from creating the underlying catalog
object (e.g. when stats were previously lost due to an immediate restart)
are *not* WAL logged. However that can only happen outside of the transaction
creating the catalog object, so it does not lead to "leaked" statistics
entries.
For this to work, functions creating / dropping functions / relations /
subscriptions need to call into pgstat. For subscriptions this was already
done when dropping subscriptions, via pgstat_report_subscription_drop() (now
renamed to pgstat_drop_subscription()).
This commit does not actually drop stats yet, it just provides the
infrastructure. It is however a largely independent piece of infrastructure,
so committing it separately makes sense.
Bumps XLOG_PAGE_MAGIC.
Author: Andres Freund <andres@anarazel.de>
Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
Reviewed-By: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/20220303021600.hs34ghqcw6zcokdh@alap3.anarazel.de
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With the introduction of PgStat_Kind PgStat_Single_Reset_Type,
PgStat_Shared_Reset_Target don't make sense anymore. Replace them with
PgStat_Kind.
Instead of having dedicated reset functions for different kinds of stats, use
two generic helper routines (one to reset all stats of a kind, one to reset
one stats entry).
A number of reset functions were named pgstat_reset_*_counter(), despite
affecting multiple counters. The generic helper routines get rid of
pgstat_reset_single_counter(), pgstat_reset_subscription_counter().
Rename pgstat_reset_slru_counter(), pgstat_reset_replslot_counter() to
pgstat_reset_slru(), pgstat_reset_replslot() respectively, and have them only
deal with a single SLRU/slot. Resetting all SLRUs/slots goes through the
generic pgstat_reset_of_kind().
Previously pg_stat_reset_replication_slot() used SearchNamedReplicationSlot()
to check if a slot exists. API wise it seems better to move that to
pgstat_replslot.c.
This is done separately from the - quite large - shared memory statistics
patch to make review easier.
Reviewed-By: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/20220404041516.cctrvpadhuriawlq@alap3.anarazel.de
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Until now index_concurrently_swap() directly modified pgstat internal
datastructures. That will break with the introduction of shared memory
statistics and seems off architecturally.
This is done separately from the - quite large - shared memory statistics
patch to make review easier.
Author: Andres Freund <andres@anarazel.de>
Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-By: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/20220303021600.hs34ghqcw6zcokdh@alap3.anarazel.de
|
|
|
|
|
|
|
|
|
|
| |
Only the pgstat_send_* functions that are called from outside pgstat*.c are
renamed (the rest will go away). This is done separately from the - quite
large - shared memory statistics patch to make review easier.
Author: Andres Freund <andres@anarazel.de>
Reviewed-By: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/20220404041516.cctrvpadhuriawlq@alap3.anarazel.de
|
|
|
|
|
|
|
| |
With asserts disabled, late-model clang notices that this variable
is incremented but never otherwise read.
Discussion: https://postgr.es/m/3171401.1649275153@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Soon the stats collector will be no more, with statistics instead getting
stored in shared memory. There are a lot of references to the stats collector
in comments. This commit replaces most of these references with "cumulative
statistics system", with the remaining ones getting replaced as part of
subsequent commits.
This is done separately from the - quite large - shared memory statistics
patch to make review easier.
Author: Andres Freund <andres@anarazel.de>
Reviewed-By: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
Reviewed-By: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/20220303021600.hs34ghqcw6zcokdh@alap3.anarazel.de
Discussion: https://postgr.es/m/20220308205351.2xcn6k4x5yivcxyd@alap3.anarazel.de
|
|
|
|
|
|
|
|
|
| |
The transactional integration code is largely independent from the rest of
pgstat.c. Subsequent commits will add more related code.
Author: Andres Freund <andres@anarazel.de>
Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/20220404041516.cctrvpadhuriawlq@alap3.anarazel.de
|
|
|
|
|
| |
I got the location wrong in 13619598f10. The name did make it sound like it
belonged in pgstat_relation.c...
|
|
|
|
|
|
|
|
|
|
|
|
| |
It might seem pointless to allow use of dsm in single user mode, but otherwise
subsystems might need dedicated single user mode code paths.
Besides changing the assert, all that's needed is to make some windows code
assuming the presence of postmaster conditional.
Author: Andres Freund <andres@anarazel.de>
Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/CA+hUKGL9hY_VY=+oUK+Gc1iSRx-Ls5qeYJ6q=dQVZnT3R63Taw@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Exclusive-mode backups have been deprecated since 9.6 (when
non-exclusive backups were introduced) due to the issues
they can cause should the system crash while one is running and
generally because non-exclusive provides a much better interface.
Further, exclusive backup mode wasn't really being tested (nor was most
of the related code- like being able to log in just to stop an exclusive
backup and the bits of the state machine related to that) and having to
possibly deal with an exclusive backup and the backup_label file
existing during pg_basebackup, pg_rewind, etc, added other complexities
that we are better off without.
This patch removes the exclusive backup mode, the various special cases
for dealing with it, and greatly simplifies the online backup code and
documentation.
Authors: David Steele, Nathan Bossart
Reviewed-by: Chapman Flack
Discussion: https://postgr.es/m/ac7339ca-3718-3c93-929f-99e725d1172c@pgmasters.net
https://postgr.es/m/CAHg+QDfiM+WU61tF6=nPZocMZvHDzCK47Kneyb0ZRULYzV5sKQ@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch allows "PGC_SUSET" parameters to be set by non-superusers
if they have been explicitly granted the privilege to do so.
The privilege to perform ALTER SYSTEM SET/RESET on a specific parameter
can also be granted.
Such privileges are cluster-wide, not per database. They are tracked
in a new shared catalog, pg_parameter_acl.
Granting and revoking these new privileges works as one would expect.
One caveat is that PGC_USERSET GUCs are unaffected by the SET privilege
--- one could wish that those were handled by a revocable grant to
PUBLIC, but they are not, because we couldn't make it robust enough
for GUCs defined by extensions.
Mark Dilger, reviewed at various times by Andrew Dunstan, Robert Haas,
Joshua Brindle, and myself
Discussion: https://postgr.es/m/3D691E20-C1D5-4B80-8BA5-6BEB63AF3029@enterprisedb.com
|
|
|
|
|
|
|
|
| |
Avoid printing signed values as unsigned. (No impact in practice
expected.)
Author: Pavel Borisov <pashkin.elfe@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CALT9ZEHN7hWJo6MgJKqoDMGj%3DGOzQU50wTvOYZXDj7x%3DsUK-kw%40mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In commit 27e1f1456, create_append_plan() only allowed the subplan
created from a given subpath to be executed asynchronously when it was
an async-capable ForeignPath. To extend coverage, this patch handles
cases when the given subpath includes some other Path types as well that
can be omitted in the plan processing, such as a ProjectionPath directly
atop an async-capable ForeignPath, allowing asynchronous execution in
partitioned-scan/partitioned-join queries with non-Var tlist expressions
and more UNION queries.
Andrey Lepikhov and Etsuro Fujita, reviewed by Alexander Pyhalov and
Zhihong Yu.
Discussion: https://postgr.es/m/659c37a8-3e71-0ff2-394c-f04428c76f08%40postgrespro.ru
|
|
|
|
|
|
|
| |
Reported-by: Antonin Houska
Author: Amit Kapila
Reviewed-by: Antonin Houska, Ajin Cherian
Discussion: https://postgr.es/m/84638.1649152255@antos
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These clauses allow the user to specify how data from nested paths are
joined, allowing considerable freedom in shaping the tabular output of
JSON_TABLE.
PLAN DEFAULT allows the user to specify the global strategies when
dealing with sibling or child nested paths. The is often sufficient to
achieve the necessary goal, and is considerably simpler than the full
PLAN clause, which allows the user to specify the strategy to be used
for each named nested path.
Nikita Glukhov
Reviewers have included (in no particular order) Andres Freund, Alexander
Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zhihong Yu,
Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby.
Discussion: https://postgr.es/m/7e2cb85d-24cf-4abb-30a5-1a33715959bd@postgrespro.ru
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commits 74cf7d46 and a61daa14 fixed pg_upgrade bugs involving oversights
in how relfrozenxid or relminmxid are carried forward or initialized.
Corruption caused by bugs of this nature was ameliorated by commit
78db307bb2, which taught VACUUM to always overwrite existing invalid
relfrozenxid or relminmxid values that are apparently "in the future".
Extend that work now by showing a warning in the event of overwriting
either relfrozenxid or relminmxid due to an existing value that is "in
the future". There is probably a decent chance that the sanity checks
added by commit 699bf7d05c will raise an error before VACUUM reaches
this point, but we shouldn't rely on that.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAH2-WzmRZEzeGvLv8yDW0AbFmSvJjTziORqjVUrf74mL4GL0Ww@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Move the execution pruning initialization steps that are common
between both ExecInitAppend() and ExecInitMergeAppend() into a new
function ExecInitPartitionPruning() defined in execPartition.c.
Those steps include creation of a PartitionPruneState to be used for
all instances of pruning and determining the minimal set of child
subplans that need to be initialized by performing initial pruning if
needed, and finally adjusting the subplan_map arrays in the
PartitionPruneState to reflect the new set of subplans remaining
after initial pruning if it was indeed performed.
ExecCreatePartitionPruneState() is no longer exported out of
execPartition.c and has been renamed to CreatePartitionPruneState()
as a local sub-routine of ExecInitPartitionPruning().
* Likewise, ExecFindInitialMatchingSubPlans() that was in charge of
performing initial pruning no longer needs to be exported. In fact,
since it would now have the same body as the more generally named
ExecFindMatchingSubPlans(), except differing in the value of
initial_prune passed to the common subroutine
find_matching_subplans_recurse(), it seems better to remove it and add
an initial_prune argument to ExecFindMatchingSubPlans().
* Add an ExprContext field to PartitionPruneContext to remove the
implicit assumption in the runtime pruning code that the ExprContext to
use to compute pruning expressions that need one can always rely on the
PlanState providing it. A future patch will allow runtime pruning (at
least the initial pruning steps) to be performed without the
corresponding PlanState yet having been created, so this will help.
Author: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/CA+HiwqEYCpEqh2LMDOp9mT+4-QoVe8HgFMKBjntEMCTZLpcCCA@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The previous coding of dshash_seq_next(), on the first call, accessed
status->hash_table->size_log2 without holding a partition lock and without
guaranteeing that ensure_valid_bucket_pointers() had ever been called.
That oversight turns out to not have immediately visible effects, because
bucket 0 is always in partition 0, and ensure_valid_bucket_pointers() was
called after acquiring the partition lock. However,
PARTITION_FOR_BUCKET_INDEX() with a size_log2 of 0 ends up triggering formally
undefined behaviour.
Simplify by accessing partition 0, without using PARTITION_FOR_BUCKET_INDEX().
While at it, remove dshash_get_current(), there is no convincing use
case. Also polish a few comments.
Author: Andres Freund <andres@anarazel.de>
Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/CA+hUKGL9hY_VY=+oUK+Gc1iSRx-Ls5qeYJ6q=dQVZnT3R63Taw@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
| |
There was a wild mishmash of function comment formatting in pgstat, making it
hard to know what to use for any new function and hard to extend existing
comments (particularly due to randomly different forms of indentation).
Author: Andres Freund <andres@anarazel.de>
Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/20220329191727.mzzwbl7udhpq7pmf@alap3.anarazel.de
Discussion: https://postgr.es/m/20220308205351.2xcn6k4x5yivcxyd@alap3.anarazel.de
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This feature allows jsonb data to be treated as a table and thus used in
a FROM clause like other tabular data. Data can be selected from the
jsonb using jsonpath expressions, and hoisted out of nested structures
in the jsonb to form multiple rows, more or less like an outer join.
Nikita Glukhov
Reviewers have included (in no particular order) Andres Freund, Alexander
Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zhihong Yu (whose
name I previously misspelled), Himanshu Upadhyaya, Daniel Gustafsson,
Justin Pryzby.
Discussion: https://postgr.es/m/7e2cb85d-24cf-4abb-30a5-1a33715959bd@postgrespro.ru
|
|
|
|
|
|
|
|
|
| |
Move remaining VACUUM resource allocation and deallocation code from
lazy_scan_heap() to its caller, heap_vacuum_rel(). This finishes off
work started by commit 73f6ec3d.
Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-Wzk3fNBa_S3Ngi+16GQiyJ=AmUu3oUY99syMDTMRxitfyQ@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
| |
Commit f4fb45d15c tried to free memory during aggregate finalization.
This cause issues, particularly when used as a window function, so stop
doing that.
Per complaint by Jaime Casanova and diagnosis by Andres Freund
Discussion: https://postgr.es/m/YkfeMNYRCGhySKyg@ahch-to
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The general usage pattern when we store tuples in tuplesort.c is that
we store a series of tuples one by one then either perform a sort or spill
them to disk. In the common case, there is no pfreeing of already stored
tuples. For the common case since we do not individually pfree tuples, we
have very little need for aset.c memory allocation behavior which
maintains freelists and always rounds allocation sizes up to the next
power of 2 size.
Here we conditionally use generation.c contexts for storing tuples in
tuplesort.c when the sort will never be bounded. Unfortunately, the
memory context to store tuples is already created by the time any calls
would be made to tuplesort_set_bound(), so here we add a new sort option
that allows callers to specify if they're going to need a bounded sort or
not. We'll use a standard aset.c allocator when this sort option is not
set.
Extension authors must ensure that the TUPLESORT_ALLOWBOUNDED flag is
used when calling tuplesort_begin_* for any sorts that make a call to
tuplesort_set_bound().
Author: David Rowley
Reviewed-by: Andy Fan
Discussion: https://postgr.es/m/CAApHDvoH4ASzsAOyHcxkuY01Qf++8JJ0paw+03dk+W25tQEcNQ@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This replaces the bool flag for randomAccess. An upcoming patch requires
adding another option, so instead of breaking the API for that, then
breaking it again one day if we add more options, let's just break it
once. Any boolean options we add in the future will just make use of an
unused bit in the flags.
Any extensions making use of tuplesorts will need to update their code
to pass TUPLESORT_RANDOMACCESS instead of true for randomAccess.
TUPLESORT_NONE can be used for a set of empty options.
Author: David Rowley
Reviewed-by: Justin Pryzby
Discussion: https://postgr.es/m/CAApHDvoH4ASzsAOyHcxkuY01Qf%2B%2B8JJ0paw%2B03dk%2BW25tQEcNQ%40mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Here we make a series of improvements to the generation memory
allocator, namely:
1. Allow generation contexts to have a minimum, initial and maximum block
sizes. The standard allocator allows this already but when the generation
context was added, it only allowed fixed-sized blocks. The problem with
fixed-sized blocks is that it's difficult to choose how large to make the
blocks. If the chosen size is too small then we'd end up with a large
number of blocks and a large number of malloc calls. If the block size is
made too large, then memory is wasted.
2. Add support for "keeper" blocks. This is a special block that is
allocated along with the context itself but is never freed. Instead,
when the last chunk in the keeper block is freed, we simply mark the block
as empty to allow new allocations to make use of it.
3. Add facility to "recycle" newly empty blocks instead of freeing them
and having to later malloc an entire new block again. We do this by
recording a single GenerationBlock which has become empty of any chunks.
When we run out of space in the current block, we check to see if there is
a "freeblock" and use that if it contains enough space for the allocation.
Author: David Rowley, Tomas Vondra
Reviewed-by: Andy Fan
Discussion: https://postgr.es/m/d987fd54-01f8-0f73-af6c-519f799a0ab8@enterprisedb.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When dispatching sort operations to specialized variants, commit
69749243 failed to handle the case where CLUSTER-sort decides not to
initialize datum1 and isnull1. Fix by hoisting that decision up a level
and advertising whether datum1 can be relied on, in the Tuplesortstate
object.
Per reports from UBsan and Valgrind build farm animals, while running
the cluster.sql test.
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAFBsxsF1TeK5Fic0M%2BTSJXzbKsY6aBqJGNj6ptURuB09ZF6k_w%40mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
datetime.c's parsing logic has assumed that strtod() will accept
a string that looks like ".", which it does in glibc, but not on
some less-common platforms such as AIX. The result of this was
that datetime fields like "123." would be accepted on some platforms
but not others; which is a sufficiently odd case that it's not that
surprising we've heard no field complaints. But commit e39f99046
extended that assumption to new places, and happened to add a test
case that exposed the platform dependency. Remove this dependency
by special-casing situations without any digits after the decimal
point.
(Again, this is in part a pre-existing bug but I don't feel a
compulsion to back-patch.)
Also, rearrange e39f99046's changes in formatting.c to avoid a
Coverity complaint that we were copying an uninitialized field.
Discussion: https://postgr.es/m/1592893.1648969747@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Non-aggressive VACUUMs were at a gratuitous disadvantage (relative to
aggressive VACUUMs) around advancing relfrozenxid and relminmxid before
now. The issue only came up when concurrent activity unset some heap
page's visibility map bit right as VACUUM was considering if the page
should get counted in frozenskipped_pages. The non-aggressive case
would recheck the all-frozen bit at this point. The aggressive case
reasoned that the page (a skippable page) must have at least been
all-frozen in the recent past, so skipping it won't make relfrozenxid
advancement unsafe (which is never okay for aggressive VACUUMs).
The recheck created a window for some other backend to confuse matters
for VACUUM. If the page's VM bit turned out to be unset, VACUUM would
conclude that the page was _never_ all-frozen. frozenskipped_pages was
not incremented, and yet VACUUM couldn't back out of skipping at this
late stage (it couldn't choose to scan the page instead). This made it
unsafe to advance relfrozenxid later on.
Consistently avoid the issue by generalizing how we skip frozen pages
during aggressive VACUUMs: take the same approach when skipping any
skippable page range during aggressive and non-aggressive VACUUMs alike.
The new approach makes ranges (not individual pages) the fundamental
unit of skipping using the visibility map. frozenskipped_pages is
replaced with a boolean flag that represents whether some skippable
range with one or more all-visible pages was actually skipped.
It is safe for VACUUM to treat a page as all-frozen provided it at least
had its all-frozen bit set after the OldestXmin cutoff was established.
VACUUM is only required to scan pages that might have XIDs < OldestXmin
(unfrozen XIDs) to be able to safely advance relfrozenxid. Tuples
concurrently inserted on "skipped" pages can be thought of as equivalent
to tuples concurrently inserted on a block >= rel_pages.
It's possible that the issue this commit fixes hardly ever came up in
practice. But we only had to be unlucky once to lose out on advancing
relfrozenxid -- a single affected heap page was enough to throw VACUUM
off. That seems like something to avoid on general principle. This is
similar to an issue fixed by commit 44fa8488, which taught vacuumlazy.c
to not give up on non-aggressive relfrozenxid advancement just because a
cleanup lock wasn't immediately available on some heap page.
Skipping an all-visible range is now explicitly structured as a choice
made by non-aggressive VACUUMs, by weighing known costs (scanning extra
skippable pages to freeze their tuples early) against known benefits
(advancing relfrozenxid early). This works in essentially the same way
as it always has (don't skip ranges < SKIP_PAGES_THRESHOLD). We could
do much better here in the future by considering other relevant factors.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wzn6bGJGfOy3zSTJicKLw99PHJeSOQBOViKjSCinaxUKDQ@mail.gmail.com
Discussion: https://postgr.es/m/CA%2BTgmoZiSOY6H7aadw5ZZGm7zYmfDzL6nwmL5V7GL4HgJgLF_w%40mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When VACUUM set relfrozenxid before now, it set it to whatever value was
used to determine which tuples to freeze -- the FreezeLimit cutoff.
This approach was very naive. The relfrozenxid invariant only requires
that new relfrozenxid values be <= the oldest extant XID remaining in
the table (at the point that the VACUUM operation ends), which in
general might be much more recent than FreezeLimit.
VACUUM now carefully tracks the oldest remaining XID/MultiXactId as it
goes (the oldest remaining values _after_ lazy_scan_prune processing).
The final values are set as the table's new relfrozenxid and new
relminmxid in pg_class at the end of each VACUUM. The oldest XID might
come from a tuple's xmin, xmax, or xvac fields. It might even come from
one of the table's remaining MultiXacts.
Final relfrozenxid values must still be >= FreezeLimit in an aggressive
VACUUM (FreezeLimit still acts as a lower bound on the final value that
aggressive VACUUM can set relfrozenxid to). Since standard VACUUMs
still make no guarantees about advancing relfrozenxid, they might as
well set relfrozenxid to a value from well before FreezeLimit when the
opportunity presents itself. In general standard VACUUMs may now set
relfrozenxid to any value > the original relfrozenxid and <= OldestXmin.
Credit for the general idea of using the oldest extant XID to set
pg_class.relfrozenxid at the end of VACUUM goes to Andres Freund.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Andres Freund <andres@anarazel.de>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/CAH2-WzkymFbz6D_vL+jmqSn_5q1wsFvFrE+37yLgL_Rkfd6Gzg@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
DecodeInterval (interval input) was careless about integer-overflow
hazards, allowing bogus results to be obtained for sufficiently
large input values. Also, since it initially converted the input
to a "struct tm", it was impossible to produce the full range of
representable interval values.
Meanwhile, EncodeInterval (interval output) and a few other
functions could suffer failures if asked to process sufficiently
large interval values, because they also relied on being able to
represent an interval in "struct tm" which is not designed to
handle that.
Fix all this stuff by introducing new struct types that are more
fit for purpose.
While this is clearly a bug fix, it's also an API break for any
code that's calling these functions directly. So back-patching
doesn't seem wise, especially in view of the lack of field
complaints.
Joe Koshakow, editorialized a bit by me
Discussion: https://postgr.es/m/CAAvxfHff0JLYHwyBrtMx_=6wr=k2Xp+D+-X3vEhHjJYMj+mQcg@mail.gmail.com
|
|
|
|
|
|
|
| |
Move some of the heap_vacuum_rel() instrumentation related variables to
the scope where they're actually needed. Also reorder some of the
variable declarations at the start of heap_vacuum_rel() so that related
variables appear together.
|
|
|
|
|
|
|
|
|
| |
Similar to commit 6198420ad, replace is_member_of_role with
has_privs_for_role for predefined role access checks in recently
committed basebackup code. In passing fix a double-word error
in a nearby comment.
Discussion: https://postgr.es/m/flat/CAGB+Vh4Zv_TvKt2tv3QNS6tUM_F_9icmuj0zjywwcgVi4PAhFA@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is essentially the same as applying VACUUM FULL to a partitioned
table, which has been supported since commit 3c3bb99330aa (March 2017).
While there's no great use case in applying CLUSTER to partitioned
tables, we don't have any strong reason not to allow it either.
For now, partitioned indexes cannot be marked clustered, so an index
must always be specified.
While at it, rename some variables that were RangeVars during the
development that led to 8bc717cb8878 but never made it that way to the
source tree; there's no need to perpetuate names that have always been
more confusing than helpful.
Author: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/20201028003312.GU9241@telsasoft.com
Discussion: https://postgr.es/m/20200611153502.GT14879@telsasoft.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, the specialized tuplesort routine inlined handling for
reverse-sort and NULLs-ordering but called the datum comparator via a
pointer in the SortSupport struct parameter. Testing has showed that we
can get a useful performance gain by specializing datum comparison for
the different representations of abbreviated keys -- signed and unsigned
64-bit integers and signed 32-bit integers. Almost all abbreviatable data
types will benefit -- the only exception for now is numeric, since the
datum comparison is more complex. The performance gain depends on data
type and input distribution, but often falls in the range of 10-20% faster.
Thomas Munro
Reviewed by Peter Geoghegan, review and performance testing by me
Discussion:
https://www.postgresql.org/message-id/CA%2BhUKGKKYttZZk-JMRQSVak%3DCXSJ5fiwtirFf%3Dn%3DPAbumvn1Ww%40mail.gmail.com
|