| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It doesn't make any sense to report this information, since VACUUM
VERBOSE reports on heap relation truncation directly. This was an
oversight in commit 7ab96cf6, which made VACUUM VERBOSE output a little
more consistent with nearby autovacuum-specific log output. Adjust
comments that describe how this is supposed to work in passing.
Also bring truncation-related VACUUM VERBOSE output in line with the
convention established for VACUUM VERBOSE output by commit f4f4a649.
Author: Peter Geoghegan <pg@bowt.ie>
Backpatch: 14-, where VACUUM VERBOSE's output changed.
|
|
|
|
|
|
|
|
|
|
|
| |
Most data-corruption reports mention the location of the problem, but
this one failed to. Add it.
Backpatch all the way back. In 12 and older, also assign the
ERRCODE_DATA_CORRUPTED error code as was done in commit fd6ec93bf890 for
13 and later.
Discussion: https://postgr.es/m/202108191637.oqyzrdtnheir@alvherre.pgsql
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Somehow, spgist overlooked the need to call pgstat_count_index_scan().
Hence, pg_stat_all_indexes.idx_scan and equivalent columns never
became nonzero for an SP-GiST index, although the related per-tuple
counters worked fine.
This fix works a bit differently from other index AMs, in that the
counter increment occurs in spgrescan not spggettuple/spggetbitmap.
It looks like this won't make the user-visible semantics noticeably
different, so I won't go to the trouble of introducing an is-this-
the-first-call flag just to make the counter bumps happen in the
same places.
Per bug #17163 from Christian Quest. Back-patch to all supported
versions.
Discussion: https://postgr.es/m/17163-b8c5cc88322a5e92@postgresql.org
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Adjust track_io_timing related logging code added by commit 94d13d474d.
Make it consistent with other nearby autovacuum and autoanalyze logging
code by removing logic that suppressed zero millisecond outputs.
log_autovacuum_min_duration log output now reliably shows "read:" and
"write:" millisecond-based values in its report (when track_io_timing is
enabled).
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Stephen Frost <sfrost@snowman.net>
Discussion: https://postgr.es/m/CAH2-WznW0FNxSVQMSRazAMYNfZ6DR_gr5WE78hc6E1CBkkJpzw@mail.gmail.com
Backpatch: 14-, where the track_io_timing logging was introduced.
|
|
|
|
|
|
|
|
|
|
| |
This order seems more natural. It starts with details that are
particular to heap and index data structures, and ends with system-level
costs incurred during the autovacuum worker's VACUUM/ANALYZE operation.
Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-WzkzxK6ahA9xxsOftRtBX_R0swuHZsvo4QUbak1Bz7hb7Q@mail.gmail.com
Backpatch: 14-, which enhanced the log output in various ways.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Pengchengliu reported an assertion failure in a parallel woker while
performing a parallel scan using an overflowed snapshot. The proximate
cause is that TransactionXmin was set to an incorrect value. The
underlying cause is incorrect snapshot handling in parallel.c.
In particular, InitializeParallelDSM() was unconditionally calling
GetTransactionSnapshot(), because I (rhaas) mistakenly thought that
was always retrieving an existing snapshot whereas, at isolation
levels less than REPEATABLE READ, it's actually taking a new one. So
instead do this only at higher isolation levels where there actually
is a single snapshot for the whole transaction.
By itself, this is not a sufficient fix, because we still need to
guarantee that TransactionXmin gets set properly in the workers. The
easiest way to do that seems to be to install the leader's active
snapshot as the transaction snapshot if the leader did not serialize a
transaction snapshot. This doesn't affect the results of future
GetTrasnactionSnapshot() calls since those have to take a new snapshot
anyway; what we care about is the side effect of setting TransactionXmin.
Report by Pengchengliu. Patch by Greg Nancarrow, except for some comment
text which I supplied.
Discussion: https://postgr.es/m/002f01d748ac$eaa781a0$bff684e0$@tju.edu.cn
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
WAL records may span multiple segments, but XLogWrite() does not
wait for the entire record to be written out to disk before
creating archive status files. Instead, as soon as the last WAL page of
the segment is written, the archive status file is created, and the
archiver may process it. If PostgreSQL crashes before it is able to
write and flush the rest of the record (in the next WAL segment), the
wrong version of the first segment file lingers in the archive, which
causes operations such as point-in-time restores to fail.
To fix this, keep track of records that span across segments and ensure
that segments are only marked ready-for-archival once such records have
been completely written to disk.
This has always been wrong, so backpatch all the way back.
Author: Nathan Bossart <bossartn@amazon.com>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Ryo Matsumura <matsumura.ryo@fujitsu.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/CBDDFA01-6E40-46BB-9F98-9340F4379505@amazon.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts the following commits:
1b5617eb844cd2470a334c1d2eec66cf9b39c41a Describe (auto-)analyze behavior for partitioned tables
0e69f705cc1a3df273b38c9883fb5765991e04fe Set pg_class.reltuples for partitioned tables
41badeaba8beee7648ebe7923a41c04f1f3cb302 Document ANALYZE storage parameters for partitioned tables
0827e8af70f4653ba17ed773f123a60eadd9f9c9 autovacuum: handle analyze for partitioned tables
There are efficiency issues in this code when handling databases with
large numbers of partitions, and it doesn't look like there isn't any
trivial way to handle those. There are some other issues as well. It's
now too late in the cycle for nontrivial fixes, so we'll have to let
Postgres 14 users continue to manually deal with ANALYZE their
partitioned tables, and hopefully we can fix the issues for Postgres 15.
I kept [most of] be280cdad298 ("Don't reset relhasindex for partitioned
tables on ANALYZE") because while we added it due to 0827e8af70f4, it is
a good bugfix in its own right, since it affects manual analyze as well
as autovacuum-induced analyze, and there's no reason to revert it.
I retained the addition of relkind 'p' to tables included by
pg_stat_user_tables, because reverting that would require a catversion
bump.
Also, in pg14 only, I keep a struct member that was added to
PgStat_TabStatEntry to avoid breaking compatibility with existing stat
files.
Backpatch to 14.
Discussion: https://postgr.es/m/20210722205458.f2bug3z6qzxzpx2s@alap3.anarazel.de
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit ensures that the wait interval in the replay delay loop
waiting for an amount of time defined by recovery_min_apply_delay is
correctly handled on reload, recalculating the delay if this GUC value
is updated, based on the timestamp of the commit record being replayed.
The previous behavior would be problematic for example with replay
still waiting even if the delay got reduced or just cancelled. If the
apply delay was increased to a larger value, the wait would have just
respected the old value set, finishing earlier.
Author: Soumyadeep Chakraborty, Ashwin Agrawal
Reviewed-by: Kyotaro Horiguchi, Michael Paquier
Discussion: https://postgr.es/m/CAE-ML+93zfr-HLN8OuxF0BjpWJ17O5dv1eMvSE5jsj9jpnAXZA@mail.gmail.com
Backpatch-through: 9.6
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Oversight in commit 3499df0d, which generalized the reloption as a way
of giving users a way to consistently avoid VACUUM's index bypass
optimization.
Per off-list report from Nikolay Shaplov.
Backpatch: 14-, where index cleanup reloption was extended.
|
|
|
|
|
|
| |
CheckpointLock was removed in commit d18e75664a, and commit ce197e91d0
updated a leftover comment in CreateCheckPoint, but there was another
copy of it in CreateRestartPoint still.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Buildfarm shows that this test has a further failure mode when a
checkpoint starts earlier than expected, so we detect a "checkpoint
completed" line that's not the one we want. Change the config to try
and prevent this.
Per buildfarm
While at it, update one comment that was forgotten in commit
d18e75664a2f.
Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/20210729.162038.534808353849568395.horikyota.ntt@gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a file is truncated, we must update minRecoveryPoint. Once a file is
truncated, there's no going back; it would not be safe to stop recovery
at a point earlier than that anymore.
Commit 7bffc9b7bf changed xact_redo_commit() so that it updates
minRecoveryPoint on truncation, but forgot to change xact_redo_abort().
Back-patch to all supported versions.
Reported-by: mengjuan.cmj@alibaba-inc.com
Author: Fujii Masao
Reviewed-by: Heikki Linnakangas
Discussion: https://postgr.es/m/b029fce3-4fac-4265-968e-16f36ff4d075.mengjuan.cmj@alibaba-inc.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When some slots are invalidated due to the max_slot_wal_keep_size limit,
the old segment horizon should move forward to stay within the limit.
However, in commit c6550776394e we forgot to call KeepLogSeg again to
recompute the horizon after invalidating replication slots. In cases
where other slots remained, the limits would be recomputed eventually
for other reasons, but if all slots were invalidated, the limits would
not move at all afterwards. Repair.
Backpatch to 13 where the feature was introduced.
Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reported-by: Marcin Krupowicz <mk@071.ovh>
Discussion: https://postgr.es/m/17103-004130e8f27782c9@postgresql.org
|
|
|
|
|
|
|
|
|
|
|
|
| |
This should have been removed in commit 7e30c186da, which split the loop
into two. Only the first loop uses the 'from' variable; updating it in
the second loop is bogus. It was never read after the first loop, so this
was harmless and surely optimized away by the compiler, but let's be tidy.
Backpatch to all supported versions.
Author: Ranier Vilela
Discussion: https://www.postgresql.org/message-id/CAEudQAoWq%2BAL3BnELHu7gms2GN07k-np6yLbukGaxJ1vY-zeiQ%40mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
| |
Reaching PITR on such a transaction would cause the generation of a LOG
message mentioning a transaction committed, not aborted.
Oversight in 4f1b890.
Author: Simon Riggs
Discussion: https://postgr.es/m/CANbhV-GJ6KijeCgdOrxqMCQ+C8QiK657EMhCy4csjrPcEUFv_Q@mail.gmail.com
Backpatch-through: 9.6
|
|
|
|
|
| |
Along the way make a slight adjustment to
src/include/utils/queryjumble.h to avoid an unused typedef.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When we cannot immediately acquire XactSLRULock in exclusive mode at
commit time, we add ourselves to a list of processes that need their XIDs
status update. We do this if the clog page where we need to update the
current transaction status is the same as the group leader's clog page,
otherwise, we allow the caller to clear it by itself. Now, when we can't
add ourselves to any group, we were not clearing the current proc if it
has already become a member of some group which was leading to an
assertion failure when the same proc was assigned to another backend after
the current backend exits.
Reported-by: Alexander Lakhin
Bug: 17072
Author: Amit Kapila
Tested-By: Alexander Lakhin
Backpatch-through: 11, where it was introduced
Discussion: https://postgr.es/m/17072-2f8764857ef2c92a@postgresql.org
|
|
|
|
|
|
|
|
|
| |
In gistinitpage, pageSize variable looks redundant, instead just
pass BLCKSZ. This will be consistent with its peers BloomInitPage,
brin_page_init and SpGistInitPage.
Author: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/CALj2ACWj=V1k5591eeZK2sOg2FYuBUp6azFO8tMkBtGfXf8PMQ@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
| |
The failsafe can trigger when index processing is already disabled.
This can happen when VACUUM's INDEX_CLEANUP parameter is "off" and the
failsafe happens to trigger. Remove assertions that assume that index
processing is directly tied to the failsafe.
Oversight in commit c242baa4, which made it possible for the failsafe to
trigger in a two-pass strategy VACUUM that has yet to make its first
call to lazy_vacuum_all_indexes().
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Generalize the INDEX_CLEANUP VACUUM parameter (and the corresponding
reloption): make it into a ternary style boolean parameter. It now
exposes a third option, "auto". The "auto" option (which is now the
default) enables the "bypass index vacuuming" optimization added by
commit 1e55e7d1.
"VACUUM (INDEX_CLEANUP TRUE)" is redefined to once again make VACUUM
simply do any required index vacuuming, regardless of how few dead
tuples are encountered during the first scan of the target heap relation
(unless there are exactly zero). This gives users a way of opting out
of the "bypass index vacuuming" optimization, if for whatever reason
that proves necessary. It is also expected to be used by PostgreSQL
developers as a testing option from time to time.
"VACUUM (INDEX_CLEANUP FALSE)" does the same thing as it always has: it
forcibly disables both index vacuuming and index cleanup. It's not
expected to be used much in PostgreSQL 14. The failsafe mechanism added
by commit 1e55e7d1 addresses the same problem in a simpler way.
INDEX_CLEANUP can now be thought of as a testing and compatibility
option.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-By: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/CAH2-WznrBoCST4_Gxh_G9hA8NzGUbeBGnOUC8FcXcrhqsv6OHQ@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
One of the error paths left *members uninitialized. That's not a live
bug, because most callers don't look at *members when the function
returns -1, but let's be tidy. One caller, in heap_lock_tuple(), does
"if (members != NULL) pfree(members)", but AFAICS it never passes an
invalid 'multi' value so it should not reach that error case.
The callers are also a bit inconsistent in their expectations.
heap_lock_tuple() pfrees the 'members' array if it's not-NULL, others
pfree() it if "nmembers >= 0", and others if "nmembers > 0". That's
not a live bug either, because the function should never return 0, but
add an Assert for that to make it more clear. I left the callers alone
for now.
I also moved the line where we set *nmembers. It wasn't wrong before,
but I like to do that right next to the 'return' statement, to make it
clear that it's always set on return.
Also remove one unreachable return statement after ereport(ERROR), for
brevity and for consistency with the similar if-block right after it.
Author: Greg Nancarrow with the additional changes by me
Backpatch-through: 9.6, all supported versions
|
|
|
|
|
|
|
|
| |
Since commit c24dcd0cfd, we have been using pg_pread() to read the WAL
file, which doesn't change the seek position (unless we fall back to
the implementation in src/port/pread.c). Update comment accordingly.
Backpatch-through: 12, where we started to use pg_pread()
|
|
|
|
|
|
|
| |
Bugfix commit 5fc89376 effectively made the lock_waiter_detected field
from vacuumlazy.c's global state struct into private state owned by
lazy_truncate_heap(). Finish this off by replacing the struct field
with a local variable.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The extra checks added by the recompression of toast data introduced in
bbe0a81 is proving to have a performance impact on VACUUM or CLUSTER
even if no recompression is done. This is more noticeable with more
toastable columns that contain non-NULL values.
Improvements could be done to make those extra checks less expensive,
but that's not material for 14 at this stage, and we are not sure either
if the code path of VACUUM FULL/CLUSTER is adapted for this job.
Per discussion with several people, including Andres Freund, Robert
Haas, Álvaro Herrera, Tom Lane and myself.
Discussion: https://postgr.es/m/20210527003144.xxqppojoiwurc2iz@alap3.anarazel.de
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We were already reporting it, but only after the parallel workers were
finished, which is visibly much later than what happens in a serial
build.
With this change we report it when the leader starts its own sort phase
when participating in the build (the normal case). Now this might
happen a little later than when the workers start their sorting phases,
but a) communicating the actual phase start from workers is likely to be
a hassle, and b) the sort phase start is pretty fuzzy anyway, since
sorting per se is actually initiated by tuplesort.c internally earlier
than tuplesort_performsort() is called.
Backpatch to pg12, where the progress reporting code for CREATE INDEX
went in.
Reported-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Author: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Greg Nancarrow <gregn4422@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/1128176d-1eee-55d4-37ca-e63644422adb
|
|
|
|
| |
Discussion: https://postgr.es/m/CAApHDvrbyJNOPBws4RUhXghZ7+TBjtdO-rznTsqZECuowNorXg@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This only happens if (1) the new standby has no WAL available locally,
(2) the new standby is starting from the old timeline, (3) the promotion
happened in the WAL segment from which the new standby is starting,
(4) the timeline history file for the new timeline is available from
the archive but the WAL files for are not (i.e. this is a race),
(5) the WAL files for the new timeline are available via streaming,
and (6) recovery_target_timeline='latest'.
Commit ee994272ca50f70b53074f0febaec97e28f83c4e introduced this
logic and was an improvement over the previous code, but it mishandled
this case. If recovery_target_timeline='latest' and restore_command is
set, validateRecoveryParameters() can change recoveryTargetTLI to be
different from receiveTLI. If streaming is then tried afterward,
expectedTLEs gets initialized with the history of the wrong timeline.
It's supposed to be a list of entries explaining how to get to the
target timeline, but in this case it ends up with a list of entries
explaining how to get to the new standby's original timeline, which
isn't right.
Dilip Kumar and Robert Haas, reviewed by Kyotaro Horiguchi.
Discussion: http://postgr.es/m/CAFiTN-sE-jr=LB8jQuxeqikd-Ux+jHiXyh4YDiZMPedgQKup0g@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 8e03eb92e9 reverted a bit too much code, reintroducing one of the
issues fixed by 39b66a91bd - a page might have been left partially empty
after relcache invalidation.
Reported-By: Tom Lane
Author: Masahiko Sawada
Discussion: https://postgr.es/m/822752.1623032114@sss.pgh.pa.us
Discussion: https://postgr.es/m/CAD21AoA%3D%3Df2VSw3c-Cp_y%3DWLKHMKc1D6s7g3YWsCOvgaYPpJcg%40mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This tidies up some questionable coding which made use of
DatumGetPointer() for Datums being passed into functions where the
parameter is expected to be a cstring. We saw no compiler warnings with
the old code as the Pointer type used in DatumGetPointer() happens to
be a char * rather than a void *. However, that's no excuse and we should
be using the correct macro for the job.
Here we also make use of OutputFunctionCall() rather than using
FunctionCall1() directly to call the type's output function.
OutputFunctionCall() is the standard way to do this. It casts the
returned value to a cstring for us.
In passing get rid of a duplicate call to strlen(). Most compilers will
likely optimize away the 2nd call, but there may be some that won't. In
any case, this just aligns the code to some other nearby code that already
does this.
Discussion: https://postgr.es/m/CAApHDvq1D=ehZ8hey8Hz67N+_Zth0GHO5wiVCfv1YcGPMXJq0A@mail.gmail.com
|
|
|
|
|
|
|
| |
A few patches committed after ca3b37487 mistakenly forgot to make the
copyright year 2021. Fix these.
Discussion: https://postgr.es/m/CAApHDvqyLmd9P2oBQYJ=DbrV8QwyPRdmXtCTFYPE08h+ip0UJw@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fix a few places that were using appendStringInfo() when they should have
been using appendStringInfoString(). Also some cases of
appendPQExpBuffer() that would have been better suited to use
appendPQExpBufferChar(), and finally, some places that used
appendPQExpBuffer() when appendPQExpBufferStr() would have suited better.
There are no bugs are being fixed here. The aim is just to make the code
use the most optimal function for the job.
All the code being changed here is new to PG14. It makes sense to fix
these before we branch for PG15. There are a few other places that we
could fix, but those cases are older code so fixing those seems less
worthwhile as it may cause unnecessary back-patching pain in the future.
Author: Hou Zhijie
Discussion: https://postgr.es/m/OS0PR01MB5716732158B1C4142C6FE375943D9@OS0PR01MB5716.jpnprd01.prod.outlook.com
|
|
|
|
|
|
|
|
|
|
| |
Reverts most of commit 39b66a91bd, which was found to cause significant
regression for REFRESH MATERIALIZED VIEW. This means only rows inserted
by heap_multi_insert will benefit from the optimization, implemented in
commit 7db0cd2145.
Reported-by: Masahiko Sawada
Discussion: https://postgr.es/m/CAD21AoA%3D%3Df2VSw3c-Cp_y%3DWLKHMKc1D6s7g3YWsCOvgaYPpJcg%40mail.gmail.com
|
|
|
|
| |
Oversight in commit 5100010e.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Redefine '\0' (InvalidCompressionMethod) as meaning "if we need to
compress, use the current setting of default_toast_compression".
This allows '\0' to be a suitable default choice regardless of
datatype, greatly simplifying code paths that initialize tupledescs
and the like. It seems like a more user-friendly approach as well,
because now the default compression choice doesn't migrate into table
definitions, meaning that changing default_toast_compression is
usually sufficient to flip an installation's behavior; one needn't
tediously issue per-column ALTER SET COMPRESSION commands.
Along the way, fix a few minor bugs and documentation issues
with the per-column-compression feature. Adopt more robust
APIs for SetIndexStorageProperties and GetAttributeCompression.
Bump catversion because typical contents of attcompression will now
be different. We could get away without doing that, but it seems
better to ensure v14 installations all agree on this. (We already
forced initdb for beta2, anyway.)
Discussion: https://postgr.es/m/626613.1621787110@sss.pgh.pa.us
|
|
|
|
|
| |
Author: Hou Zhijie
Discussion: https://postgr.es/m/OS0PR01MB571612191738540B27A8DE5894249@OS0PR01MB5716.jpnprd01.prod.outlook.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
VACUUM FULL and CLUSTER can be used to enforce the use of the existing
compression method of a toastable column if a value currently stored is
compressed with a method that does not match the column's defined
method. The code in charge of decompressing and recompressing toast
values at rewrite left around the detoasted values, causing an
accumulation of memory allocated in TopTransactionContext.
When processing large relations, this could cause the system to run out
of memory. The detoasted values are not needed once their tuple is
rewritten, and this commit ensures that the necessary cleanup happens.
Issue introduced by bbe0a81d. The comments of the area are reordered a
bit while on it.
Reported-by: Andres Freund
Analyzed-by: Andres Freund
Author: Michael Paquier
Reviewed-by: Dilip Kumar
Discussion: https://postgr.es/m/20210521211929.pcehg6f23icwstdb@alap3.anarazel.de
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The wraparound failsafe mechanism added by commit 1e55e7d1 handled the
one-pass strategy case (i.e. the "table has no indexes" case) by adding
a dedicated failsafe check. This made up for the fact that the usual
one-pass checks inside lazy_vacuum_all_indexes() cannot ever be reached
during a one-pass strategy VACUUM.
This approach failed to account for two-pass VACUUMs that opt out of
index vacuuming up-front. The INDEX_CLEANUP off case in the only case
that works like that.
Fix this by performing a failsafe check every 4GB during the first scan
of the heap, regardless of the details of the VACUUM. This eliminates
the special case, and will make the failsafe trigger more reliably.
Author: Peter Geoghegan <pg@bowt.ie>
Reported-By: Andres Freund <andres@anarazel.de>
Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/20210424002921.pb3t7h6frupdqnkp@alap3.anarazel.de
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that attcompression is just a char, there's a lot of wasted
padding space after it. Move it into the group of char-wide
columns to save a net of 4 bytes per pg_attribute entry. While
we're at it, swap the order of attstorage and attalign to make for
a more logical grouping of these columns.
Also re-order actions in related code to match the new field ordering.
This patch also fixes one outright bug: equalTupleDescs() failed to
compare attcompression. That could, for example, cause relcache
reload to fail to adopt a new value following a change.
Michael Paquier and Tom Lane, per a gripe from Andres Freund.
Discussion: https://postgr.es/m/20210517204803.iyk5wwvwgtjcmc5w@alap3.anarazel.de
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
exec_for_query() normally tries to prefetch a few rows at a time
from the query being iterated over, so as to reduce executor
entry/exit overhead. Unfortunately this is unsafe if we have
COMMIT or ROLLBACK within the loop, because there might be
TOAST references in the data that we prefetched but haven't
yet examined. Immediately after the COMMIT/ROLLBACK, we have
no snapshots in the session, meaning that VACUUM is at liberty
to remove recently-deleted TOAST rows.
This was originally reported as a case triggering the "no known
snapshots" error in init_toast_snapshot(), but even if you miss
hitting that, you can get "missing toast chunk", as illustrated
by the added isolation test case.
To fix, just disable prefetching in non-atomic contexts. Maybe
there will be performance complaints prompting us to work harder
later, but it's not clear at the moment that this really costs
much, and I doubt we'd want to back-patch any complicated fix.
In passing, adjust that error message in init_toast_snapshot()
to be a little clearer about the likely cause of the problem.
Patch by me, based on earlier investigation by Konstantin Knizhnik.
Per bug #15990 from Andreas Wicht. Back-patch to v11 where
intra-procedure COMMIT was added.
Discussion: https://postgr.es/m/15990-eee2ac466b11293d@postgresql.org
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a promotion is triggered while recovery is paused, the paused state ends
and promotion continues. But previously in that case
pg_get_wal_replay_pause_state() returned 'paused' wrongly while a promotion
was ongoing.
This commit changes a standby promotion so that it marks the recovery
pause state as 'not paused' when it's triggered, to fix the issue.
Author: Fujii Masao
Reviewed-by: Dilip Kumar, Kyotaro Horiguchi
Discussion: https://postgr.es/m/f706876c-4894-0ba5-6f4d-79803eeea21b@oss.nttdata.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add a defensive "can't happen" error to code that handles nbtree posting
list splits (promote an existing assertion). This avoids a segfault in
the event of an insertion of a newitem that is somehow identical to an
existing non-pivot tuple in the index. An nbtree index should never
have two index tuples with identical TIDs.
This scenario is not particular unlikely in the event of any kind of
corruption that leaves the index in an inconsistent state relative to
the heap relation that is indexed. There are two known reports of
preventable hard crashes. Doing nothing seems unacceptable given the
general expectation that nbtree will cope reasonably well with corrupt
data.
Discussion: https://postgr.es/m/CAH2-Wz=Jr_d-dOYEEmwz0-ifojVNWho01eAqewfQXgKfoe114w@mail.gmail.com
Backpatch: 13-, where nbtree deduplication was introduced.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Formerly we just relied on operator classes that assert longValuesOK
to eventually shorten the leaf value enough to fit on an index page.
That fails since the introduction of INCLUDE-column support (commit
09c1c6ab4), because the INCLUDE columns might alone take up more
than a page, meaning no amount of leaf-datum compaction will get
the job done. At least with spgtextproc.c, that leads to an infinite
loop, since spgtextproc.c won't throw an error for not being able
to shorten the leaf datum anymore.
To fix without breaking cases that would otherwise work, add logic
to spgdoinsert() to verify that the leaf tuple size is decreasing
after each "choose" step. Some opclasses might not decrease the
size on every single cycle, and in any case, alignment roundoff
of the tuple size could obscure small gains. Therefore, allow
up to 10 cycles without additional savings before throwing an
error. (Perhaps this number will need adjustment, but it seems
quite generous right now.)
As long as we've developed this logic, let's back-patch it.
The back branches don't have INCLUDE columns to worry about, but
this seems like a good defense against possible bugs in operator
classes. We already know that an infinite loop here is pretty
unpleasant, so having a defense seems to outweigh the risk of
breaking things. (Note that spgtextproc.c is actually the only
known opclass with longValuesOK support, so that this is all moot
for known non-core opclasses anyway.)
Per report from Dilip Kumar.
Discussion: https://postgr.es/m/CAFiTN-uxP_soPhVG840tRMQTBmtA_f_Y8N51G7DKYYqDh7XN-A@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Knowing that a buggy opclass could cause an infinite insertion loop,
spgdoinsert() intended to allow its loop to be interrupted by query
cancel. However, that never actually worked, because in iterations
after the first, we'd be holding buffer lock(s) which would cause
InterruptHoldoffCount to be positive, preventing servicing of the
interrupt.
To fix, check if an interrupt is pending, and if so fall out of
the insertion loop and service the interrupt after we've released
the buffers. If it was indeed a query cancel, that's the end of
the matter. If it was a non-canceling interrupt reason, make use
of the existing provision to retry the whole insertion. (This isn't
as wasteful as it might seem, since any upper-level index tuples we
already created should be usable in the next attempt.)
While there's no known instance of such a bug in existing release
branches, it still seems like a good idea to back-patch this to
all supported branches, since the behavior is fairly nasty if a
loop does happen --- not only is it uncancelable, but it will
quickly consume memory to the point of an OOM failure. In any
case, this code is certainly not working as intended.
Per report from Dilip Kumar.
Discussion: https://postgr.es/m/CAFiTN-uxP_soPhVG840tRMQTBmtA_f_Y8N51G7DKYYqDh7XN-A@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The percentage of blocks from the table value reported by autovacuum log
output (following commit 5100010ee4d) should never exceed 100% because
it describes the state of the table back when lazy_vacuum() was called.
The value could nevertheless exceed 100% in the event of heap relation
truncation. We failed to compensate for how truncation affects
rel_pages.
Fix the faulty accounting by using the original rel_pages value instead
of the current/final rel_pages value.
Reported-By: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20210423204306.5osfpkt2ggaedyvy@alap3.anarazel.de
|
|
|
|
|
|
|
|
| |
Also "make reformat-dat-files".
The only change worthy of note is that pgindent messed up the formatting
of launcher.c's struct LogicalRepWorkerId, which led me to notice that
that struct wasn't used at all anymore, so I just took it out.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously long was used as the data type for some counters in BufferUsage
and WalUsage. But long is only four byte, e.g., on Windows, and it's entirely
possible to wrap a four byte counter. For example, emitting more than
four billion WAL records in one transaction isn't actually particularly rare.
To avoid the overflows of those counters, this commit changes the data type
of them from long to int64.
Suggested-by: Andres Freund
Author: Masahiro Ikeda
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/20201221211650.k7b53tcnadrciqjo@alap3.anarazel.de
Discussion: https://postgr.es/m/af0964ac-7080-1984-dc23-513754987716@oss.nttdata.com
|