| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
| |
Change XLogRegisterData() and XLogRegisterBufData() functions to take
void * for binary data instead of char *. This will remove the need
for numerous casts (done in a separate commit for clarity).
Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org
|
|
|
|
| |
Backpatch-through: 13
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The inplace update survives ROLLBACK. The inval didn't, so another
backend's DDL could then update the row without incorporating the
inplace update. In the test this fixes, a mix of CREATE INDEX and ALTER
TABLE resulted in a table with an index, yet relhasindex=f. That is a
source of index corruption. Back-patch to v12 (all supported versions).
The back branch versions don't change WAL, because those branches just
added end-of-recovery SIResetAll(). All branches change the ABI of
extern function PrepareToInvalidateCacheTuple(). No PGXN extension
calls that, and there's no apparent use case in extensions.
Reviewed by Nitin Motiani and (in earlier versions) Andres Freund.
Discussion: https://postgr.es/m/20240523000548.58.nmisch@google.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This opens the possibility to define keys for more types of statistics
kinds in PgStat_HashKey, the first case being 8-byte query IDs for
statistics like pg_stat_statements.
This increases the size of PgStat_HashKey from 12 to 16 bytes, while
PgStatShared_HashEntry, entry stored in the dshash for pgstats, keeps
the same size due to alignment.
xl_xact_stats_item, that tracks the stats items to drop in commit WAL
records, is increased from 12 to 16 bytes. Note that individual chunks
in commit WAL records should be multiples of sizeof(int), hence 8-byte
object IDs are stored as two uint32, based on a suggestion from Heikki
Linnakangas.
While on it, the field of PgStat_HashKey is renamed from "objoid" to
"objid", as for some stats kinds this field does not refer to OIDs but
just IDs, like for replication slot stats.
This commit bumps the following format variables:
- PGSTAT_FILE_FORMAT_ID, as PgStat_HashKey is written to the stats file
for non-serialized stats kinds in the dshash table.
- XLOG_PAGE_MAGIC for the changes in xl_xact_stats_item.
- Catalog version, for the SQL function pg_stat_have_stats().
Reviewed-by: Bertrand Drouvot
Discussion: https://postgr.es/m/ZsvTS9EW79Up8I62@paquier.xyz
|
|
|
|
|
|
|
|
| |
Add const qualifiers to XLogRegisterData() and XLogRegisterBufData().
Several unconstify() calls can be removed.
Reviewed-by: Aleksander Alekseev <aleksander@timescale.com>
Discussion: https://www.postgresql.org/message-id/dd889784-9ce7-436a-b4f1-52e4a5e577bd@eisentraut.org
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To do this, we must include the wal_level in the first WAL record
covered by each summary file; so add wal_level to struct Checkpoint
and the payload of XLOG_CHECKPOINT_REDO and XLOG_END_OF_RECOVERY.
This, in turn, requires bumping XLOG_PAGE_MAGIC and, since the
Checkpoint is also stored in the control file, also
PG_CONTROL_VERSION. It's not great to do that so late in the release
cycle, but the alternative seems to ship v17 without robust
protections against this scenario, which could result in corrupted
incremental backups.
A side effect of this patch is that, when a server with
wal_level=replica is started with summarize_wal=on for the first time,
summarization will no longer begin with the oldest WAL that still
exists in pg_wal, but rather from the first checkpoint after that.
This change should be harmless, because a WAL summary for a partial
checkpoint cycle can never make an incremental backup possible when
it would otherwise not have been.
Report by Fujii Masao. Patch by me. Review and/or testing by Jakub
Wartak and Fujii Masao.
Discussion: http://postgr.es/m/6e30082e-041b-4e31-9633-95a66de76f5d@oss.nttdata.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The new combined WAL record is now used for pruning, freezing and 2nd
pass of vacuum. This is in preparation for changing VACUUM to write a
combined prune+freeze record per page, instead of separate two
records. The new WAL record format now supports that, but the code
still always writes separate records for pruning and freezing.
This reserves separate XLOG_HEAP2_* info codes for when the pruning
record is emitted for on-access pruning or VACUUM, per Peter
Geoghegan's suggestion. The record format is identical, but having
separate info codes makes it easier analyze pruning and vacuuming with
pg_waldump.
The function to emit the new WAL record, log_heap_prune_and_freeze(),
is in pruneheap.c. The existing heap_log_freeze_plan() and its
subroutines are moved to pruneheap.c without changes, to keep them
together with log_heap_prune_and_freeze().
Author: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAAKRu_azf-zH%3DDgVbquZ3tFWjMY1w5pO8m-TXJaMdri8z3933g@mail.gmail.com
Discussion: https://www.postgresql.org/message-id/CAAKRu_b2oE4GL%3Dq4g9mcByS9yT7wTQvEH9OLpabj28e%2BWKFi2A@mail.gmail.com
|
|
|
|
|
|
|
|
| |
Reported-by: Michael Paquier
Discussion: https://postgr.es/m/ZZKTDPxBBMt3C0J9@paquier.xyz
Backpatch-through: 12
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since C99, there can be a trailing comma after the last value in an
enum definition. A lot of new code has been introducing this style on
the fly. Some new patches are now taking an inconsistent approach to
this. Some add the last comma on the fly if they add a new last
value, some are trying to preserve the existing style in each place,
some are even dropping the last comma if there was one. We could
nudge this all in a consistent direction if we just add the trailing
commas everywhere once.
I omitted a few places where there was a fixed "last" value that will
always stay last. I also skipped the header files of libpq and ecpg,
in case people want to use those with older compilers. There were
also a small number of cases where the enum type wasn't used anywhere
(but the enum values were), which ended up confusing pgindent a bit,
so I left those alone.
Discussion: https://www.postgresql.org/message-id/flat/386f8c45-c8ac-4681-8add-e3b0852c1620%40eisentraut.org
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This allows tools that read the WAL sequentially to identify (possible)
redo points when they're reached, rather than only being able to
detect them in retrospect when XLOG_CHECKPOINT_ONLINE is found, possibly
much later in the WAL stream. There are other possible applications as
well; see the discussion links below.
Any redo location that precedes the checkpoint location should now point
to an XLOG_CHECKPOINT_REDO record, so add a cross-check to verify this.
While adjusting the code in CreateCheckPoint() for this patch, I made it
call WALInsertLockAcquireExclusive a bit later than before, since there
appears to be no need for it to be held while checking whether the system
is idle, whether this is an end-of-recovery checkpoint, or what the current
timeline is.
Bump XLOG_PAGE_MAGIC.
Patch by me, based in part on earlier work from Dilip Kumar. Review by
Dilip Kumar, Amit Kapila, Andres Freund, and Michael Paquier.
Discussion: http://postgr.es/m/CA+TgmoYy-Vc6G9QKcAKNksCa29cv__czr+N9X_QCxEfQVpp_8w@mail.gmail.com
Discussion: http://postgr.es/m/20230614194717.jyuw3okxup4cvtbt%40awork3.anarazel.de
Discussion: http://postgr.es/m/CA+hUKG+b2ego8=YNW2Ohe9QmSiReh1-ogrv8V_WZpJTqP3O+2w@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit only implements one prerequisite part for allowing logical
decoding. The commit message contains an explanation of the overall design,
which later commits will refer back to.
Overall design:
1. We want to enable logical decoding on standbys, but replay of WAL
from the primary might remove data that is needed by logical decoding,
causing error(s) on the standby. To prevent those errors, a new replication
conflict scenario needs to be addressed (as much as hot standby does).
2. Our chosen strategy for dealing with this type of replication slot
is to invalidate logical slots for which needed data has been removed.
3. To do this we need the latestRemovedXid for each change, just as we
do for physical replication conflicts, but we also need to know
whether any particular change was to data that logical replication
might access. That way, during WAL replay, we know when there is a risk of
conflict and, if so, if there is a conflict.
4. We can't rely on the standby's relcache entries for this purpose in
any way, because the startup process can't access catalog contents.
5. Therefore every WAL record that potentially removes data from the
index or heap must carry a flag indicating whether or not it is one
that might be accessed during logical decoding.
Why do we need this for logical decoding on standby?
First, let's forget about logical decoding on standby and recall that
on a primary database, any catalog rows that may be needed by a logical
decoding replication slot are not removed.
This is done thanks to the catalog_xmin associated with the logical
replication slot.
But, with logical decoding on standby, in the following cases:
- hot_standby_feedback is off
- hot_standby_feedback is on but there is no a physical slot between
the primary and the standby. Then, hot_standby_feedback will work,
but only while the connection is alive (for example a node restart
would break it)
Then, the primary may delete system catalog rows that could be needed
by the logical decoding on the standby (as it does not know about the
catalog_xmin on the standby).
So, it’s mandatory to identify those rows and invalidate the slots
that may need them if any. Identifying those rows is the purpose of
this commit.
Implementation:
When a WAL replay on standby indicates that a catalog table tuple is
to be deleted by an xid that is greater than a logical slot's
catalog_xmin, then that means the slot's catalog_xmin conflicts with
the xid, and we need to handle the conflict. While subsequent commits
will do the actual conflict handling, this commit adds a new field
isCatalogRel in such WAL records (and a new bit set in the
xl_heap_visible flags field), that is true for catalog tables, so as to
arrange for conflict handling.
The affected WAL records are the ones that already contain the
snapshotConflictHorizon field, namely:
- gistxlogDelete
- gistxlogPageReuse
- xl_hash_vacuum_one_page
- xl_heap_prune
- xl_heap_freeze_page
- xl_heap_visible
- xl_btree_reuse_page
- xl_btree_delete
- spgxlogVacuumRedirect
Due to this new field being added, xl_hash_vacuum_one_page and
gistxlogDelete do now contain the offsets to be deleted as a
FLEXIBLE_ARRAY_MEMBER. This is needed to ensure correct alignment.
It's not needed on the others struct where isCatalogRel has
been added.
This commit just introduces the WAL format changes mentioned above. Handling
the actual conflicts will follow in future commits.
Bumps XLOG_PAGE_MAGIC as the several WAL records are changed.
Author: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>
Author: Andres Freund <andres@anarazel.de> (in an older version)
Author: Amit Khandekar <amitdkhan.pg@gmail.com> (in an older version)
Reviewed-by: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Fabrízio de Royes Mello <fabriziomello@gmail.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
| |
This will create two bytes of padding space in xl_hash_vacuum_one_page which
can be used for future patches. This makes the datatype of
xl_hash_vacuum_one_page.ntuples same as gistxlogDelete.ntodelete which is
advisable as both are used for the same purpose.
Author: Bertrand Drouvot
Reviewed-by: Nathan Bossart
Discussion: https://postgr.es/m/b0e20c40-cb7a-fc1c-c607-2a78dac5021e@gmail.com
|
|
|
|
| |
Backpatch-through: 11
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Make heapam WAL records that describe freezing performed by VACUUM more
space efficient by storing each distinct "freeze plan" once, alongside
an array of associated page offset numbers (one per freeze plan). The
freeze plans required for most heap pages tend to naturally have a great
deal of redundancy, so this technique is very effective in practice. It
often leads to freeze WAL records that are less than 20% of the size of
equivalent WAL records generated using the previous approach.
The freeze plan concept was introduced by commit 3b97e6823b, which fixed
bugs in VACUUM's handling of MultiXacts. We retain the concept of
freeze plans, but go back to using page offset number arrays. There is
no loss of generality here because deduplication is an additive process
that gets applied mechanically when FREEZE_PAGE WAL records are built.
More than anything else, freeze plan deduplication is an optimization
that reduces the marginal cost of freezing additional tuples on pages
that will need to have at least one or two tuples frozen in any case.
Ongoing work that adds page-level freezing to VACUUM will take full
advantage of the improved cost profile through batching.
Also refactor some of the details surrounding recovery conflicts needed
to REDO freeze records in passing: make original execution responsible
for generating a standard latestRemovedXid cutoff, rather than working
backwards to get the same cutoff in the REDO routine. Bugfix commit
66fbcb0d2e did it the other way around, which is equivalent but obscures
what's going on.
Also rename the cutoff field from the WAL record/struct (rename the
field cutoff_xid to latestRemovedXid to match similar WAL records).
Processing of conflicts by REDO routines is already completely uniform,
so tools like pg_waldump should present the information driving the
process uniformly. There are two remaining WAL record types that still
don't quite follow this convention (heapam's VISIBLE record type and
SP-GiST's VACUUM_REDIRECT record type). They can be brought into line
by later work that totally standardizes how the cutoffs are presented.
Bump XLOG_PAGE_MAGIC.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-By: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-By: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/CAH2-Wz=XytErMnb8FAyFd+OQEbiipB0Q2FmFdXrggPL4VBnRYQ@mail.gmail.com
|
|
|
|
|
| |
Reviewed-by: Amul Sul <sulamul@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/5b558da8-99fb-0a99-83dd-f72f05388517%40enterprisedb.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We have been using the term RelFileNode to refer to either (1) the
integer that is used to name the sequence of files for a certain relation
within the directory set aside for that tablespace/database combination;
or (2) that value plus the OIDs of the tablespace and database; or
occasionally (3) the whole series of files created for a relation
based on those values. Using the same name for more than one thing is
confusing.
Replace RelFileNode with RelFileNumber when we're talking about just the
single number, i.e. (1) from above, and with RelFileLocator when we're
talking about all the things that are needed to locate a relation's files
on disk, i.e. (2) from above. In the places where we refer to (3) as
a relfilenode, instead refer to "relation storage".
Since there is a ton of SQL code in the world that knows about
pg_class.relfilenode, don't change the name of that column, or of other
SQL-facing things that derive their name from it.
On the other hand, do adjust closely-related internal terminology. For
example, the structure member names dbNode and spcNode appear to be
derived from the fact that the structure itself was called RelFileNode,
so change those to dbOid and spcOid. Likewise, various variables with
names like rnode and relnode get renamed appropriately, according to
how they're being used in context.
Hopefully, this is clearer than before. It is also preparation for
future patches that intend to widen the relfilenumber fields from its
current width of 32 bits. Variables that store a relfilenumber are now
declared as type RelFileNumber rather than type Oid; right now, these
are the same, but that can now more easily be changed.
Dilip Kumar, per an idea from me. Reviewed also by Andres Freund.
I fixed some whitespace issues, changed a couple of words in a
comment, and made one other minor correction.
Discussion: http://postgr.es/m/CA+TgmoamOtXbVAQf9hWFzonUo6bhhjS6toZQd7HZ-pmojtAmag@mail.gmail.com
Discussion: http://postgr.es/m/CA+Tgmobp7+7kmi4gkq7Y+4AM9fTvL+O1oQ4-5gFTT+6Ng-dQ=g@mail.gmail.com
Discussion: http://postgr.es/m/CAFiTN-vTe79M8uDH1yprOU64MNFE+R3ODRuA+JWf27JbhY4hJw@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
| |
Up until now, we've had a policy of only marking certain variables
in the PostgreSQL header files with PGDLLIMPORT, but now we've
decided to mark them all. This means that extensions running on
Windows should no longer operate at a disadvantage as compared to
extensions running on Linux: if the variable is present in a header
file, it should be accessible.
Discussion: http://postgr.es/m/CA+TgmoYanc1_FSfimhgiWSqVyP5KKmh5NP2BWNwDhO8Pg2vGYQ@mail.gmail.com
|
|
|
|
|
|
|
|
|
| |
Provides similar functionality to pg_waldump, but from a SQL interface
rather than a separate utility.
Author: Bharath Rupireddy
Reviewed-by: Greg Stark, Kyotaro Horiguchi, Andres Freund, Ashutosh Sharma, Nitin Jadhav, RKN Sai Krishna
Discussion: https://postgr.es/m/CALj2ACUGUYXsEQdKhEdsBzhGEyF3xggvLdD8C0VT72TNEfOiog%40mail.gmail.com
|
|
|
|
| |
Discussion: https://postgr.es/m/3280724.1649340315@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Allow extensions to specify a new custom resource manager (rmgr),
which allows specialized WAL. This is meant to be used by a Table
Access Method or Index Access Method.
Prior to this commit, only Generic WAL was available, which offers
support for recovery and physical replication but not logical
replication.
Reviewed-by: Julien Rouhaud, Bharath Rupireddy, Andres Freund
Discussion: https://postgr.es/m/ed1fb2e22d15d3563ae0eb610f7b61bb15999c0a.camel%40j-davis.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
One problematic part of the current statistics collector design is that there
is no reliable way of getting rid of statistics entries. Because of that
pgstat_vacuum_stat() (called by [auto-]vacuum) matches all stats for the
current database with the catalog contents and tries to drop now-superfluous
entries. That's quite expensive. What's worse, it doesn't work on physical
replicas, despite physical replicas collection statistics entries.
This commit introduces infrastructure to create / drop statistics entries
transactionally, together with the underlying catalog objects (functions,
relations, subscriptions). pgstat_xact.c maintains a list of stats entries
created / dropped transactionally in the current transaction. To ensure the
removal of statistics entries is durable dropped statistics entries are
included in commit / abort (and prepare) records, which also ensures that
stats entries are dropped on standbys.
Statistics entries created separately from creating the underlying catalog
object (e.g. when stats were previously lost due to an immediate restart)
are *not* WAL logged. However that can only happen outside of the transaction
creating the catalog object, so it does not lead to "leaked" statistics
entries.
For this to work, functions creating / dropping functions / relations /
subscriptions need to call into pgstat. For subscriptions this was already
done when dropping subscriptions, via pgstat_report_subscription_drop() (now
renamed to pgstat_drop_subscription()).
This commit does not actually drop stats yet, it just provides the
infrastructure. It is however a largely independent piece of infrastructure,
so committing it separately makes sense.
Bumps XLOG_PAGE_MAGIC.
Author: Andres Freund <andres@anarazel.de>
Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
Reviewed-By: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/20220303021600.hs34ghqcw6zcokdh@alap3.anarazel.de
|
|
|
|
|
|
|
|
|
| |
While on it, fix a thinko in the docs, introduced by the same commit.
Oversights in e953732.
Reported-by: Justin Pryzby
Discussion: https://postgr.es/m/20220311214900.GN28503@telsasoft.com
|
|
|
|
|
|
|
|
|
|
|
| |
Add a new rmgr method, rm_decode, and use that rather than a switch
statement.
In preparation for rmgr extensibility.
Reviewed-by: Julien Rouhaud
Discussion: https://postgr.es/m/ed1fb2e22d15d3563ae0eb610f7b61bb15999c0a.camel%40j-davis.com
Discussion: https://postgr.es/m/20220118095332.6xtlcjoyxobv6cbk@jrouhaud
|
|
|
|
| |
Backpatch-through: 10
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Physical replication always ships WAL segment files to replicas once
they are complete. This is a problem if one WAL record is split across
a segment boundary and the primary server crashes before writing down
the segment with the next portion of the WAL record: WAL writing after
crash recovery would happily resume at the point where the broken record
started, overwriting that record ... but any standby or backup may have
already received a copy of that segment, and they are not rewinding.
This causes standbys to stop following the primary after the latter
crashes:
LOG: invalid contrecord length 7262 at A8/D9FFFBC8
because the standby is still trying to read the continuation record
(contrecord) for the original long WAL record, but it is not there and
it will never be. A workaround is to stop the replica, delete the WAL
file, and restart it -- at which point a fresh copy is brought over from
the primary. But that's pretty labor intensive, and I bet many users
would just give up and re-clone the standby instead.
A fix for this problem was already attempted in commit 515e3d84a0b5, but
it only addressed the case for the scenario of WAL archiving, so
streaming replication would still be a problem (as well as other things
such as taking a filesystem-level backup while the server is down after
having crashed), and it had performance scalability problems too; so it
had to be reverted.
This commit fixes the problem using an approach suggested by Andres
Freund, whereby the initial portion(s) of the split-up WAL record are
kept, and a special type of WAL record is written where the contrecord
was lost, so that WAL replay in the replica knows to skip the broken
parts. With this approach, we can continue to stream/archive segment
files as soon as they are complete, and replay of the broken records
will proceed across the crash point without a hitch.
Because a new type of WAL record is added, users should be careful to
upgrade standbys first, primaries later. Otherwise they risk the standby
being unable to start if the primary happens to write such a record.
A new TAP test that exercises this is added, but the portability of it
is yet to be seen.
This has been wrong since the introduction of physical replication, so
backpatch all the way back. In stable branches, keep the new
XLogReaderState members at the end of the struct, to avoid an ABI
break.
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Nathan Bossart <bossartn@amazon.com>
Discussion: https://postgr.es/m/202108232252.dh7uxf6oxwcy@alvherre.pgsql
|
|
|
|
|
|
| |
Oversight in 4035cd5, spotted by Tom Lane.
Discussion: https://postgr.es/m/365778.1624941613@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 438fc4a39c prevented the WAL replay from writing
COMMIT_TS_SETTS record. By this change there is no code that
generates COMMIT_TS_SETTS record in PostgreSQL core.
Also we can think that there are no extensions using the record
because we've not received so far any complaints about the issue
that commit 438fc4a39c fixed. Therefore this commit removes
COMMIT_TS_SETTS record and its related code. Even without
this record, the timestamp required for commit timestamp feature
can be acquired from the COMMIT record.
Bump WAL page magic.
Reported-by: lx zou <zoulx1982@163.com>
Author: Fujii Masao
Reviewed-by: Alvaro Herrera
Discussion: https://postgr.es/m/16931-620d0f2fdc6108f1@postgresql.org
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Retry the call to heap_prune_page() in rare cases where there is
disagreement between the heap_prune_page() call and the call to
HeapTupleSatisfiesVacuum() that immediately follows. Disagreement is
possible when a concurrently-aborted transaction makes a tuple DEAD
during the tiny window between each step. This was the only case where
a tuple considered DEAD by VACUUM still had storage following pruning.
VACUUM's definition of dead tuples is now uniformly simple and
unambiguous: dead tuples from each page are always LP_DEAD line pointers
that were encountered just after we performed pruning (and just before
we considered freezing remaining items with tuple storage).
Eliminating the tupgone=true special case enables INDEX_CLEANUP=off
style skipping of index vacuuming that takes place based on flexible,
dynamic criteria. The INDEX_CLEANUP=off case had to know about skipping
indexes up-front before now, due to a subtle interaction with the
special case (see commit dd695979) -- this was a special case unto
itself. Now there are no special cases. And so now it won't matter
when or how we decide to skip index vacuuming: it won't affect how
pruning behaves, and it won't be affected by any of the implementation
details of pruning or freezing.
Also remove XLOG_HEAP2_CLEANUP_INFO records. These are no longer
necessary because we now rely entirely on heap pruning taking care of
recovery conflicts. There is no longer any need to generate recovery
conflicts for DEAD tuples that pruning just missed. This also means
that heap vacuuming now uses exactly the same strategy for recovery
conflicts as index vacuuming always has: REDO routines never need to
process a latestRemovedXid from the WAL record, since earlier REDO of
the WAL record from pruning is sufficient in all cases. The generic
XLOG_HEAP2_CLEAN record type is now split into two new record types to
reflect this new division (these are called XLOG_HEAP2_PRUNE and
XLOG_HEAP2_VACUUM).
Also stop acquiring a super-exclusive lock for heap pages when they're
vacuumed during VACUUM's second heap pass. A regular exclusive lock is
enough. This is correct because heap page vacuuming is now strictly a
matter of setting the LP_DEAD line pointers to LP_UNUSED. No other
backend can have a pointer to a tuple located in a pinned buffer that
can be invalidated by a concurrent heap page vacuum operation.
Heap vacuuming can now be thought of as conceptually similar to index
vacuuming and conceptually dissimilar to heap pruning. Heap pruning now
has sole responsibility for anything involving the logical contents of
the database (e.g., managing transaction status information, recovery
conflicts, considering what to do with HOT chains). Index vacuuming and
heap vacuuming are now only concerned with recycling garbage items from
physical data structures that back the logical database.
Bump XLOG_PAGE_MAGIC due to pruning and heap page vacuum WAL record
changes.
Credit for the idea of retrying pruning a page to avoid the tupgone case
goes to Andres Freund.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Andres Freund <andres@anarazel.de>
Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Remove the entire idea of "stale stats" within nbtree VACUUM (stop
caring about stats involving the number of inserted tuples). Also
remove the vacuum_cleanup_index_scale_factor GUC/param on the master
branch (though just disable them on postgres 13).
The vacuum_cleanup_index_scale_factor/stats interface made the nbtree AM
partially responsible for deciding when pg_class.reltuples stats needed
to be updated. This seems contrary to the spirit of the index AM API,
though -- it is not actually necessary for an index AM's bulk delete and
cleanup callbacks to provide accurate stats when it happens to be
inconvenient. The core code owns that. (Index AMs have the authority
to perform or not perform certain kinds of deferred cleanup based on
their own considerations, such as page deletion and recycling, but that
has little to do with pg_class.reltuples/num_index_tuples.)
This issue was fairly harmless until the introduction of the
autovacuum_vacuum_insert_threshold feature by commit b07642db, which had
an undesirable interaction with the vacuum_cleanup_index_scale_factor
mechanism: it made insert-driven autovacuums perform full index scans,
even though there is no real benefit to doing so. This has been tied to
a regression with an append-only insert benchmark [1].
Also have remaining cases that perform a full scan of an index during a
cleanup-only nbtree VACUUM indicate that the final tuple count is only
an estimate. This prevents vacuumlazy.c from setting the index's
pg_class.reltuples in those cases (it will now only update pg_class when
vacuumlazy.c had TIDs for nbtree to bulk delete). This arguably fixes
an oversight in deduplication-related bugfix commit 48e12913.
[1] https://smalldatum.blogspot.com/2021/01/insert-benchmark-postgres-is-still.html
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAD21AoA4WHthN5uU6+WScZ7+J_RcEjmcuH94qcoUPuB42ShXzg@mail.gmail.com
Backpatch: 13-, where autovacuum_vacuum_insert_threshold was added.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Otherwise we risk "leaking" deleted pages by making them non-recyclable
indefinitely. Commit 6655a729 did the same thing for deleted pages in
GiST indexes. That work was used as a starting point here.
Stop storing an XID indicating the oldest bpto.xact across all deleted
though unrecycled pages in nbtree metapages. There is no longer any
reason to care about that condition/the oldest XID. It only ever made
sense when wraparound was something _bt_vacuum_needs_cleanup() had to
consider.
The btm_oldest_btpo_xact metapage field has been repurposed and renamed.
It is now btm_last_cleanup_num_delpages, which is used to remember how
many non-recycled deleted pages remain from the last VACUUM (in practice
its value is usually the precise number of pages that were _newly
deleted_ during the specific VACUUM operation that last set the field).
The general idea behind storing btm_last_cleanup_num_delpages is to use
it to give _some_ consideration to non-recycled deleted pages inside
_bt_vacuum_needs_cleanup() -- though never too much. We only really
need to avoid leaving a truly excessive number of deleted pages in an
unrecycled state forever. We only do this to cover certain narrow cases
where no other factor makes VACUUM do a full scan, and yet the index
continues to grow (and so actually misses out on recycling existing
deleted pages).
These metapage changes result in a clear user-visible benefit: We no
longer trigger full index scans during VACUUM operations solely due to
the presence of only 1 or 2 known deleted (though unrecycled) blocks
from a very large index. All that matters now is keeping the costs and
benefits in balance over time.
Fix an issue that has been around since commit 857f9c36, which added the
"skip full scan of index" mechanism (i.e. the _bt_vacuum_needs_cleanup()
logic). The accuracy of btm_last_cleanup_num_heap_tuples accidentally
hinged upon _when_ the source value gets stored. We now always store
btm_last_cleanup_num_heap_tuples in btvacuumcleanup(). This fixes the
issue because IndexVacuumInfo.num_heap_tuples (the source field) is
expected to accurately indicate the state of the table _after_ the
VACUUM completes inside btvacuumcleanup().
A backpatchable fix cannot easily be extracted from this commit. A
targeted fix for the issue will follow in a later commit, though that
won't happen today.
I (pgeoghegan) have chosen to remove any mention of deleted pages in the
documentation of the vacuum_cleanup_index_scale_factor GUC/param, since
the presence of deleted (though unrecycled) pages is no longer of much
concern to users. The vacuum_cleanup_index_scale_factor description in
the docs now seems rather unclear in any case, and it should probably be
rewritten in the near future. Perhaps some passing mention of page
deletion will be added back at the same time.
Bump XLOG_PAGE_MAGIC due to nbtree WAL records using full XIDs now.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAH2-WznpdHvujGUwYZ8sihX=d5u-tRYhi-F4wnV2uN2zHpMUXw@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Teach nbtree and heapam to cooperate in order to eagerly remove
duplicate tuples representing dead MVCC versions. This is "bottom-up
deletion". Each bottom-up deletion pass is triggered lazily in response
to a flood of versions on an nbtree leaf page. This usually involves a
"logically unchanged index" hint (these are produced by the executor
mechanism added by commit 9dc718bd).
The immediate goal of bottom-up index deletion is to avoid "unnecessary"
page splits caused entirely by version duplicates. It naturally has an
even more useful effect, though: it acts as a backstop against
accumulating an excessive number of index tuple versions for any given
_logical row_. Bottom-up index deletion complements what we might now
call "top-down index deletion": index vacuuming performed by VACUUM.
Bottom-up index deletion responds to the immediate local needs of
queries, while leaving it up to autovacuum to perform infrequent clean
sweeps of the index. The overall effect is to avoid certain
pathological performance issues related to "version churn" from UPDATEs.
The previous tableam interface used by index AMs to perform tuple
deletion (the table_compute_xid_horizon_for_tuples() function) has been
replaced with a new interface that supports certain new requirements.
Many (perhaps all) of the capabilities added to nbtree by this commit
could also be extended to other index AMs. That is left as work for a
later commit.
Extend deletion of LP_DEAD-marked index tuples in nbtree by adding logic
to consider extra index tuples (that are not LP_DEAD-marked) for
deletion in passing. This increases the number of index tuples deleted
significantly in many cases. The LP_DEAD deletion process (which is now
called "simple deletion" to clearly distinguish it from bottom-up
deletion) won't usually need to visit any extra table blocks to check
these extra tuples. We have to visit the same table blocks anyway to
generate a latestRemovedXid value (at least in the common case where the
index deletion operation's WAL record needs such a value).
Testing has shown that the "extra tuples" simple deletion enhancement
increases the number of index tuples deleted with almost any workload
that has LP_DEAD bits set in leaf pages. That is, it almost never fails
to delete at least a few extra index tuples. It helps most of all in
cases that happen to naturally have a lot of delete-safe tuples. It's
not uncommon for an individual deletion operation to end up deleting an
order of magnitude more index tuples compared to the old naive approach
(e.g., custom instrumentation of the patch shows that this happens
fairly often when the regression tests are run).
Add a further enhancement that augments simple deletion and bottom-up
deletion in indexes that make use of deduplication: Teach nbtree's
_bt_delitems_delete() function to support granular TID deletion in
posting list tuples. It is now possible to delete individual TIDs from
posting list tuples provided the TIDs have a tableam block number of a
table block that gets visited as part of the deletion process (visiting
the table block can be triggered directly or indirectly). Setting the
LP_DEAD bit of a posting list tuple is still an all-or-nothing thing,
but that matters much less now that deletion only needs to start out
with the right _general_ idea about which index tuples are deletable.
Bump XLOG_PAGE_MAGIC because xl_btree_delete changed.
No bump in BTREE_VERSION, since there are no changes to the on-disk
representation of nbtree indexes. Indexes built on PostgreSQL 12 or
PostgreSQL 13 will automatically benefit from bottom-up index deletion
(i.e. no reindexing required) following a pg_upgrade. The enhancement
to simple deletion is available with all B-Tree indexes following a
pg_upgrade, no matter what PostgreSQL version the user upgrades from.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-By: Victor Yegorov <vyegorov@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wzm+maE3apHB8NOtmM=p-DO65j2V5GzAWCOEEuy3JZgb2g@mail.gmail.com
|
|
|
|
| |
Backpatch-through: 9.5
|
|
|
|
|
|
|
|
|
| |
Reuse cautionary language from src/test/ssl/README in
src/test/kerberos/README. SLRUs have had access to six-character
segments names since commit 73c986adde5d73a5e2555da9b5c8facedb146dcd,
and recovery stopped calling HeapTupleHeaderAdvanceLatestRemovedXid() in
commit 558a9165e081d1936573e5a7d576f5febd7fb55a. The other corrections
are more self-evident.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When wal_level=logical, write invalidations at command end into WAL so
that decoding can use this information.
This patch is required to allow the streaming of in-progress transactions
in logical decoding. The actual work to allow streaming will be committed
as a separate patch.
We still add the invalidations to the cache and write them to WAL at
commit time in RecordTransactionCommit(). This uses the existing
XLOG_INVALIDATIONS xlog record type, from the RM_STANDBY_ID resource
manager (see LogStandbyInvalidations for details).
So existing code relying on those invalidations (e.g. redo) does not need
to be changed.
The invalidations written at command end uses a new xlog record type
XLOG_XACT_INVALIDATIONS, from RM_XACT_ID resource manager. See
LogLogicalInvalidations for details.
These new xlog records are ignored by existing redo procedures, which
still rely on the invalidations written to commit records.
The invalidations are decoded and accumulated in top-transaction, and then
executed during replay. This obviates the need to decode the
invalidations as part of a commit record.
Bump XLOG_PAGE_MAGIC, since this introduces XLOG_XACT_INVALIDATIONS.
Author: Dilip Kumar, Tomas Vondra, Amit Kapila
Reviewed-by: Amit Kapila
Tested-by: Neha Sharma and Mahendra Singh Thalor
Discussion: https://postgr.es/m/688b0b7f-2f6c-d827-c27b-216a8e3ea700@2ndquadrant.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The logical decoding infrastructure needs to know which top-level
transaction the subxact belongs to, in order to decode all the
changes. Until now that might be delayed until commit, due to the
caching (GPROC_MAX_CACHED_SUBXIDS), preventing features requiring
incremental decoding.
So we also write the assignment info into WAL immediately, as part
of the next WAL record (to minimize overhead) only when wal_level=logical.
We can not remove the existing XLOG_XACT_ASSIGNMENT WAL as that is
required for avoiding overflow in the hot standby snapshot.
Bump XLOG_PAGE_MAGIC, since this introduces XLR_BLOCK_ID_TOPLEVEL_XID.
Author: Tomas Vondra, Dilip Kumar, Amit Kapila
Reviewed-by: Amit Kapila
Tested-by: Neha Sharma and Mahendra Singh Thalor
Discussion: https://postgr.es/m/688b0b7f-2f6c-d827-c27b-216a8e3ea700@2ndquadrant.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The previous definition of the column was almost universally disliked,
so provide this updated definition which is more useful for monitoring
purposes: a large positive value is good, while zero or a negative value
means danger. This should be operationally more convenient.
Backpatch to 13, where the new column to pg_replication_slots (and the
feature it represents) were added.
Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reported-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Discussion: https://postgr.es/m/9ddfbf8c-2f67-904d-44ed-cf8bc5916228@oss.nttdata.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Until now, only selected bulk operations (e.g. COPY) did this. If a
given relfilenode received both a WAL-skipping COPY and a WAL-logged
operation (e.g. INSERT), recovery could lose tuples from the COPY. See
src/backend/access/transam/README section "Skipping WAL for New
RelFileNode" for the new coding rules. Maintainers of table access
methods should examine that section.
To maintain data durability, just before commit, we choose between an
fsync of the relfilenode and copying its contents to WAL. A new GUC,
wal_skip_threshold, guides that choice. If this change slows a workload
that creates small, permanent relfilenodes under wal_level=minimal, try
adjusting wal_skip_threshold. Users setting a timeout on COMMIT may
need to adjust that timeout, and log_min_duration_statement analysis
will reflect time consumption moving to COMMIT from commands like COPY.
Internally, this requires a reliable determination of whether
RollbackAndReleaseCurrentSubTransaction() would unlink a relation's
current relfilenode. Introduce rd_firstRelfilenodeSubid. Amend the
specification of rd_createSubid such that the field is zero when a new
rel has an old rd_node. Make relcache.c retain entries for certain
dropped relations until end of transaction.
Bump XLOG_PAGE_MAGIC, since this introduces XLOG_GIST_ASSIGN_LSN.
Future servers accept older WAL, so this bump is discretionary.
Kyotaro Horiguchi, reviewed (in earlier, similar versions) by Robert
Haas. Heikki Linnakangas and Michael Paquier implemented earlier
designs that materially clarified the problem. Reviewed, in earlier
designs, by Andrew Dunstan, Andres Freund, Alvaro Herrera, Tom Lane,
Fujii Masao, and Simon Riggs. Reported by Martijn van Oosterhout.
Discussion: https://postgr.es/m/20150702220524.GA9392@svana.org
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The definitions of the routines defined in xlogarchive.c have been part
of xlog_internal.h which is included by several frontend tools, but all
those routines are only called by the backend. More cleanup could be
done within xlog_internal.h, but that's already a nice cut.
This will help a follow-up patch for pg_rewind where handling of
restore_command is added for frontends.
Author: Alexey Kondratov, Michael Paquier
Reviewed-by: Álvaro Herrera, Alexander Korotkov
Discussion: https://postgr.es/m/a3acff50-5a0d-9a2c-b3b2-ee36168955c1@postgrespro.ru
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Deduplication reduces the storage overhead of duplicates in indexes that
use the standard nbtree index access method. The deduplication process
is applied lazily, after the point where opportunistic deletion of
LP_DEAD-marked index tuples occurs. Deduplication is only applied at
the point where a leaf page split would otherwise be required. New
posting list tuples are formed by merging together existing duplicate
tuples. The physical representation of the items on an nbtree leaf page
is made more space efficient by deduplication, but the logical contents
of the page are not changed. Even unique indexes make use of
deduplication as a way of controlling bloat from duplicates whose TIDs
point to different versions of the same logical table row.
The lazy approach taken by nbtree has significant advantages over a GIN
style eager approach. Most individual inserts of index tuples have
exactly the same overhead as before. The extra overhead of
deduplication is amortized across insertions, just like the overhead of
page splits. The key space of indexes works in the same way as it has
since commit dd299df8 (the commit that made heap TID a tiebreaker
column).
Testing has shown that nbtree deduplication can generally make indexes
with about 10 or 15 tuples for each distinct key value about 2.5X - 4X
smaller, even with single column integer indexes (e.g., an index on a
referencing column that accompanies a foreign key). The final size of
single column nbtree indexes comes close to the final size of a similar
contrib/btree_gin index, at least in cases where GIN's posting list
compression isn't very effective. This can significantly improve
transaction throughput, and significantly reduce the cost of vacuuming
indexes.
A new index storage parameter (deduplicate_items) controls the use of
deduplication. The default setting is 'on', so all new B-Tree indexes
automatically use deduplication where possible. This decision will be
reviewed at the end of the Postgres 13 beta period.
There is a regression of approximately 2% of transaction throughput with
synthetic workloads that consist of append-only inserts into a table
with several non-unique indexes, where all indexes have few or no
repeated values. The underlying issue is that cycles are wasted on
unsuccessful attempts at deduplicating items in non-unique indexes.
There doesn't seem to be a way around it short of disabling
deduplication entirely. Note that deduplication of items in unique
indexes is fairly well targeted in general, which avoids the problem
there (we can use a special heuristic to trigger deduplication passes in
unique indexes, since we're specifically targeting "version bloat").
Bump XLOG_PAGE_MAGIC because xl_btree_vacuum changed.
No bump in BTREE_VERSION, since the representation of posting list
tuples works in a way that's backwards compatible with version 4 indexes
(i.e. indexes built on PostgreSQL 12). However, users must still
REINDEX a pg_upgrade'd index to use deduplication, regardless of the
Postgres version they've upgraded from. This is the only way to set the
new nbtree metapage flag indicating that deduplication is generally
safe.
Author: Anastasia Lubennikova, Peter Geoghegan
Reviewed-By: Peter Geoghegan, Heikki Linnakangas
Discussion:
https://postgr.es/m/55E4051B.7020209@postgrespro.ru
https://postgr.es/m/4ab6e2db-bcee-f4cf-0916-3a06e6ccbb55@postgrespro.ru
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 558a9165e08 taught _bt_delitems_delete() to produce its own XID
horizon on the primary. Standbys no longer needed to generate their own
latestRemovedXid, since they could just use the explicitly logged value
from the primary instead. The deleted offset numbers array from the
xl_btree_delete WAL record was no longer used by the REDO routine for
anything other than deleting the items.
This enables a minor optimization: We now treat the array as buffer
state, not generic WAL data, following _bt_delitems_vacuum()'s example.
This should be a minor win, since it allows us to avoid including the
deleted items array in cases where XLogInsert() stores the whole buffer
anyway. The primary goal here is to make the code more maintainable,
though. Removing inessential differences between the two functions
highlights the fundamental differences that remain.
Also change xl_btree_delete to use uint32 for the size of the array of
item offsets being deleted. This brings xl_btree_delete closer to
xl_btree_vacuum. Furthermore, it seems like a good idea to use an
explicit-width integer type (the field was previously an "int").
Bump XLOG_PAGE_MAGIC because xl_btree_delete changed.
Discussion: https://postgr.es/m/CAH2-Wzkz4TjmezzfAbaV1zYrh=fr0bCpzuJTvBe5iUQ3aUPsCQ@mail.gmail.com
|
|
|
|
| |
Backpatch-through: update all files in master, backpatch legal files through 9.4
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The REDO routine for nbtree's xl_btree_vacuum record type hasn't
performed a "pin scan" since commit 3e4b7d87 went in, so clearly there
isn't any point in VACUUM WAL-logging information that won't actually be
used. Finish off the work of commit 3e4b7d87 (and the closely related
preceding commit 687f2cd7) by removing the code that generates this
unused information. Also remove the REDO routine code disabled by
commit 3e4b7d87.
Replace the unneeded lastBlockVacuumed field in xl_btree_vacuum with a
new "ndeleted" field. The new field isn't actually needed right now,
since we could continue to infer the array length from the overall
record length. However, an upcoming patch to add deduplication to
nbtree needs to add an "items updated" field to xl_btree_vacuum, so we
might as well start being explicit about the number of items now.
(Besides, it doesn't seem like a good idea to leave the xl_btree_vacuum
struct without any fields; the C standard says that that's undefined.)
nbtree VACUUM no longer forces writing a WAL record for the last block
in the index. Writing out a WAL record with no items for the final
block was supposed to force processing of a lastBlockVacuumed field by a
pin scan.
Bump XLOG_PAGE_MAGIC because xl_btree_vacuum changed.
Discussion: https://postgr.es/m/CAH2-WzmY_mT7UnTzFB5LBQDBkKpdV5UxP3B5bLb7uP%3D%3D6UQJRQ%40mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
XLogFileNameP() is a wrapper routine able to build a palloc'd string for
a WAL segment name, which is used for error string generation. There
were several code paths where it gets called in a critical section,
where memory allocation is not allowed. This results in triggering
an assertion failure instead of generating the wanted error message.
Another, more annoying, problem is that if the allocation to generate
the WAL segment name fails on OOM, then the failure would be escalated
to a PANIC.
This removes the routine and all its callers are replaced with a logic
using a fixed-size buffer. This way, all the existing mistakes are
fixed and future ones are prevented.
Author: Masahiko Sawada
Reviewed-by: Michael Paquier, Álvaro Herrera
Discussion: https://postgr.es/m/CA+fd4k5gC9H4uoWMLg9K_QfNrnkkdEw+-AFveob9YX7z8JnKTA@mail.gmail.com
|
|
|
|
|
|
|
| |
Oversight in commit e6d8069522. Since that commit changed the format of
XLOG_DBASE_DROP WAL record, XLOG_PAGE_MAGIC needs to be bumped.
Spotted by Michael Paquier
|
|
|
|
|
|
|
|
| |
This fixes various typos in docs and comments, and removes some orphaned
definitions.
Author: Alexander Lakhin
Discussion: https://postgr.es/m/5da8e325-c665-da95-21e0-c8a99ea61fbf@gmail.com
|
|
|
|
|
|
|
|
|
|
|
| |
This is numbered take 7, and addresses a set of issues around:
- Fixes for typos and incorrect reference names.
- Removal of unneeded comments.
- Removal of unreferenced functions and structures.
- Fixes regarding variable name consistency.
Author: Alexander Lakhin
Discussion: https://postgr.es/m/10bfd4ac-3e7c-40ab-2b2e-355ed15495e8@gmail.com
|
|
|
|
|
|
|
|
|
| |
Switch to 2.1 version of pg_bsd_indent. This formats
multiline function declarations "correctly", that is with
additional lines of parameter declarations indented to match
where the first line's left parenthesis is.
Discussion: https://postgr.es/m/CAEepm=0P3FeTXRcU5B2W3jv3PgRVZ-kGUXLGfd42FFhUROO3ug@mail.gmail.com
|
|
|
|
|
|
|
|
| |
This is still using the 2.0 version of pg_bsd_indent.
I thought it would be good to commit this separately,
so as to document the differences between 2.0 and 2.1 behavior.
Discussion: https://postgr.es/m/16296.1558103386@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Due to parallel development, gist added the missing conflict
information in c952eae52a3, while 558a9165e08 moved that computation
to the primary for the index types that already had it. Thus adapt
gist to also compute on the primary, using
index_compute_xid_horizon_for_tuples() instead of its own copy of the
logic.
This also adds pg_waldump support for XLOG_GIST_DELETE records, which
previously was not properly present.
Bumps WAL version.
Author: Andres Freund
Discussion: https://postgr.es/m/20190406050243.bszosdg4buvabfrt@alap3.anarazel.de
|