| Commit message | Author | Age |
| |
This allows recovery to notice certain incorrect recovery scenarios.
If a server has recovered to point X on timeline 5, and you restart
recovery, it better be on timeline 5 when it reaches point X again, not on
some timeline with a higher ID. This can happen e.g. if a standby server
is shut down, a new timeline appears in the WAL archive, and the standby
server is restarted. It will try to follow the new timeline, which is wrong
because some WAL on the old timeline was already replayed before shutdown.
Requires an initdb (or at least pg_resetxlog), because this adds a field to
the control file.
| |
f21bb9cfb5646e1793dcc9c0ea697bab99afa523.
We can't use type uint, so use uint32.
| |
Rename the PGXACT->inCommit flag to delayChkpt,
and generalise the comments to allow use in other situations,
such as the forthcoming potential use in the checksum patch.
Replace the wait loop to look for VXIDs with delayChkpt set.
No user-visible changes, no behaviour changes at present.
Simon Riggs, reviewed and rebased by Jeff Davis
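A minimal sketch of the protocol behind the renamed flag (illustrative, not the actual RecordTransactionCommit() call site; MyPgXact is the backend's own PGXACT entry):

```c
/* Illustrative sketch of the delayChkpt protocol: the flag brackets a
 * window whose WAL record and shared-state update must land on the same
 * side of any checkpoint's redo pointer. */
MyPgXact->delayChkpt = true;    /* checkpointer must wait for us */
START_CRIT_SECTION();
/* ... insert WAL record and update shared state ... */
END_CRIT_SECTION();
MyPgXact->delayChkpt = false;   /* checkpoint may now proceed */
```

The checkpointer's side is the rewritten wait loop: collect the VXIDs of backends with delayChkpt set and wait for them to clear it before writing the checkpoint record.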
| |
Update README to explain the prerequisites for
correct access to the LSN fields of a page.
This is an independent chunk removed from the
checksums patch to reduce the size of that patch.
| |
It is now easier to see that it's a state machine, making the code easier
to understand overall.
| |
Allow support only for freezing tuples by explicit
command. Previous coding mistakenly extended
slightly beyond what was agreed as correct on -hackers.
So essentially a partial revert of earlier work,
leaving just the COPY FREEZE command.
| |
This reverts commit c11130690d6dca64267201a169cfb38c1adec5ef in favor of
actually fixing the problem: namely, that we should never have been
modifying the checkpoint record's nextXid at this point to begin with.
The nextXid should match the state as of the checkpoint's logical WAL
position (ie the redo point), not the state as of its physical position.
It's especially bogus to advance it in some wal_levels and not others.
In any case there is no need for the checkpoint record to carry the
same nextXid shown in the XLOG_RUNNING_XACTS record just emitted by
LogStandbySnapshot, as any replay operation will already have adopted
that value as current.
This fixes bug #7710 from Tarvi Pillessaar, and probably also explains bug
#6291 from Daniel Farina, in that if a checkpoint were in progress at the
instant of XID wraparound, the epoch bump would be lost as reported.
(And, of course, these days there's at least a 50-50 chance of a checkpoint
being in progress at any given instant.)
Diagnosed by me and independently by Andres Freund. Back-patch to all
branches supporting hot standby.
| |
Previously we stored all xids mixed together.
Now we store top-level xids first, followed
by all subxids. Also skip logging any subxids
if the snapshot is suboverflowed, since there
are potentially large numbers of them and they
are not useful in that case anyway. Has value
in the envisaged design for decoding of WAL.
No planned effect on Hot Standby.
Andres Freund, reviewed by me
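A sketch of the implied record layout (the struct name and all fields other than xcnt/subxid_overflow are assumptions drawn from the description above, not copied from the tree):

```c
#include "postgres.h"               /* TransactionId, bool */

/* Hypothetical layout of the reordered running-xacts record:
 * top-level xids first, then subxids; subxids omitted entirely
 * (subxcnt = 0) when the snapshot has suboverflowed. */
typedef struct xl_running_xacts_sketch
{
    int           xcnt;             /* # of top-level xids in xids[] */
    int           subxcnt;          /* # of subxids following them */
    bool          subxid_overflow;  /* if true, subxids were omitted */
    TransactionId xids[1];          /* top-level xids, then subxids */
} xl_running_xacts_sketch;
```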
| |
If wal_level = hot_standby we update the checkpoint nextxid,
though in the case where a wraparound occurred half-way through
a checkpoint we would neglect updating the epoch also. Updating
the nextxid is arguably the wrong thing to do, but changing that
may introduce subtle bugs into hot standby startup, while updating
the value doesn't cause any known bugs yet. Minimal fix now to
HEAD and backbranches, wider fix later in HEAD.
Bug reported in #6291 by Daniel Farina and, slightly differently, elsewhere.
Cause analysis and recommended fixes from Tom Lane and Andres Freund.
Applied patch is minimal version of Andres Freund's work.
| |
The previous comments were left in place, but were too obscure
for such an important aspect of the system.
| |
When a relfilenode is created in this subtransaction or
a committed child transaction and it cannot otherwise
be seen by our own process, mark tuples committed ahead
of transaction commit for all COPY commands in the same
transaction. If FREEZE is specified on COPY
and the pre-conditions are met, the rows will also be frozen.
Both options are designed to avoid revisiting rows after commit,
increasing the performance of subsequent commands after
data load and upgrade. pg_restore changes to follow later.
Simon Riggs, review comments from Heikki Linnakangas, Noah Misch and design
input from Tom Lane, Robert Haas and Kevin Grittner
| |
Commit 8cb53654dbdb4c386369eb988062d0bbb6de725e, which introduced DROP
INDEX CONCURRENTLY, managed to break CREATE INDEX CONCURRENTLY via a poor
choice of catalog state representation. The pg_index state for an index
that's reached the final pre-drop stage was the same as the state for an
index just created by CREATE INDEX CONCURRENTLY. This meant that the
(necessary) change to make RelationGetIndexList ignore about-to-die indexes
also made it ignore freshly-created indexes; which is catastrophic because
the latter do need to be considered in HOT-safety decisions. Failure to
do so leads to incorrect index entries and subsequently wrong results from
queries depending on the concurrently-created index.
To fix, add an additional boolean column "indislive" to pg_index, so that
the freshly-created and about-to-die states can be distinguished. (This
change obviously is only possible in HEAD. This patch will need to be
back-patched, but in 9.2 we'll use a kluge consisting of overloading the
formerly-impossible state of indisvalid = true and indisready = false.)
In addition, change CREATE/DROP INDEX CONCURRENTLY so that the pg_index
flag changes they make without exclusive lock on the index are made via
heap_inplace_update() rather than a normal transactional update. The
latter is not very safe because moving the pg_index tuple could result in
concurrent SnapshotNow scans finding it twice or not at all, thus possibly
resulting in index corruption. This is a pre-existing bug in CREATE INDEX
CONCURRENTLY, which was copied into the DROP code.
In addition, fix various places in the code that ought to check to make
sure that the indexes they are manipulating are valid and/or ready as
appropriate. These represent bugs that have existed since 8.2, since
a failed CREATE INDEX CONCURRENTLY could leave a corrupt or invalid
index behind, and we ought not try to do anything that might fail with
such an index.
Also fix RelationReloadIndexInfo to ensure it copies all the pg_index
columns that are allowed to change after initial creation. Previously we
could have been left with stale values of some fields in an index relcache
entry. It's not clear whether this actually had any user-visible
consequences, but it's at least a bug waiting to happen.
In addition, do some code and docs review for DROP INDEX CONCURRENTLY;
some cosmetic code cleanup but mostly addition and revision of comments.
This will need to be back-patched, but in a noticeably different form,
so I'm committing it to HEAD before working on the back-patch.
Problem reported by Amit Kapila, diagnosis by Pavan Deolasee,
fix by Tom Lane and Andres Freund.
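The resulting flag lifecycle, summarized as a sketch (stage ordering inferred from the description above, not quoted from the committed comments):

```c
/*
 * Inferred pg_index flag lifecycle after this commit:
 *
 * CREATE INDEX CONCURRENTLY
 *   indislive=t  indisready=f  indisvalid=f   catalog entry created
 *   indislive=t  indisready=t  indisvalid=f   maintained on insert, not scanned
 *   indislive=t  indisready=t  indisvalid=t   fully usable
 *
 * DROP INDEX CONCURRENTLY (unwinding in reverse)
 *   indisvalid=f                 planner stops using the index
 *   indisready=f, indislive=f    index no longer maintained
 *   (catalog entry removed last)
 */
```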
| |
This is necessary (but not sufficient) to have them compilable outside
of a backend environment.
| |
This was apparently a typo, which caused recovery to think that it
immediately reached the end of backup, and allowed the database to start
up too early.
Reported by Jeff Janes. Backpatch to 9.2, where this code was introduced.
| |
Files opened with BasicOpenFile or PathNameOpenFile are not automatically
cleaned up on error. That puts unnecessary burden on callers that only want
to keep the file open for a short time. There is AllocateFile, but that
returns a buffered FILE * stream, which in many cases is not the nicest API
to work with. So add a new function called OpenTransientFile, which returns an
unbuffered fd that's cleaned up like the FILE* returned by AllocateFile().
This plugs a few rare fd leaks in error cases:
1. copy_file() - fixed by using OpenTransientFile instead of BasicOpenFile
2. XLogFileInit() - fixed by adding close() calls to the error cases. Can't
use OpenTransientFile here because the fd is supposed to persist over
transaction boundaries.
3. lo_import/lo_export - fixed by using OpenTransientFile instead of
PathNameOpenFile.
In addition to plugging those leaks, this replaces many BasicOpenFile() calls
with OpenTransientFile() that were not leaking, because the code meticulously
closed the file on error. That wasn't strictly necessary, but IMHO it's good
for robustness.
The same leaks exist in older versions, but given the rarity of the issues,
I'm not backpatching this. Not yet, anyway - it might be good to backpatch
later, after this mechanism has had some more testing in master branch.
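A minimal usage sketch (the file name, buffer size and error texts are illustrative; OpenTransientFile/CloseTransientFile are the functions described above):

```c
#include "postgres.h"
#include <fcntl.h>
#include <unistd.h>
#include "storage/fd.h"

/* Illustrative: read the start of a hypothetical file.  If either
 * ereport(ERROR) fires, the fd is closed at transaction abort, unlike
 * a raw BasicOpenFile() descriptor. */
static void
read_some_file(const char *path)
{
    char    buf[512];
    int     fd;

    fd = OpenTransientFile((char *) path, O_RDONLY | PG_BINARY, 0);
    if (fd < 0)
        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not open file \"%s\": %m", path)));
    if (read(fd, buf, sizeof(buf)) < 0)
        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not read file \"%s\": %m", path)));
    CloseTransientFile(fd);
}
```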
| |
When the startup process opens a WAL segment after replaying part of it, it
validates the first page on the WAL segment, even though the page it's
really interested in is later in the file. As part of the validation, it
checks that the TLI on the page header is >= the TLI it saw on the last page
it read. If the segment contains a timeline switch, and we have already
replayed it, and then re-open the WAL segment (because streaming replication
got disconnected and reconnected, for example), the TLI check will fail when
the first page is validated. Fix that by relaxing the TLI check when
re-opening a WAL segment.
Backpatch to 9.0. Earlier versions had the same code, but before standby
mode was introduced in 9.0, recovery never tried to re-read a segment after
partially replaying it.
Reported by Amit Kapila, while testing a new feature.
| |
When I moved ExecuteRecoveryCommand() from xlog.c to xlogarchive.c, I didn't
realize that it's called from the checkpoint process, not the startup
process. I tried to use the InRedo variable to decide whether or not to attempt
cleaning up the archive (must not do so before we have read the initial
checkpoint record), but that variable is only valid within the startup
process.
Instead, let ExecuteRecoveryCommand() always clean up the archive, and add
an explicit argument to RestoreArchivedFile() to say whether that's allowed
or not. The caller knows better.
Reported by Erik Rijkers, diagnosis by Fujii Masao. Only 9.3devel is
affected.
| |
At commit all standby locks are released
for the top-level transaction, so searching
for locks for each subtransaction is both
pointless and costly (N^2) in the presence
of many AccessExclusiveLocks.
| |
Most of the replay functions for WAL record types that modify more than
one page failed to ensure that those pages were locked correctly to ensure
that concurrent queries could not see inconsistent page states. This is
a hangover from coding decisions made long before Hot Standby was added,
when it was hardly necessary to acquire buffer locks during WAL replay
at all, let alone hold them for carefully-chosen periods.
The key problem was that RestoreBkpBlocks was written to hold lock on each
page restored from a full-page image for only as long as it took to update
that page. This was guaranteed to break any WAL replay function in which
there was any update-ordering constraint between pages, because even if the
nominal order of the pages is the right one, any mixture of full-page and
non-full-page updates in the same record would result in out-of-order
updates. Moreover, it wouldn't work for situations where there's a
requirement to maintain lock on one page while updating another. Failure
to honor an update ordering constraint in this way is thought to be the
cause of bug #7648 from Daniel Farina: what seems to have happened there
is that a btree page being split was rewritten from a full-page image
before the new right sibling page was written, and because lock on the
original page was not maintained it was possible for hot standby queries to
try to traverse the page's right-link to the not-yet-existing sibling page.
To fix, get rid of RestoreBkpBlocks as such, and instead create a new
function RestoreBackupBlock that restores just one full-page image at a
time. This function can be invoked by WAL replay functions at the points
where they would otherwise perform non-full-page updates; in this way, the
physical order of page updates remains the same no matter which pages are
replaced by full-page images. We can then further adjust the logic in
individual replay functions if it is necessary to hold buffer locks
for overlapping periods. A side benefit is that we can simplify the
handling of concurrency conflict resolution by moving that code into the
record-type-specific functions; there's no more need to contort the code
layout to keep conflict resolution in front of the RestoreBkpBlocks call.
In connection with that, standardize on zero-based numbering rather than
one-based numbering for referencing the full-page images. In HEAD, I
removed the macros XLR_BKP_BLOCK_1 through XLR_BKP_BLOCK_4. They are
still there in the header files in previous branches, but are no longer
used by the code.
In addition, fix some other bugs identified in the course of making these
changes:
spgRedoAddNode could fail to update the parent downlink at all, if the
parent tuple is in the same page as either the old or new split tuple and
we're not doing a full-page image: it would get fooled by the LSN having
been advanced already. This would result in permanent index corruption,
not just transient failure of concurrent queries.
Also, ginHeapTupleFastInsert's "merge lists" case failed to mark the old
tail page as a candidate for a full-page image; in the worst case this
could result in torn-page corruption.
heap_xlog_freeze() was inconsistent about using a cleanup lock or plain
exclusive lock: it did the former in the normal path but the latter for a
full-page image. A plain exclusive lock seems sufficient, so change to
that.
Also, remove gistRedoPageDeleteRecord(), which has been dead code since
VACUUM FULL was rewritten.
Back-patch to 9.0, where hot standby was introduced. Note however that 9.0
had a significantly different WAL-logging scheme for GIST index updates,
and it doesn't appear possible to make that scheme safe for concurrent hot
standby queries, because it can leave inconsistent states in the index even
between WAL records. Given the lack of complaints from the field, we won't
work too hard on fixing that branch.
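A sketch of the redo-function pattern this enables (the record struct and page-update logic are illustrative; RestoreBackupBlock and the zero-based XLR_BKP_BLOCK(N) macro are as described above):

```c
/* Hypothetical WAL record body, for illustration only */
typedef struct my_xlrec
{
    RelFileNode node;
    BlockNumber block;
} my_xlrec;

static void
my_rm_redo(XLogRecPtr lsn, XLogRecord *record)
{
    my_xlrec   *xlrec = (my_xlrec *) XLogRecGetData(record);

    if (record->xl_info & XLR_BKP_BLOCK(0))
    {
        /* full-page image: restore it at this exact point in replay,
         * so update ordering against other pages is preserved */
        (void) RestoreBackupBlock(lsn, record, 0, false, false);
    }
    else
    {
        Buffer  buffer = XLogReadBuffer(xlrec->node, xlrec->block, false);

        if (BufferIsValid(buffer))
        {
            Page    page = (Page) BufferGetPage(buffer);

            if (lsn > PageGetLSN(page))     /* not already applied? */
            {
                /* ... apply the non-full-page update here ... */
                PageSetLSN(page, lsn);
                MarkBufferDirty(buffer);
            }
            UnlockReleaseBuffer(buffer);
        }
    }
}
```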
| |
errcontext() is typically used in an error context callback function, not
within an ereport() invocation like e.g. errmsg and errdetail are. That means
that the message domain that the TEXTDOMAIN magic in ereport() determines
is not the right one for the errcontext() calls. The message domain needs to
be determined by the C file containing the errcontext() call, not the file
containing the ereport() call.
Fix by turning errcontext() into a macro that passes the TEXTDOMAIN to use
for the errcontext message. "errcontext" was used in a few places as a
variable or struct field name, I had to rename those out of the way, now
that errcontext is a macro.
We've had this problem all along, but it doesn't seem worth
backporting. It's a fairly minor issue, and turning errcontext from a
function to a macro requires at least a recompile of any external code that
calls errcontext().
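For reference, the standard shape of an error context callback, which is where errcontext() normally runs (the callback body and state here are illustrative):

```c
#include "postgres.h"

/* errcontext() is called from the callback, outside any ereport()
 * invocation, so it must resolve its own TEXTDOMAIN -- which is what
 * turning it into a macro accomplishes. */
static void
my_context_callback(void *arg)
{
    errcontext("while processing line %d", *(int *) arg);
}

static void
do_work(int *lineno)
{
    ErrorContextCallback errcallback;

    /* push the callback for the duration of the work */
    errcallback.callback = my_context_callback;
    errcallback.arg = lineno;
    errcallback.previous = error_context_stack;
    error_context_stack = &errcallback;

    /* ... work that might ereport() ... */

    /* pop the callback again */
    error_context_stack = errcallback.previous;
}
```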
| |
This code was refactored in d5497b95 but an extra LWLockRelease call was
left behind.
Per report from Erik Rijkers
| |
Commit dfda6eba (which changed segment numbers to use a single 64 bit
variable instead of log/seg) introduced a couple of bogus choices of
exactly which log segment number variable to use in each case.
This is currently pretty harmless; in one place, the bogus number was
only being used in an error message for a pretty unlikely condition
(failure to fsync a WAL segment file). In the other, it was using a
global variable instead of the local variable; but all callsites were
passing the value of the global variable anyway.
No need to backpatch because that commit is not on earlier branches.
| |
This prevents surprising behavior when a FOR EACH ROW trigger
BEFORE UPDATE or BEFORE DELETE directly or indirectly updates or
deletes the old row. Prior to this patch the requested action
on the row could be silently ignored while all triggered actions
based on the occurrence of the requested action could be committed.
One example of how this could happen is if the BEFORE DELETE
trigger for a "parent" row deleted "children" which had trigger
functions to update summary or status data on the parent.
This also prevents similar surprising problems if the query has a
volatile function which updates a target row while it is already
being updated.
There are related issues present in FOR UPDATE cursors and READ
COMMITTED queries which are not handled by this patch. These
issues need further evaluation to determine what change, if any, is
needed.
Where the new error messages are generated, in most cases the best
fix will be to move code from the BEFORE trigger to an AFTER
trigger. Where this is not feasible, the trigger can avoid the
error by re-issuing the triggering statement and returning NULL.
Documentation changes will be submitted in a separate patch.
Kevin Grittner and Tom Lane with input from Florian Pflug and
Robert Haas, based on problems encountered during conversion of
Wisconsin Circuit Court trigger logic to plpgsql triggers.
| |
If an SMgrRelation is not "owned" by a relcache entry, don't allow it to
live past transaction end. This design allows the same SMgrRelation to be
used for blind writes of multiple blocks during a transaction, but ensures
that we don't hold onto such an SMgrRelation indefinitely. Because an
SMgrRelation typically corresponds to open file descriptors at the fd.c
level, leaving it open when there's no corresponding relcache entry can
mean that we prevent the kernel from reclaiming deleted disk space.
(While CacheInvalidateSmgr messages usually fix that, there are cases
where they're not issued, such as DROP DATABASE. We might want to add
some more sinval messaging for that, but I'd be inclined to keep this
type of logic anyway, since allowing VFDs to accumulate indefinitely
for blind-written relations doesn't seem like a good idea.)
This code replaces a previous attempt towards the same goal that proved
to be unreliable. Back-patch to 9.1 where the previous patch was added.
| |
I extracted the refactoring patch from a larger patch that contained other
changes too, but missed one unintentional change and didn't test enough...
|
| |
This is just refactoring with no user-visible effect, to make the code more
readable.
| |
The comment explaining the naming of timeline history files was wrong, and
the history file was not being archived.
Pointed out by Fujii Masao.
| |
Hopefully this makes the *BSD buildfarm animals happy.
| |
This is just refactoring, to make the functions accessible outside xlog.c.
A followup patch will make use of that, to allow fetching timeline history
files over streaming replication.
|
| |
This fixes another error in commit 9e8da0f75731aaa7605cf4656c21ea09e84d2eb1.
I neglected to make the mark/restore functionality save and restore the
current set of array key values, which led to strange behavior if an
IndexScan with ScalarArrayOpExpr quals was used as the inner side of a
mergejoin. Per bug #7570 from Melese Tesfaye.
| |
These calls were removed in commit 4240e429d0c2d889d0cda23c618f94e12c13ade7
as part of a general refactoring and improvement of DDL locking. However,
there's a problem not solved by the rewrite, which is that GRANT/REVOKE
update pg_class.relacl without taking any particular lock on the target
table as such. If another backend fails to do AcceptInvalidationMessages,
it won't notice a recently-committed change in ACLs. Bug #7557 from Piotr
Czachur demonstrates that there's at least one code path in 9.2.0 in which
a command fails to do any AcceptInvalidationMessages calls at all, if the
current transaction already holds all the locks it will need.
Since we're hard up against the release deadline for 9.2.1, fix this by
putting back the AcceptInvalidationMessages calls in heap_openrv and
heap_openrv_extended, thereby restoring the historical behavior in this
area. We ought to look for a more elegant and perhaps more bulletproof
solution, but there's no time for that right now.
| |
This can result in buffers failing to be properly flushed at
checkpoint time, leading to data loss.
Report, diagnosis, and patch by Jeff Davis.
| |
When the startup process restores a WAL file from the archive, it deletes
any old file with the same name and renames the new file in its place. On
Windows, however, when a file is deleted, it still lingers as long as a
process holds a file handle open on it. With cascading replication, a
walsender process can hold the old file open, so the rename() in the startup
process would fail. To fix that, rename the old file to a temporary name, to
make the original file name available for reuse, before deleting the old
file.
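A sketch of the rename-before-delete dance (the function and path names are hypothetical):

```c
#include <stdio.h>
#include <unistd.h>

/* Illustrative only: make finalpath available for reuse even while
 * another process holds a handle on the old file.  On Windows a
 * deleted-but-still-open file keeps its name occupied, so a plain
 * unlink-then-rename can fail. */
static void
install_restored_file(const char *tmp_restored, const char *finalpath)
{
    char    deleted_path[1024];

    snprintf(deleted_path, sizeof(deleted_path), "%s.deleted", finalpath);
    rename(finalpath, deleted_path);    /* frees up finalpath at once */
    unlink(deleted_path);               /* actual removal may be deferred */
    rename(tmp_restored, finalpath);    /* succeeds despite the lingerer */
}
```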
| |
Give the correct name of the GUC parameter being complained of.
Also, emit a more suitable SQLSTATE (INVALID_PARAMETER_VALUE,
not the default INTERNAL_ERROR).
Gurjeet Singh, errcode adjustment by me
| |
It doesn't really need rel.h; relcache.h is enough.
| |
Reported by Peter Eisentraut.
| |
The cascading replication code assumed that the current RecoveryTargetTLI
never changes, but that's not true with recovery_target_timeline='latest'.
The obvious upshot of that is that RecoveryTargetTLI in shared memory needs
to be protected by a lock. A less obvious consequence is that when a
cascading standby is connected, and the standby switches to a new target
timeline after scanning the archive, it will continue to stream WAL to the
cascading standby, but from a wrong file, ie. the file of the previous
timeline. For example, if the standby is currently streaming from the middle
of file 000000010000000000000005, and the timeline changes, the standby
will continue to stream from that file. However, the WAL on the new
timeline is in file 000000020000000000000005, so the standby sends garbage
from 000000010000000000000005 to the cascading standby, instead of the
correct WAL from file 000000020000000000000005.
This also fixes a related bug where a partial WAL segment is restored from
the archive and streamed to a cascading standby. The code assumed that when
a WAL segment is copied from the archive, it can immediately be fully
streamed to a cascading standby. However, if the segment is only partially
filled, ie. has the right size, but only the first N bytes contain valid WAL,
that's not safe. That can happen if a partial WAL segment is manually copied
to the archive, or if a partial WAL segment is archived because a server is
started up on a new timeline within that segment. The cascading standby will
get confused if the WAL it received is not valid, and will get stuck until
it's restarted. This patch fixes that problem by not allowing WAL restored
from the archive to be streamed to a cascading standby until it's been
replayed, and thus validated.
| |
This gets rid of a dangerous-looking use of the not-volatile XLogCtl
pointer in a couple of spinlock-protected sections, where the normal
coding rule is that you should only access shared memory through a
pointer-to-volatile. I think the risk is only hypothetical not actual,
since for there to be a bug the compiler would have to move the spinlock
acquire or release across the memcpy() call, which one sincerely hopes
it will not. Still, it looks cleaner this way.
Per comment from Daniel Farina and subsequent discussion.
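The idiom being enforced, sketched (the field read here is illustrative; the pointer-to-volatile is the point):

```c
/* Illustrative: read a spinlock-protected field of XLogCtl.  Going
 * through a pointer-to-volatile keeps the compiler from moving the
 * memcpy() across the spinlock acquire/release. */
{
    /* use volatile pointer to prevent code rearrangement */
    volatile XLogCtlData *xlogctl = XLogCtl;

    SpinLockAcquire(&xlogctl->info_lck);
    memcpy(&LogwrtResult, (char *) &xlogctl->LogwrtResult,
           sizeof(LogwrtResult));
    SpinLockRelease(&xlogctl->info_lck);
}
```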
| |
This is mostly cosmetic, but it does eliminate a speculative portability
issue. The previous coding ignored the fact that sum_grow could easily
overflow (in fact, it could be summing multiple IEEE float infinities).
On a platform where that didn't guarantee to produce a positive result,
the code would misbehave. In any case, it was less than readable.
| |
This reduces unnecessary exposure of other headers through htup.h, which
is very widely included by many files.
I have chosen to move the function prototypes to the new file as well,
because that means htup.h no longer needs to include tupdesc.h. In
itself this doesn't have much effect in indirect inclusion of tupdesc.h
throughout the tree, because it's also required by execnodes.h; but it's
something to explore in the future, and it seemed best to do the htup.h
change now while I'm busy with it.
| |
Every time the best-tuple-found-so-far changes, we need to reset all
the penalty values in which_grow[] to the penalties for the new best
tuple. The old code failed to do this, resulting in inferior index
quality.
The original patch from Alexander Korotkov was just two lines; I took
the liberty of fleshing that out by adding a bunch of comments that I
hope will make this logic easier for others to understand than it was
for me.
| |
This includes two micro-optimizations to the tight inner loop in descending
the SP-GiST tree: 1. avoid an extra function call to index_getprocinfo when
calling the user-defined choose function, and 2. avoid a useless palloc+pfree
when node labels are not used.
| |
The heapam XLog functions are used by other modules, not all of which
are interested in the rest of the heapam API. With this, we let them
get just the XLog stuff in which they are interested and not pollute
them with unrelated includes.
Also, since heapam.h no longer requires xlog.h, many files that do
include heapam.h no longer get xlog.h automatically, including a few
headers. This is useful because heapam.h is getting pulled in by
execnodes.h, which is in turn included by a lot of files.
| |
This lets files that are mere users of ResourceOwner not automatically
include the headers for stuff that is managed by the resowner mechanism.
| |
This improves on commit 51fed14d73ed3acd2282b531fb1396877e44e86a by
eliminating the assumption that we can form <some pointer value> +
<some offset> without overflow. The entire point of those tests is that
we don't trust the offset value, so coding them in a way that could wrap
around if the buffer happens to be near the top of memory doesn't seem
sound. Instead, track the remaining space as a size_t variable and
compare offsets against that.
Also, improve comment about why we need the extra early check on
xl_tot_len.
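The pattern, in a self-contained sketch (names are illustrative):

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative only: validate an untrusted offset against a buffer.
 * Forming "buf + offset" first could wrap around when the buffer sits
 * near the top of the address space, letting a bogus offset pass the
 * check.  Comparing against the remaining space never forms an
 * out-of-range pointer. */
static bool
offset_is_valid(const char *buf, const char *buf_end, size_t offset)
{
    size_t  remaining = (size_t) (buf_end - buf);

    return offset <= remaining;
}
```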
| |
If a WAL record header was split across pages, but xl_tot_len was 0, we
would get confused and conclude that we had already read the whole record,
and proceed to CRC check it. That can lead to a crash in RecordIsValid(),
which isn't careful to not read beyond end-of-record, as defined by
xl_tot_len.
Add an explicit sanity check for xl_tot_len <= SizeOfXLogRecord. Also,
make RecordIsValid() more robust by checking in each step that it doesn't
try to access memory beyond end of record, even if a length field in the
record's or a backup block's header is bogus.
Per report and analysis by Tom Lane.
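The shape of the added guard, as a hedged sketch (the helper name is hypothetical; SizeOfXLogRecord is the fixed header size):

```c
/* Illustrative only: reject a header whose xl_tot_len cannot cover
 * even the fixed record header, before any further reading or CRC
 * checking is attempted. */
static bool
header_length_is_sane(const XLogRecord *record)
{
    return record->xl_tot_len > SizeOfXLogRecord;
}
```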